Talk:HTML XML Use Case 04

From W3C Wiki

Data Island Considerations

Setting aside the semantics of existing HTML5 implementations, I can see five potential alternatives for representing data islands within HTML. These aren't proposals, only what I see as the minimal information necessary to capture non-HTML content.

Option 1: Subclassing <script>

In this option, you would have a construct of the type

<script norun="true|false" type="mime-type" src="anyUri" id="xs:ID" document="true|false">script body</script>

This case interprets the @norun attribute to mean that when it is true (or present without a value, as HTML allows), the content contained within the script element is to be treated as data to be parsed independently via script at some later point, is not to be displayed, and is out of scope of any text search mechanism within the page.

To be consistent, @type should be considered extensible, with the default value being text/javascript. When @norun is true, this element should be ignored by the parser completely and not interpreted; in this case the role of @type is purely advisory. This would imply that the construct

<script norun="true" type="text/xml">
  <foo>This is content of text
</script>

is in fact a valid expression, even though it lacks the closing </foo> tag. In this case, the internal <foo> element would be treated as escaped HTML:

&lt;foo&gt;This is content of text

When @norun is false, this would imply that the user agent should attempt to interpret the semantics of the contents of the <script> element, based either upon intrinsic capabilities (i.e., the language is natively handled) or upon dynamic capabilities (a previous script block or binding is in place that interprets this type).

In this interpretation,

<script norun="false" type="text/xml" id="bar" document="true">
    <foo>This is a node</foo>
</script>

would interpret the content as XML, parse it if possible, and then bind the result to the script element itself, possibly via an extension property; e.g., document.getElementById("bar").extension.document would retrieve the parsed XML as a DOM (if the parse fails, this would be an error, as it would be elsewhere). The @document attribute in this case indicates that the block contains meaningful data of some sort, rather than code to be run by a script engine. This can be seen more clearly with JSON:

<script norun="false" type="application/json" id="bar" document="true">
    {"foo":"This is a node"}
</script>

In this case the foo node could be accessed as

document.getElementById("bar").extension.document.foo

In this case, extension.document would contain the JSON parsed into a JavaScript object. The same approach could hold for YAML or other data formats.
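A minimal sketch of the JSON binding described above, written as plain JavaScript so it runs outside a browser. The element is modelled as an ordinary object, and both `bindDocument` and the `extension` property are hypothetical (the latter is the property proposed on this page, not an existing DOM API). Strict JSON requires double-quoted keys and strings, so a single-quoted body such as {'foo':'...'} would make `JSON.parse` throw.

```javascript
// Hypothetical sketch: bind parsed data to an element's `extension.document`
// property when @norun is false and @document is true. `extension` is the
// property proposed in this discussion, not an existing DOM API.
function bindDocument(element) {
  // Blocks that are not run, or that carry code rather than data, get nothing bound
  if (element.norun || !element.document) {
    return element;
  }
  // Accept the registered JSON type as well as the text/json alias
  if (element.type === "application/json" || element.type === "text/json") {
    // JSON.parse throws a SyntaxError on malformed input (e.g. single-quoted
    // keys), matching the rule that a failed parse is an error
    element.extension = { document: JSON.parse(element.textContent) };
  }
  return element;
}

// Plain-object stand-in for the <script id="bar"> element
var bar = {
  id: "bar",
  type: "application/json",
  norun: false,
  document: true,
  textContent: '\n    {"foo":"This is a node"}\n',
};

bindDocument(bar);
console.log(bar.extension.document.foo); // prints "This is a node"
```

In a browser the text would come from the element's `textContent`; the stand-in object only mimics the fields the sketch reads.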

In the case where a user agent supports this particular type, the conversion would be handled at script interpretation time. In the case where it doesn't, the node would need to be flagged, and some helper interface (perhaps an onscriptfail or similar event handler) would be called to interpret the script. If this fails (returns false), an exception is thrown.
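The supported-type versus fallback logic above can be sketched as a small dispatcher. Everything here is hypothetical: `intrinsicHandlers`, `interpretBlock` and the `onscriptfail` callback signature are illustrative names, and JSON is assumed to be the only natively handled type.

```javascript
// Hypothetical dispatcher: intrinsic handlers first, then an
// onscriptfail-style fallback, then an exception.
var intrinsicHandlers = {
  // Assumption: the user agent natively understands JSON
  "application/json": function (text) { return JSON.parse(text); },
};

function interpretBlock(type, text, onscriptfail) {
  var handler = intrinsicHandlers[type];
  if (handler) {
    // Natively supported type: convert at interpretation time
    return handler(text);
  }
  // Unsupported type: hand the block to the helper, if one was supplied
  var result = onscriptfail ? onscriptfail(type, text) : false;
  if (result === false) {
    // Helper missing or unable to handle the type
    throw new Error("No handler for MIME type " + type);
  }
  return result;
}

// Usage: a helper that turns a single CSV line into an array
var row = interpretBlock("text/csv", "a,b", function (type, text) {
  return text.split(",");
});
console.log(row); // prints [ 'a', 'b' ]
```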

This mechanism could be used to embed XML content that is interpreted in some other fashion, such as SVG:

<script norun="false" type="application/svg+xml" id="image" document="true">
   <svg xmlns="http://www.w3.org/2000/svg" width="250px" height="250px" viewBox="0 0 250 250">
       <defs>
           <radialGradient id="red-blue">
                <stop stop-color="red" offset="0"/>
                <stop stop-color="blue" offset="1"/>
           </radialGradient>
       </defs>
       <circle r="50" cx="100" cy="100" fill="url(#red-blue)"/>
   </svg>
</script>

Note that in this case @document is still true (the DOM is accessible via document.getElementById("image").extension.document), but with @norun set to false, the script interpreter will look for a handler for the application/svg+xml mime-type in order to interpret it. It also means that any identifiers introduced in the block would be available in the larger HTML document, albeit referring to foreign objects rather than HTML element nodes.

The case for XQuery as a script illustrates where @document would be false:

<script norun="false" type="application/xquery" id="xq-script" document="false">
  (: Main script, invoked upon document load :)
  declare function local:main(){
    (b:addEventListener(b:dom()//input[@id='bat'],"onclick","local:helloWorld"))
  };
  (: Displays a "Hello World!" alert message :)
  declare function local:helloWorld($evt,$loc){
    (b:alert("Hello World!"))
  };
</script>

Here, the construct isn't a document structure, but it does contain executable code. In this case, the user agent would try to interpret it and fail, and would then fire an "onScriptFail" event. A JavaScript handler would pick this up and attempt to interpret the code based upon the mime-type, then would either execute it within that context or indicate (by raising an "onScriptInterpretError" event) that it can't in fact handle the mime-type. One advantage of this approach is that you could register several "onScriptFail" handlers, so that different implementers could hook in their own solutions: if script interpreter #1 failed, interpreter #2 might still succeed.
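The multiple-handler idea can be sketched as an ordered registry in which the first handler that does not return false wins. The names `addScriptFailHandler`, `dispatchScriptFail` and the error-callback shape are hypothetical, and the two registered interpreters are toy stand-ins (a real XQuery engine would compile and run the code).

```javascript
// Hypothetical registry of onScriptFail handlers, tried in registration order.
var scriptFailHandlers = [];

function addScriptFailHandler(handler) {
  scriptFailHandlers.push(handler);
}

// Returns the first handler result that is not `false`; if every handler
// declines, reports the failure through the onScriptInterpretError callback.
function dispatchScriptFail(type, source, onScriptInterpretError) {
  for (var i = 0; i < scriptFailHandlers.length; i++) {
    var result = scriptFailHandlers[i](type, source);
    if (result !== false) {
      return result;
    }
  }
  onScriptInterpretError({ type: type });
  return undefined;
}

// Interpreter #1 only understands a made-up "text/x-upper" type
addScriptFailHandler(function (type, source) {
  return type === "text/x-upper" ? source.toUpperCase() : false;
});
// Interpreter #2 claims XQuery (toy stand-in for a real engine)
addScriptFailHandler(function (type, source) {
  return type === "application/xquery" ? "handled by interpreter #2" : false;
});

var outcome = dispatchScriptFail("application/xquery", "declare function local:main(){()};", function () {});
console.log(outcome); // prints "handled by interpreter #2"
```

Ordering matters in this design: a handler registered earlier gets the first chance to claim a type, so a broad handler registered first would shadow more specific ones.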

BTW, for both @norun and @document, the HTML convention for attributes given without a value is honored here. That is to say:

<script norun="false"></script>

is the same as:

<script></script>

while

<script norun="true"></script>

is the same as:

<script norun></script>
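The equivalences above amount to a small resolution rule, sketched here over a plain attribute map (attribute name to string value, with "" for a valueless attribute). `flagValue` is a hypothetical helper, and treating any value other than "" or "true" as false is an assumption the page does not settle. This rule differs from standard HTML boolean attributes, where mere presence means true, so norun="false" would count as true under HTML's own rules.

```javascript
// Hypothetical: resolve @norun / @document from a raw attribute map,
// following the equivalences above: absent or "false" means false,
// present with no value ("") or "true" means true.
function flagValue(attributes, name) {
  if (!Object.prototype.hasOwnProperty.call(attributes, name)) {
    return false; // attribute absent, same as "false"
  }
  var value = attributes[name];
  // Assumption: any other value is treated as false
  return value === "" || value === "true";
}

console.log(flagValue({ norun: "" }, "norun"));      // prints true  (<script norun>)
console.log(flagValue({ norun: "false" }, "norun")); // prints false (<script norun="false">)
```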

Option 2: <document> Element

In this option, you would have a construct of the type

<document type="mime-type" src="anyUri" id="xs:ID" >document body</document>

This would be structurally the same as:

<script norun="false" type="mime-type" src="anyUri" id="xs:ID" document="true">script body</script>

with the idea being that you would get the same non-interpretative document instantiation as in the <script> form. This would similarly produce an extension.document property holding either a DOM or a JSON object (it may be worth differentiating extension.document and extension.json and supporting both). Any other language structure, such as YAML or CSV, would be mapped to a JSON structure.

This is actually a good case for SVG or MathML:

<document type="application/svg+xml" id="image">
  <svg xmlns="http://www.w3.org/2000/svg" width="250px" height="250px" viewBox="0 0 250 250">
      <defs>
          <radialGradient id="red-blue">
               <stop stop-color="red" offset="0"/>
               <stop stop-color="blue" offset="1"/>
          </radialGradient>
      </defs>
      <circle r="50" cx="100" cy="100" fill="url(#red-blue)"/>
  </svg>
</document>

Again, this would support an ondocumentfail event, fired when there is no internal handler for the particular mime-type, so that the content might then be interpreted via a JavaScript (or perhaps plugin) shim.

Option 3: <content> or <escape> Element

In this option, you would have a construct of the type

<content type="mime-type" src="anyUri" id="xs:ID" >content body</content>

or

<escape type="mime-type" src="anyUri" id="xs:ID" >content body</escape>

This would be structurally the same as:

<script norun="true" type="mime-type" src="anyUri" id="xs:ID" document="false">script body</script>

This case creates a pure "null" area - the content is not interpreted in any way and it does not appear in the narrative stream or the CSS model. It exists for scripters to pull content out of the node and then parse it according to their particular needs, using the @type attribute value simply as an advisory tag. If you wanted to use XML with it, for instance, you'd have to do the following:

<content type="application/svg+xml" id="image">
  <svg xmlns="http://www.w3.org/2000/svg" width="250px" height="250px" viewBox="0 0 250 250">
      <defs>
          <radialGradient id="red-blue">
               <stop stop-color="red" offset="0"/>
               <stop stop-color="blue" offset="1"/>
          </radialGradient>
      </defs>
      <circle r="50" cx="100" cy="100" fill="url(#red-blue)"/>
  </svg>
</content>
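<script type="text/javascript">
   // Pull the raw, uninterpreted text out of the <content> block
   var image = document.getElementById("image");
   // parseFromString() requires a MIME type; DOMParser accepts the registered SVG type
   var dom = (new DOMParser()).parseFromString(image.textContent, "image/svg+xml");
   // do something with the dom
</script>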

The <content> block would make the process of storing data possible without necessarily requiring a specialized processor to do something with this data. In the above example, the id "red-blue" would not be interpreted into the identifier space for the primary HTML document.

White space would be retained within a <content> block. This makes it possible to store content such as CSV:

<content type="text/csv" id="csv">"color_name","color_value"
"Red","#FF0000"
"Green","#00FF00"
"Blue","#0000FF"</content>
<script type="text/javascript">
    window.addEventListener("load", function (evt) {
        // Raw CSV text, with white space preserved by the <content> block
        var text = document.getElementById("csv").textContent;
        // Accept either LF or CRLF line endings
        var lines = text.split(/\r?\n/);
        var header = lines[0];
        var body = lines.slice(1);
        // Splitting '"a","b"' on the leading quote, the '","' separators and the
        // trailing quote yields ["", "a", "b", ""]; slice(1, -1) drops the empty ends
        var prop_names = header.split(/^"|","|"$/).slice(1, -1);
        // Column-oriented result: one array of values per header name
        var list = {};
        prop_names.forEach(function (prop_name) {
            list[prop_name] = [];
        });
        body.forEach(function (line) {
            if (line === "") { return; } // skip a trailing blank line
            var line_values = line.split(/^"|","|"$/).slice(1, -1);
            for (var index = 0; index < prop_names.length; index++) {
                list[prop_names[index]].push(line_values[index]);
            }
        });
        // do something with the list
    });
</script>

Option 4: <xml> Element

The <xml> element is a special case of the <script> element specific to XML instances only, and would have the form:

<xml type="anyXMLMimeType" src="anyUri" id="xs:ID">
    xml body
</xml>

This is equivalent to

<script norun="false" type="anyXMLMimeType" src="anyUri" id="xs:ID" document="true">xml body</script>

In this particular case, the XML is instantiated directly and is also interpreted based upon the mime-type, so this too could be used for instantiating SVG, MathML, MusicXML, etc.
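The equivalences stated for Options 2, 3 and 4 can be collected into one desugaring table that maps each proposed element onto the Option 1 <script> form. `toScriptForm` and the returned object shape are hypothetical; the flag values are taken directly from the equivalences on this page.

```javascript
// Hypothetical desugaring of the proposed elements into the Option 1
// <script> form, using the equivalences stated for each option.
var SCRIPT_EQUIVALENTS = {
  document: { norun: false, document: true },  // Option 2
  content:  { norun: true,  document: false }, // Option 3
  escape:   { norun: true,  document: false }, // Option 3 (alternate name)
  xml:      { norun: false, document: true },  // Option 4
};

function toScriptForm(tagName, attributes) {
  var flags = SCRIPT_EQUIVALENTS[tagName.toLowerCase()];
  if (!flags) {
    throw new Error("Not a data-island element: " + tagName);
  }
  // @type, @src and @id carry over unchanged
  return {
    tagName: "script",
    type: attributes.type,
    src: attributes.src,
    id: attributes.id,
    norun: flags.norun,
    document: flags.document,
  };
}

var csvForm = toScriptForm("content", { type: "text/csv", id: "csv" });
console.log(csvForm.norun, csvForm.document); // prints true false
```

Seen this way, Options 2 to 4 add no new processing model; they are named shorthands for particular @norun/@document combinations.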

Note that Option 4 itself splits into two distinct sub-options.

In Option 4a, the XML is parsed with an XML processor, retaining namespace information, singleton elements and strict well-formedness.

In Option 4b, the XML is parsed via the HTML processor, with namespaces converted to attributes, singleton elements being split into open and close tags with no internal content, and reduced well-formedness. It is in essence syntactically HTML. (This holds true for options #1 and #2 as well, but is most obvious with #4).
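To make the singleton-element difference in Option 4b concrete, here is a rough illustration that rewrites XML empty-element tags as an open tag immediately followed by its close tag. `expandSingletons` is a hypothetical name, and it is a regex sketch rather than a parser: it ignores comments, CDATA and any "/>" inside attribute values, and it does not attempt the namespace-to-attribute conversion.

```javascript
// Rough illustration of Option 4b only: rewrite XML empty-element tags
// (<stop .../>) as an open tag immediately followed by its close tag.
// Regex sketch, not a parser: comments, CDATA and "/>" inside attribute
// values are not handled.
function expandSingletons(markup) {
  return markup.replace(
    // group 1: element name; group 2: optional attribute text
    /<([A-Za-z][\w:.-]*)((?:\s+[^<>]*?)?)\s*\/>/g,
    "<$1$2></$1>"
  );
}

console.log(expandSingletons('<stop stop-color="red" offset="0"/>'));
// prints <stop stop-color="red" offset="0"></stop>
```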

Option 5: xml Attribute

The @xml attribute is an attribute that holds escaped XML content. It would be of the form:

<someHTMLElement xml="<data><foo>My data</foo></data>">someHTMLElementContent</someHTMLElement>

This holds content in a specific attribute in the same way as <content> holds it in a specific element: there is no specific interpretation bound to it, and it has no effect upon the element beyond being attached to it. This is perhaps the easiest solution on the HTML side but the least useful on the XML side, and it produces attributes that are illegible and possibly quite long. However, this isn't that much different from the @value attribute on <input> elements.
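A minimal sketch of the escaping an @xml value would need if it is stored as escaped XML, and of the reverse step a script would perform before handing the text to an XML parser. `escapeForAttribute` and `unescapeAttribute` are hypothetical helpers covering only the four characters that matter in a double-quoted attribute; the example also shows how quickly the value grows.

```javascript
// Escape XML text for use inside a double-quoted attribute value.
// "&" must be replaced first; otherwise the "&" of a freshly inserted
// "&lt;" would itself be escaped again.
function escapeForAttribute(xml) {
  return xml
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Reverse the escaping; "&amp;" must be replaced last so that an escaped
// "&amp;lt;" turns back into the literal text "&lt;", not into "<".
function unescapeAttribute(value) {
  return value
    .replace(/&quot;/g, '"')
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&amp;/g, "&");
}

var xml = "<data><foo>My data</foo></data>";
var attr = escapeForAttribute(xml);
// 31 characters of XML become 55 characters of attribute value
console.log(attr); // prints &lt;data&gt;&lt;foo&gt;My data&lt;/foo&gt;&lt;/data&gt;
```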