Issue: Explicit Markup to Semantically Express Poetic Forms
Problem: HTML5 lacks explicit semantic markup to express poetic forms.
Semantic Markup for Poetry: A Proposal from Dr. Olaf Hoffmann
What I missed so many years in (X)HTML is some useful markup for poems.
The result we can see in the "real web life" -- a lot of meaningless tag soup around, disoriented authors lost between silence and semantically meaningless markup...
Obviously poem markup is still not available in "HTML 5". Why not? Can this be added to the "HTML 5" draft?
It is pretty nice to have something like "section", "article", "header" in "HTML 5" (why not a generic heading element as "h" from XHTML2 by the way? This would be pretty useful for poems as well as for larger projects as anthologies, books or general content fragments joined together for example with server sided scripts as PHP).
Some useful and usable markup for poetry is still missing. If someone really tries to markup a poem today, one ends up with a div-class-tag-soup-nonsense. And there are many authors out there publishing poetry only in the web, currently without having any sufficient markup elements in (X)HTML for this.
According to my observation, readers of poetry and general literature have a wide range of capabilities (a lot of readers of poetry are search engine robots, for example). Therefore it is quite useful to mark up this type of literature, to make elements with a semantic meaning available to authors in (X)HTML and to simplify the identification of poetry for readers.
I think, it is the main purpose of a "Text Markup Language" as (X)HTML to markup text in a semantic way, isn't it? Poems are text -- lets markup it now ;o)
Some useful elements (block elements):
poem
- container for a poem, similar to a section; may contain header, footer, div, p (maybe useful for modern poetry), strophe, line, h
strophe
- stanza or strophe of a classical poem, may contain either line or inline elements or CDATA
line
- a line or row of a poem, may contain inline elements or CDATA
h
- a heading of a poem
I think such a construction covers already many types of poems. For non-classical as for example concrete poetry this is maybe sufficient too, still div or p can be used to realize non-conventional content.
source: Dr. Olaf Hoffmann, post to public-html, 5 October 2007
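To make the proposed nesting concrete, here is a minimal sketch of how a poem might be marked up with the elements listed in the proposal above. The elements poem, strophe, line and h are only proposed there and are not part of any HTML specification; the title text is a placeholder, and the verse is the sample stanza used in the detailed discussion.

```html
<!-- Sketch only: poem, strophe, line and h are proposed elements, not valid HTML. -->
<poem>
  <!-- h: heading of the poem (placeholder title, not taken from the proposal) -->
  <h>Placeholder title</h>
  <!-- strophe: one stanza, containing only line elements -->
  <strophe>
    <line>I dreamed I was a fly</line>
    <line>buzzing through the sky</line>
    <line>looking for some sweets</line>
    <line>or some spoiling meats</line>
  </strophe>
</poem>
```

A further stanza would simply follow as a second strophe inside the same poem container.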
Note: A fuller, detailed discussion of poetic markup alternatives is contained in this document.
Leif Halvard Silli's Proposed Solution: Introduce a TEXT Element Parallel to VIDEO and AUDIO
Precis: The central idea is to introduce into HTML a <TEXT> element - as a parallel to VIDEO and AUDIO - to be used when we want to "embed" an independent piece of text - such as a poem, a play, whatever.
Explanation: The very problems that Olaf raised can, as I see it, be split into two interrelated issues:
- A "container issue", for which I propose <TEXT> with various possible attributes for classification of the kind of text (perhaps, actually, CLASS would be good enough).
- A "low-level text element issue" - e.g. the need for an <L> element for (semantic) lines.
I see those issues as interrelated because: If we want to reuse existing elements (such as <LI> for lines), then that becomes much simpler if we have a semantic container. E.g. if an author has either <POEM> or <TEXT class=poem>, then I believe that he or she would find it easier to simply accept using e.g. <LI> for lines in a stanza/paragraph than if he or she simply had to use a typical <DIV> as container. If we don't have a semantic container, then it becomes all the more necessary to have specific low-level elements.
That said, there might be good reason for adding some more low-level elements:
The legacy "semantics"/restrictions of the legacy elements is one thing. But also, there are several text genres that need low-level list-like formats, less bound up with the semantics of DL, OL and UL. (The semantics of DL, OL and UL are that each <LI> or each <DT> is kind of independent from the other list items.) The DIALOG element is an example of reuse of an existing element which is more "literature like": Suddenly a list tells a story - i.e. the items are suddenly more directly interrelated - in a story-like way.
I am not so sure that we need to add a stanza format/paragraph list format (<P><LI><LI></P>) that can only be used in poems - I think there is good chance we can come up with something that is useful in other genres.
source: Leif Halvard Silli, post to public-html
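As an illustration only, the two sides of this proposal (a semantic container plus reused low-level elements) might combine as sketched below. The text element, the class value poem, and li as a direct child of p are all hypothetical here: none of them is valid in HTML4 or in the HTML5 draft, and the post does not fix an exact syntax.

```html
<!-- Hypothetical: <text> as a container for an independent piece of text -->
<text class="poem">
  <!-- Hypothetical stanza/paragraph list format: li directly inside p -->
  <p>
    <li>I dreamed I was a fly</li>
    <li>buzzing through the sky</li>
    <li>looking for some sweets</li>
    <li>or some spoiling meats</li>
  </p>
</text>
```

Note that HTML parsers following the HTML5 parsing rules close an open p as soon as they meet an li start tag, so this structure would need parser changes as well as new content models.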
Responses & Reactions
Peter Krantz: XHTML2 and RDFa Satisfy This Request
Do I understand you correctly that you want to include markup for a specific domain (poetry) in HTML5?
XHTML2 provides an extension mechanism through RDFa. RDFa will let you add semantic meaning (and parsing by others) to your specific domain. In fact you could semantically express poems of specific forms this way and create interesting possibilities for people who want to extract the poetry for e.g. research reasons.
A markup language should probably include as little as possible from specific domains and focus on the general things instead. Domain specifics should be handled via an extension mechanism that allows for unambiguous interpretation of the expressed information.
source: Peter Krantz, post to public-html, 5 October 2007
Doug Schepers (5 October 2007)
I'd like to note that in addition to poetry, the same solution could be applied to song lyrics, which are very widespread content on the Web. There are many sites devoted to nothing else, and sites like MySpace (and many blogs) have a lot of lyrical content.
I personally favor the idea of loosening up the definition of <p> into just that of a block of text (since the idea of a paragraph is not universal among natural-language orthographies), and using some other semantic system to annotate specialities of written language (where you could, for example, choose between a simple poetry markup and a more complex one that notates free verse or sonnets or even structural elements of iambic pentameter). This might be RDFa, or spans marked with microformats tags. You'll be able to get much more precision than with a blunt tool like HTML.
Including lyrics in the category of poetry does make explicit a couple of interesting technological/processing aspects, though:
- guitar tabs (or other musical notation) could be integrated using ruby;
- timed text (as for karaoke) could be used to add meter and rhythm to the presentation style (think SMIL or HTML+Time).
And, of course, as you point out, giving special consideration to particular types of content (such as poetic or lyrical) aids in its categorization or aggregation.
source: Doug Schepers, post to public-html, 5 October 2007
Ian Hickson (5 October 2007)
HTML5 actually defines how to mark up poems in HTML (the word "poem" is in the spec half a dozen times, in fact!).
Specifically:
- the heading of a poem is marked up using <header> and the appropriate level of <hX> elements,
- the stanzas of poems written in the classical form are given by <p> elements, with line breaks indicated by <br> elements (one of the few allowed uses of <br>),
- the stanzas of freeform poems are given by <pre> elements.
There is an example of a part of a classical poem in the <img> element section (search for "On either side the river lie").
source: Ian Hickson, post to public-html, 5 October 2007
Gregory J. Rosmaita: Response to Ian Hickson (5 October 2007)
a stanza isn't a paragraph, nor is a verse -- if they were, they'd be called paragraphs;
line breaks carry no semantic meaning -- why not a containing element that indicates a line of poetry, much as <LI> and </LI> indicate the beginning and end of a list item?
PRE does not express any meaningful semantics, nor does it lend structure -- other than the visual illusion of structure -- to the text contained in a PRE container...
the "classical poem" example you cited:
<h1>The Lady of Shalott</h1>
<p><img src="shalott.jpeg" alt=""></p>
<p>On either side the river lie<br>
Long fields of barley and of rye,<br>
That clothe the wold and meet the sky;<br>
And through the field the road run by<br>
To many-tower'd Camelot;<br>
And up and down the people go,<br>
Gazing where the lilies blow<br>
Round an island there below,<br>
The island of Shalott.</p>
is used to illustrate the contentious claim that:
- Examples where the image is purely decorative despite being relevant would include things like a photo of the Black Rock City landscape in a blog post about an event at Burning Man, or an image of a painting inspired by a poem, on a page reciting that poem. The following snippet shows an example of the latter case (only the first verse is included in this snippet):
why should those processing the poem non-visually be bereft of a description of the accompanying illustration? obviously, the illustration captures an artist's conception of the "lady of shalott", which could aid an individual's understanding of the poem, and which could enhance the reader's understanding of the cross-fertilization of poetry and art in a particular era and a particular style...
i fail to comprehend why an illustration such as this should be null alt texted and why it should validate without a descriptor, in particular, a long description of the painting -- not only those who cannot see may need a description of the painting, but also those with color blindness and those with an extremely restricted viewport who may need guidance through the illustration...
if the illustration isn't worthy of description, then it isn't worthy of being included in the first place -- one cannot, as the draft currently does, classify this image as "A purely decorative image that doesn't add any information but is still specific to the surrounding content", as the example you cited is NOT a purely decorative image, but an interpretation of the poem it is being used to illustrate -- therefore, it demands both a terse and a long description...
source: Gregory J. Rosmaita, post to public-html, 5 October 2007
Examples of Poetry on the Web
American Verse Project (University of Michigan's Humanities Text Initiative)
British Poetry 1780-1910: A Hypertext Archive of Scholarly Editions
Detailed Discussion
What is Poetry, Stanza, Strophe etc?
Words like stanza, strophe, verse, paragraph, section and article, as used in several European languages, share origins in ancient Greek or Latin, but their meanings differ slightly in the languages as currently spoken (compare the corresponding Wikipedia pages written in different languages). Element names that are too specific may therefore cause confusion and annoyance across languages. A better approach may be to use more technical terms that describe the functionality of such elements, together with a careful description of their usage, so that all related use cases are covered for any author.
How to Markup a Stanza or Strophe?
The fine structure of typical poetry is a stanza (or strophe or verse paragraph) with lines (or verse lines), example with pseudo code:
<stanza> <line>I dreamed I was a fly</line> <line>buzzing through the sky</line> <line>looking for some sweets</line> <line>or some spoiling meats</line> </stanza>
To cover only the functionality and to avoid the impression, that this is very specific to a specific type of text, one can reduce this to more technical element names:
<ll> <l>I dreamed I was a fly</l> <l>buzzing through the sky</l> <l>looking for some sweets</l> <l>or some spoiling meats</l> </ll>
Note: ll or gl equals "list of lines" or "group of lines"; l equals "line".
This can be very useful if other content turns out to have closely related requirements of structure or presentation. Names that are too specific could keep authors whose content is only closely related, but similar in functionality, from using the elements. It also avoids an annoyance similar to having a section about prose in the recommendation, but none for poetry ;o)
Requirements to the Functionality of the Markup
1. The structure of a stanza is similar to that of a list. Additionally, stanzas normally contain rhythmic content such as fragments of poems, lyrics or songs. Therefore a list-item-like element is needed to mark up this substructure.
2. The lines are similar to list items. As with block elements, a line element starts on a new line. A stanza also starts on a new line and is separated from preceding and following content by some space perpendicular to the ordinary writing direction in graphical presentation, or by a pause in aural presentation. In aural presentation the end of a line is typically marked with a shorter pause than the end of a stanza.
3. In contrast to prose lists such as a shopping list, a stanza is not prose, and lines are intended as lines without line breaks within them. This is easy to achieve in aural presentation. If it conflicts with limited horizontal (or vertical) space in visual rendering, author and user expect to see some intuitive sign of the undesired line break, for example an indentation after such a forced line break (or perhaps a specific symbol warning about the line break).
4. In contrast to prose lists, and apart from the problem described in 3, lines have no list item symbol or indentation in their default presentation. Numbering is not excluded, but it is a matter of styling for specific cases (referencing a line for interpretation, educational purposes, a scientific approach) rather than of the default presentation.
Approach for a (Default) Styling Model for a Stanza
aural: The reader could switch to a speaker with more advanced abilities for rhythmic and metric text, if the parent type is not already a poetry type; otherwise the result will be painful for the audience. As a minimum requirement, a less advanced reader may give a warning note about the problem.
visual:
ll {display:block; padding:1ex}
l {display:block;padding-left: 2ex}
l:first-letter {margin-left: -2ex}
Most other presentation and styling should be achievable with CSS, and there is no more generic typical styling: some poems are centered, some are not, and most other styling choices are a matter of taste, solvable with CSS. Authors may use inline elements (3.12 Phrase elements especially) inside the line elements. For simple free-form artworks it may be necessary to have fixed whitespace within the line, either with a pre element, with &nbsp;, or by leaving this to a poetry container. Other content directly within the stanza element is not to be expected.
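The default styling above can be tried out in a small test page. The sketch below is an illustration under assumptions, not part of the proposal: it relies on the browser applying CSS to the unknown elements ll and l, and it swaps the first-letter rule for the more common hanging-indent idiom (padding-left combined with a negative text-indent), so that only the continuation of a line that does not fit is indented.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Stanza styling sketch</title>
<style>
  /* ll and l are the proposed elements; HTML does not know them, so they are styled as blocks here */
  ll { display: block; padding: 1ex; }
  /* Hanging indent: the first rendered line starts at the edge, wrapped continuations sit 2ex to the right */
  l  { display: block; padding-left: 2ex; text-indent: -2ex; }
</style>
</head>
<body>
<ll>
  <l>I dreamed I was a fly</l>
  <l>buzzing through the sky</l>
  <l>looking for some sweets</l>
  <l>or some spoiling meats</l>
</ll>
</body>
</html>
```

Narrowing the window until a line wraps should show the continuation indented by 2ex, which is the kind of intuitive sign for a forced line break that the requirements ask for.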
The structure model above does not cover slightly more complex structures, such as theatre or opera with dialog. Such literature may include prose as well (stage directions). Currently the heading of the relevant section in the HTML5 draft reserves the dialog element for prose, so it is not usable for this type of poetry. One might extend the use case to poetry, putting the dialog into a poetry container and the remaining prose parts into an element such as aside or into a prose container.
Because this type of dialog poetry is normally performed and the written text equivalent is not the primary artwork, the specific requirements for a line may be skipped within such a poetry dialog environment: they are less important there, and the behaviour of the dialog element then does not depend on the parent container (a simpler implementation rule). However, sometimes the speaker in the dialog switches within one verse line; this may cause further minor problems for markup (see Shakespeare sample below).
Open question: Are there deviations or other requirements for rhythmic text notation in other cultures or for other writing directions (as horizontal?)
Some Specific, More Critical Use Cases
Alliteration - one line in one stanza
<ll xml:lang="de"><l>Fischers Fritze fischt frische Fische. Frische Fische fischt Fischers Fritze.</l></ll>
Haiku
A variation on Matsuo Basho's frog haiku (~ 1686); it might require vertical writing direction in the original language, but traditionally appears in one line.
<ll><l>At an old calm pond - Suddenly a frog jumped in - A sound like a splash!</l></ll>
<ll xml:lang="de"><l>Uralter Weiher. Ein hineinspringender Frosch. Platschendes Wasser!</l></ll>
Blank Verse
Either the author still interprets everything as one list of lines, or each line as a separate list of one line. If ll (list of lines) is used instead of stanza, this still fits the simple model; the author may want to use CSS to adjust the styling a little.
Free Verse
If the author still needs to state that a line is a line, ll is still well suited to that. If the author only needs to state that it is poetry, a poetry container with paragraphs (p) is sufficient, see below.
Simple Free Form with Requirement for Defined Empty Space
<ll> <l><pre>I dreamed I was a fly</pre></l> <l><pre> buzzing through the sky</pre></l> <l><pre> looking for some sweets</pre></l> <l><pre> or some spoiling meats</pre></l> </ll>
Possible Suggested Methods Available with HTML4/XHTML1.x for "ordinary" Poetry
Method 1.1 (br only)
<br> <br> <br>I dreamed I was a fly <br>buzzing through the sky <br>looking for some sweets <br>or some spoiling meats <br> <br>
Pro:
- uses only existing br to 'markup' stanza and line
- this model can be expanded, using b as a generic element for headings and a for links and/or img to realise empty space
- for many authors there is no need for further elements; because normally the server already sends text/html, even elements like html or body become redundant
- backwards compatibility back till the dawn of HTML
Con:
- currently only available in the transitional profiles; otherwise at least a div containing all the content has to be used as a block element
- no semantical structure and meaning for anything at all
- no list like structure, none of the requirements is met
Method 1.2 (p+br)
<p> I dreamed I was a fly<br /> buzzing through the sky<br /> looking for some sweets<br /> or some spoiling meats<br /> </p>
Pro:
- source code not blown up with much markup
- reuse of existing elements p and br
- some authors/editors have this in use in HTML4 today, ignoring the structure gaps of this construction.
Con:
- br does not markup a line as a line, this is just a forced line break. br is an empty element, therefore cannot contain the line content itself.
- a stanza has more of a list structure, while p has no specific structure. For historical reasons a paragraph can be interpreted as a degenerate stanza that has lost its line structure. Therefore p lacks the generic inner structure needed to represent a stanza.
- requirements 2 and 3 are not met by p and br
- the current use in HTML4 can be explained, because elements with a structure more related to poetry are defined currently for other purposes, such as dl+dt+dd, therefore p+br is somehow left as the simplest method, because there is no other choice.
Method 1.3 (pre)
<pre> I dreamed I was a fly buzzing through the sky looking for some sweets or some spoiling meats </pre>
Pro:
- source code not blown up with much markup
- reuse of existing element pre
Con:
- no guarantee by markup that the structure is really respected in visual or aural presentation.
- HTML5 states that pre is intended for 'a block of preformatted text, in which structure is represented by typographic conventions rather than by elements'. A stanza and a verse line have the opposite requirement: they are specific structure elements for a poetry text type.
- no list like structure, none of the requirements is met.
Method 1.4 (dl+dd (or dt))
<dl> <dd>I dreamed I was a fly</dd> <dd>buzzing through the sky</dd> <dd>looking for some sweets</dd> <dd>or some spoiling meats</dd> </dl>
or
<dl> <dt>I dreamed I was a fly</dt> <dt>buzzing through the sky</dt> <dt>looking for some sweets</dt> <dt>or some spoiling meats</dt> </dl>
Pro:
- fits to the fact, that a stanza is similar to a list
- meets the structure of pseudo code above
- reuse of existing elements dl, dd (or dt for more traditional types of poems)
- the author may add additional markers using dt for educational purposes and scientific treatment of the content
- already in use to markup poetry by advanced authors. For example several samples for poetry and lyrics are marked up with dl+dd in Wikipedia: sample in the stanza article, samples in the rhyme article, some samples in the German article about 'Sonett'
- usage of dl+dt avoids already not required indentation and not required list symbols
Con:
- This use of a definition list somehow implies that the lines define something, but what? The stanza? This is probably not the intended use case for a definition list, and authors will have trouble recognizing that it is related to poetry content.
- requirement 3 for visual representation not met, minor problem with 4, typically all dd are indented
- In HTML5 this usage is currently excluded, because a minimum of one dt and one dd is now required (authors can of course leave one of them empty).
Method 1.5 (ul/ol+li)
<ul> <li>I dreamed I was a fly</li> <li>buzzing through the sky</li> <li>looking for some sweets</li> <li>or some spoiling meats</li> </ul>
or
<ol> <li>I dreamed I was a fly</li> <li>buzzing through the sky</li> <li>looking for some sweets</li> <li>or some spoiling meats</li> </ol>
Pro and Con similar to those outlined for Method 1.4 except for Con 3; additionally typically ul and ol have list item symbols or numbers and indentation as representation for li. ol numbering is pretty useful for educational purposes and a scientific treatment, but else not in common use. To indicate the begin of a line with a symbol is not in common use either. In HTML5 ul, ol, li are restricted to prose content.
Method 1.6 (div+div)
<div> <div>I dreamed I was a fly</div> <div>buzzing through the sky</div> <div>looking for some sweets</div> <div>or some spoiling meats</div> </div>
Pro:
- reuses the existing element div.
- div can "markup" almost any content, even very exotic poetry constructions can be represented somehow with enough divs and additional attributes like class or role to represent the semantical meaning of the construction (RDFa approach); but in the original HTML4 only class is available, only with values without a predefined meaning.
Con:
- div is defined to have no semantical meaning at all. These are just containers for anything
- attributes as class or role or kind have to be added to give some semantical meaning and detailed functionality to everything, needs predefined attribute values to meet the requirements 1, 2, and 3
- this usage breaks the idea, that div is intended mainly for styling and for unspecific requirements as a generic grouping element without a semantic meaning.
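A sketch of what the class-based variant might look like. The class names poem, stanza and line are assumptions chosen for illustration; HTML4 assigns no predefined meaning to any class value, which is exactly the weakness noted in the Con list.

```html
<!-- Class names are illustrative only; they carry no predefined semantics in HTML4 -->
<div class="poem">
  <div class="stanza">
    <div class="line">I dreamed I was a fly</div>
    <div class="line">buzzing through the sky</div>
    <div class="line">looking for some sweets</div>
    <div class="line">or some spoiling meats</div>
  </div>
</div>
```

A robot or screen reader could only make use of these values if they were agreed on somewhere outside the markup language itself.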
Method 1.7 (p+span+br)
<p> <span>I dreamed I was a fly</span><br /> <span>buzzing through the sky</span><br /> <span>looking for some sweets</span><br /> <span>or some spoiling meats</span><br /> </p>
Pro:
- reuses the existing elements p, br, span
- span can "markup" almost any inline content
Con:
- span is defined to have no semantical meaning at all. These are just containers for anything
- attributes as class or role or kind have to be added to give some semantical meaning and detailed functionality to everything, needs predefined attribute values to meet the requirements 1, 2, and 3
Method 1.8 (table+tr+td)
<table>
<tr><td>I dreamed I was a fly
</td></tr><tr><td>buzzing through the sky
</td></tr><tr><td>looking for some sweets
</td></tr><tr><td>or some spoiling meats
</td></tr>
</table>
Pro:
- reuses the existing table structure; can be expanded to a complete poem (complete poetry artwork) model including caption, thead, tfoot, tbody. Then even simple free form poems with specific requirements about empty space can be marked up, either using more td with width in one tr or using img
- backwards compatibility back to pre-CSS times - tables were very popular for styling and presentation in the last millennium, and still are with authors from that time.
Con:
- neither in HTML4 nor in HTML5 are table elements intended to mark up list-like content or even unspecific prose or poetry. However, from a logical point of view a list, being data with one dimension, is a degenerate case of 'data with more than one dimension', and it is not forbidden to use a table having only one td per tr
- does not meet requirement 3.
- source code blown up with a lot of elements that are meaningless for a list or poetry content
Resume About Existing Methods
The best approach in HTML4/XHTML1.x is either to use a definition list (currently excluded in HTML5, but in common use among authors with advanced semantic skills; a lot of the poetry in Wikipedia is marked up with definition lists, for example) or to use ul/ol. ol is already a complete solution if numbering is required, for example for interpretation, educational purposes or a scientific treatment of the content. Using list symbols to indicate lines with ul is not in common use, but the use of ul/ol ensures at least that a forced line break within a line can be identified by the reader immediately.
No indication of specific requirements for aural presentation is available through element naming or an attribute. A screen reader is not able to identify poetry content in order to ensure a high-quality rhythmic or metric presentation, apart from possible additional styling with CSS by the author.
Suggested Methods Not Available in HTML4/XHTML1.x
Method 2.1 (p+li)
<p> <li>I dreamed I was a fly</li> <li>buzzing through the sky</li> <li>looking for some sweets</li> <li>or some spoiling meats</li> </p>
Pro:
- Extends the model for p to a more structured element. This fits the idea that, at the dawn of (written) literature, a paragraph became a degenerate case of a stanza without specific line structure. This model puts the structure back into the paragraph. A p without a list structure can then be identified as some sort of degenerate type of text that the author was not able to structure.
Con:
- The reuse of p and li in an extended usage model creates a backward incompatibility with older browsers - display results are not predictable, because this is an invalid structure for old browsers.
- This structure requires specific rules for the behaviour of the li element depending on the parent element - is it a list item or a poetry line? This environment-dependent behaviour with more complex rules is more difficult to implement than simple rules for li and other simple rules for a line element. Often the use of the same tool for different things causes such problems.
- The structure requires more complex rules for the p element - for example if the p is the parent of li element, there should be nothing else in the p element, if there is something different inside, it is the degenerate prose case and it cannot contain li elements.
- Currently li can contain any block element including p itself, not really useful for a line of a stanza in general use.
Method 2.2 (p+br, not empty)
<p> <br>I dreamed I was a fly</br> <br>buzzing through the sky</br> <br>looking for some sweets</br> <br>or some spoiling meats</br> </p>
Pro:
- same as for Method 2.1
Con:
- Consult Cons for Method 2.1. Currently the br is an empty element
- Currently both p and br are only defined for prose, this use case has to be expanded.
- Surprisingly there is a backwards compatibility issue with Geckos (tested with 1.8) and Opera (tested with 9.23) using the XML parser - no display of any content. With a SGML parser, the content is present, sometimes there are two line breaks instead of only one.
Method 2.3 (ll+l, new elements)
using new elements for rhythmic, ordered, list-like content
<ll> <l>I dreamed I was a fly</l> <l>buzzing through the sky</l> <l>looking for some sweets</l> <l>or some spoiling meats</l> </ll>
Pro:
- clean separation from less structured prose or other structured content or content with another functionality or a possibly more complex structure as ul or ol lists.
- possible to have simple rules exactly fitting to the requirements for functionality.
- intended but not exclusively for poems or songs, this is generic for simple, rhythmic, ordered text, fitting already to most/many cases of common use cases.
Con:
- needs to introduce new elements.
- backwards incompatibility, old browsers will ignore the element and display everything as one (prose) inline content.
Discussion:
If authors need backwards compatibility, one may extend the model by allowing the use of br not just in prose but in lines too; the author may use this to get a useful appearance:
<div> <ll> <l>I dreamed I was a fly<br /></l> <l>buzzing through the sky<br /></l> <l>looking for some sweets<br /></l> <l>or some spoiling meats<br /></l> </ll> </div>
With some rules for the br inside l:
Approach 1:
a) if br (with optional whitespace) is directly followed by the end of the line, the br is collapsed with the end of the line
b) else the br is interpreted as a line break, required because there was not enough space to put everything in one line
Approach 2:
a) if br appears inside a line, this is only a suggestion from the author as to where to break the line if there is not sufficient space to put everything in one line. If there is more than one br in an l, the user agent is free to optimise by putting everything in as few lines as possible within the available space.
b) If everything fits in one line, the br inside l are ignored.
The advantage of Approach 1 is that authors are able to mark up experimental content. Maybe this will not be used often, because the achievable difference is not very big. The advantage of Approach 2 seems to be bigger: the author gets more control over the rendering in the user agent and can help to avoid line breaks between words that change the meaning of the content completely. Of course the author can do similar things using just "&nbsp;" instead of " " where required.
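To illustrate the difference between the two approaches, consider the tongue twister from the alliteration use case with a br placed between its two sentences. This is a sketch using the proposed ll and l elements; how a user agent would actually treat it depends on which of the two rule sets, if either, were adopted.

```html
<ll xml:lang="de">
  <!-- Approach 1: this br is not at the end of the line, so it always forces a break here. -->
  <!-- Approach 2: this br only marks the preferred break point; it is ignored if the line fits. -->
  <l>Fischers Fritze fischt frische Fische.<br /> Frische Fische fischt Fischers Fritze.</l>
</ll>
```

Under Approach 1 the line is always shown in two rows; under Approach 2 it is shown in two rows only when the available width is too small for one.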
Method 2.4 (p+l)
<p> <l>I dreamed I was a fly</l> <l>buzzing through the sky</l> <l>looking for some sweets</l> <l>or some spoiling meats</l> </p>
Pro:
- Extends the model for p to more structure. This fits the idea that, at the dawn of (written) literature, a paragraph became a degenerate case of a stanza without specific line structure. This model puts the structure back into the paragraph. p elements without l identify the content as some degenerate type of text without further structure given by the author.
Con:
- The structure requires more complex rules for the p element - for example if the p is the parent of l element, there should be nothing else in the p element, if there is something different inside, it is the degenerate prose case and it cannot contain l elements
- backwards incompatibility, old browsers will ignore the element l and display everything as one (prose) inline content - can be solved similar as discussed for method 2.3
- more difficult for robots to identify the p as a stanza/list of lines than with a specific element for this purpose. p becomes a hybrid of a list-like element and a paragraph without specific substructure
Method 2.5 (ll+li)
<ll> <li>I dreamed I was a fly</li> <li>buzzing through the sky</li> <li>looking for some sweets</li> <li>or some spoiling meats</li> </ll>
Pro:
- reuses the li for the list item aspect.
- the outer new ll element ensures the immediate requirement of specific poetry functionality for the lines represented by the li.
- If better backwards compatibility is required, authors can add an additional div without a semantical meaning around the ll to have a block element for older browsers (but this does not solve the problem of a li outside ul/ol for old browsers).
Con:
- needs to introduce a new element.
- backwards incompatibility; old browsers will ignore the ll element and display it as inline element. Not completely predictable, what happens with li inside an unknown element.
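The backwards-compatibility wrapper mentioned in the Pro list might look like the sketch below. The outer div carries no meaning; it is assumed here only to give older browsers a known block element around the unknown ll.

```html
<!-- The div is a meaningless wrapper, added only as a block-level fallback for older browsers -->
<div>
  <ll>
    <li>I dreamed I was a fly</li>
    <li>buzzing through the sky</li>
    <li>looking for some sweets</li>
    <li>or some spoiling meats</li>
  </ll>
</div>
```

As the Pro list itself points out, this still leaves li outside ul/ol, so the rendering of the individual lines in old browsers remains unpredictable.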
Method 2.6 (section+p)
<section> <p>I dreamed I was a fly</p> <p>buzzing through the sky</p> <p>looking for some sweets</p> <p>or some spoiling meats</p> </section>
Pro:
- A stanza is a section of a poetic artwork, so section fits and is applicable. Maybe the complete poem/song is then an article? But article is related to the prose domain by its name; a better name than 'article' could avoid this impression.
- It is easy to add additional information such as stage directions to a section, for example using aside to distinguish them from the p/line structure.
- It is simpler in this model to include the pre element for free-form poetry than in a line element intended for inline content.
Con:
- A line within a stanza does not really correspond to a complete paragraph; often the line does not even contain a complete sentence, which is commonly the microstructure of a paragraph.
- p does not meet the requirements for a line of a stanza.
- The structure is not specific to poetry: there is no technical difference from prose and, for example, no indication for aural presentation, so a poetry container is always required for such an identification.
Method 2.7 (section+section)
<section> <section>I dreamed I was a fly</section> <section>buzzing through the sky</section> <section>looking for some sweets</section> <section>or some spoiling meats</section> </section>
Pro:
- A stanza is a section of a poetic artwork, so section fits and is applicable. Maybe the complete poem/song is then an article? But article is related to the prose domain by its name; a better name than 'article' could avoid this impression.
- It is easy to add additional information such as stage directions to a section, for example using aside to distinguish them from the line structure.
- It is simpler in this model to include the pre element for free-form poetry than in a line element intended for inline content.
Con:
- A line within a stanza does not really correspond to a complete section; often the line does not even contain a complete sentence, which is commonly the microstructure of a paragraph in a section.
- section does not meet the requirements for a line of a stanza.
- The structure is not specific to poetry: there is no technical difference from prose and, for example, no indication for aural presentation, so a poetry container is always required for such an identification.
Method 2.8 (dl with attribute kind)
<dl kind="strophe"> <dt>I dreamed I was a fly</dt> <dt>buzzing through the sky</dt> <dt>looking for some sweets</dt> <dt>or some spoiling meats</dt> </dl>
or
<dl kind="strophe"> <dd><pre>I dreamed I was a fly</pre></dd> <dd><pre> buzzing through the sky</pre></dd> <dd><pre> looking for some sweets</pre></dd> <dd><pre> or some spoiling meats</pre></dd> </dl>
Pro:
- Reuse of existing elements, already in common use by advanced authors to mark up poetry.
- Refines and extends the functionality and presentation of a definition list to a common use case of such elements. HTML5 already tries to redefine definition lists as description lists or dialog. This idea can be taken further to provide even more functionality and semantic meaning for lists that are poorly supported in current HTML, avoiding 'list domain specific markup' as currently present in HTML4 and the HTML5 working draft.
- Good semantic and technical backwards compatibility for old browsers.
Con:
- Requires one new attribute for dl and some redefinition of the semantic meaning of a dl list.
Summary of all methods
Method 2.8 (dl with a new attribute kind) has the best backwards compatibility and the greatest flexibility. It has the potential to extend the functionality of definition lists to several other applications and to list content already in use on the internet that is not well specified in HTML4 today.
Methods 2.3 and 2.5 allow the use of specific elements for a specific list-like functionality, as other elements like ul, ol, dl, dialog and menu do. The disadvantage is minor backwards incompatibilities that authors have to take care of. If backwards compatibility is required, it can be ensured using additional, already existing elements like br (currently only available for prose), div and span.
It is a matter of taste whether to provide another element for each specific use case from the 'list domain', or to combine all these very similar list-like functionalities in one element with an additional attribute. The latter cleans up HTML a little: a single new attribute with different values achieves the same effect with fewer elements; see the next section.
Currently, because the content model of most elements explicitly excludes their use for poetry, the best approach is in any case to introduce completely new elements, even if this creates some problems for older user agents. For some container elements, however, it should be possible to use them for both prose and poetry; otherwise it might become very difficult for authors to combine poetry and prose in one document.
How to Extend the Functionality and the Semantics of a Definition List for Different Use Cases Including Poetry
In HTML4 dl was defined as a definition list; in the current HTML5 draft it is redefined as a 'description list'.
In this approach it is redefined again as 'diverse lists' with advanced functionality and semantic meaning. This is accomplished with an additional attribute kind. List elements such as dl, ul, ol, menu, dir and dialog have a very similar structure and functionality. All of this can be covered by one element with an attribute kind that defines the specific use case, both for the already existing use cases and for some more that are currently not available in (X)HTML. This method mainly legitimises common use cases that can already be found for dl on the existing internet, such as markup for poetry, law text, conversation, dialog and menu, without the need for new elements and without the danger of backwards incompatibilities in older viewers.
This approach leaves the responsibility for a useful utilisation of dt and dd to the authors, following the old ideas of Kant and others: Enlightenment, "Dare to know". However, the new attribute kind suggests an interpretation of degenerate use cases to avoid confusion for the reader.
Technical Semantics
dl - diverse list(s), (manifold, miscellaneous lists)
Block-level element, and structured inline-level element.
Contexts in which this element may be used: Where block-level elements are expected. Where structured inline-level elements are allowed.
Content model: Zero or more elements dt or dd
Element-specific attributes: start (see the element ol, replaced with this one) and kind (details see below)
dt - diverse list topic
Contexts in which this element may be used: inside dl
Content model: Strictly inline-level content
Element-specific attributes: value (see the element li, replaced with this one)
dd - diverse list data
Contexts in which this element may be used: inside dl
Content model: Zero or more block-level elements, or inline-level content (but not both).
Element-specific attributes: value (see the element li, replaced with this one)
value is only used for the kinds 'ordered', 'bol', 'strophe' and 'dialog'; otherwise it is ignored. The same applies to the start attribute of dl.
Correlation to other elements
The combination dl/dt/dd with the attribute kind replaces ul, ol, li, menu, dir, dialog.
Functionality and use cases, values of kind
kind has predefined values, indicating the functionality and the semantics in detail.
Possible kind values and typical usage:
ordered - like the old ol; the dd is then interpreted as the old li. dt offers an additional possibility to note labels, presented without numbering and indentation.
unordered - like the old ul; the dd is then interpreted as the old li. dt offers an additional possibility to note labels, presented without list symbol and indentation.
def - like dl in HTML4: dt is interpreted as a definition term, dd as the description of the previous dt. If there is no previous dt at all in the dl, the dd describes the dl itself. If there is a dt but no dd at all in the dl, the dt describes the purpose of the dl itself. Other use cases are combinations of question (dt) and answer (dd), for example in a FAQ or a school book lesson, or of tasks (dt) and activities (dd), or of task topic (dt) and subtasks (dd). If kind is not specified, def is assumed for historical reasons (and backwards compatibility).
strophe - the dl has the semantic meaning of a stanza, strophe or verse paragraph. It is meant for poetry, for content with specific rhythmic or metric behaviour, or for any artwork the author wants to call poetry or lyric, for example poems and songs. dt and dd represent the (verse) lines of the dl; they can be mixed and combined as required by the author. Using only dt is mainly related to conventional poems and songs and the common use of a stanza. dd might be useful to include artwork with specific requirements, for example using the pre element to preserve whitespace within a line.
For song texts, the dd may contain additional information about the melody or, in a compound document, data from another XML format to represent the music/melody in written form.
Aural/oral presentation requires advanced rhythmic and metric capabilities.
Visual presentation: dt and dd have no specific symbol for numbering or indentation. Only if the content is broken into two or more lines are the second and following lines indented, to indicate that everything belongs to one (verse) line. Only if a dl has a start attribute (with any value) are the dt and dd additionally indented and the dt numbered as for 'ordered'. Authors may use the value attribute on dt to override the automatic numbering, and may note value="" to suppress numbering for specific lines; typical applications number only every fifth or tenth list item, not all. If dl has no start attribute, the value attribute is ignored. This fits the common use of numbering to reference poetry lines for educational purposes, interpretation and scientific treatment.
compact - some list-like texts, or texts with numbered text fragments, do not require a separation of list items by new lines. This happens, for example, in some religious texts, which have a similar requirement to reference specific text fragments labelled with numbers or other markers, but where text fragments and markers form a list degenerated to a paragraph (the dl behaves as a p element, dt and dd as inline; the text is expected to be inside dd, an optional marker or number in dt). Automatic numbering is available if the start attribute is provided and can be modified with the value attribute of the dd element. Alternatively, authors may use the dt element to provide an inline marker. This may also occur in compressed presentations of poetry: when such texts are cited, authors sometimes use only a marker like '/' to preserve the original line structure. This kind is related to the use of the compact attribute for lists in HTML4.
HTML2 sample: Bible text using a compact(!) dl list
conversation - a prose dialog or interview; the use case is similar to that described for the dialog element in the current HTML 5 working draft, which is replaced here with this type of dl, avoiding domain-specific elements. Additionally, if the first dd has no preceding dt, the dd is marked in a different way, for example with a list symbol or with a font-style like italic, to indicate that it is an annotation related to the conversation at the current point. Authors are encouraged to additionally use the aside element inside such a single dd to indicate such an annotation. If two dd follow each other directly, this only indicates a new line, something like a paragraph: a closed fragment of content separated from the previous line/content of the same speaker. Two or more dd at the beginning of the dl without a dt are interpreted as separate annotations. A single dt element not followed by a dd is interpreted to mean that the person is speechless at this moment.
dialog - the poetry equivalent of 'conversation' with the same usage, for example for a theatre or opera play, but oral/aural presentation requires advanced rhythmic and metric capabilities. If two dd follow each other, these are simply two verse lines within the dialog. For two or more dd at the beginning of a dl, see 'conversation'. If the first dd has no preceding dt, the dd is marked in a different way, for example with a list symbol or with a font-style like italic, to indicate that it is a stage direction related to the dialog at the current point. Authors are encouraged to additionally use the aside element inside such a single dd to indicate such a stage direction. The attributes start and value are used as described for 'strophe'.
marker - some lists require a specific hard-coded numbering or choice of symbols, for example in a law text. The dl represents the equivalent of a paragraph or article of a law; the dt/dd define the substructure. dt contains the 'numbering' or symbol required for the text, dd contains the law text itself. The content of all dt elements is used to determine the indentation of the dd: the largest dt content defines the indentation for all dd. The first line of the dd begins beside the dt, just as in an old ol list the li content is beside the related number. If dt is missing, it is assumed that no numbering is required; if dd is missing, the content of the dd for the related preceding dt is assumed to be empty. Other use cases are a bill, receipt, invoice, recipe, shopping list and related things. Here dt normally contains only a number with an optional unit to define the quantity of the entity noted in dd. For a bill, receipt or invoice the dt contains the price with a currency as unit. For a recipe it contains the quantity of the entity used for the recipe, for a shopping list the quantity of the products to be acquired. A missing dt is interpreted in the obvious way: for a shopping list and a recipe as a quantity of 1, and for a bill, receipt or invoice as 'free of charge', both without a specific notation.
Sample for law text using dl: law texts, German government
Discussion: An alternative approach could be to allow any CDATA as the value of the value attribute. Then dt could remain more of a topic or label element, and the marker/numbering would be left to the value attribute.
To outline the difference from a simple table, which is also available for this purpose (as for any list application):
Using a list emphasises the strong correlation between the dt and the dd and only a loose relation to other list items. In a recipe, for example, the units in the dt can be quite different for each list item, but they are always strongly correlated to the entity mentioned in the dd. If such a construction has a dt but no related dd, the interpretation is the same as for an empty dd: empty, nothing, to be added. For more complex use cases authors are encouraged to use the table elements to mark up multidimensional correlated data.
menu - for menus similar to the old menu element or to the one described in HTML5, both to be replaced with this dl. dd contains the menu items and optional submenus, dt can contain a label. dd is not marked with a list symbol or a number; dt could have a default styling with a font-weight of bolder to indicate that it is a label.
link - this appears mainly in the head element of a document and is then not a direct part of the displayed body. It is used to create toolbars or panels in advanced viewers, or is added as clearly separated additional content after the body in simpler browsers (the logic being that the reader normally wants to read the content first before using a menu to switch to other content), or on demand for navigation in aural/oral presentation. It can be used for the complete navigation within a larger project, or to group together bookmarks/hotlists for toolbar display and direct export to the browser on demand by the user. dt is used for the label of a toolbar, a pull-down menu or a submenu. dd is used to create a menu item, normally with one link element per dd, using the title attribute to describe the purpose of the list item. A dd may contain more than one link if the links have different values for hreflang or type, or a rel containing alternate, all indicating alternative versions. Such a 'multi-link' dd creates a specific submenu for alternative access. Authors are encouraged to use this only to list alternative approaches to the same or similar content for accessibility reasons. Typical examples of a 'multi-link' dd are versions in different languages, multimedia targets in different formats, and alternate stylesheets (not noted using XML stylesheet processing instructions).
Content other than link is not allowed in dd, but is gracefully ignored when such a navigation toolbar is created.
For backwards compatibility authors may also add the link kind within the body. Such a construction is expected to contain only an optional label in a dt and one dd with a reference (an a element) to a conventional index page of the project, which represents the content of the menu navigation in the head that is otherwise not accessible. Viewers able to generate toolbars from a 'link' kind in the head will not display the content of a 'link' kind in the body. In other viewers this list is presented with a noticeable border or outline as a warning that it is just a replacement for the intended navigation.
Authors are discouraged from using other kinds of dl in the head; no other kind is to be displayed as a toolbar.
Discussion: Maybe it is useful to put the dt label in a link element too, just to avoid confusion in the head element. Because link may or may not contain href, it can be used both as a reference (with href) and as a label only (without).
none - no requirement for a specific structure. This is mainly for content with other requirements or a semantic meaning not covered by the predefined values of kind. Having this value avoids the abuse of the defined cases. dt and dd are presented as block elements without further specific structure; authors have to provide any such structure themselves, for example using CSS.
dir - behaviour as intended for the old list element dir. dt are labels for directory lists in dd. The dt/dd groups are presented next to each other.
bol - backwards compatibility mode for the old ol. The dd are used as described for the 'ordered' case; the dt is not displayed at all. The effect is that authors can add numbering for old browsers manually, if this is required.
bul - backwards compatibility mode for the old ul. The dd are used as described for the 'unordered' case; the dt is not displayed at all. The effect is that authors can manually add symbols (*, #, etc.) or img elements for display in old browsers, if this is required.
Discussion: Maybe the list is not yet exhaustive and more values are needed. If authors have other use cases that are not yet defined, they may use 'none', possibly together with additional attributes like 'role'. If further important, already existing use cases are known, it would of course be useful to specify them here.
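To make some of the kind values more concrete, the following sketch marks up a numbered stanza ('strophe' with start and value), a small recipe ('marker') and a short interview ('conversation'). The kind attribute and the use of start and value on dl and dt follow this proposal and are not part of any HTML specification. The recipe quantities, the speaker labels and the interview text are invented for illustration.

```html
<!-- Sketch only: kind, start and value are used as proposed here, not as in existing HTML. -->

<!-- 'strophe' with line numbering: the start attribute switches numbering on;
     value="" suppresses the number for a specific line, so only the first line is numbered. -->
<dl kind="strophe" start="1">
  <dt>I dreamed I was a fly</dt>
  <dt value="">buzzing through the sky</dt>
  <dt value="">looking for some sweets</dt>
  <dt value="">or some spoiling meats</dt>
</dl>

<!-- 'marker' used for a recipe: dt holds the quantity with an optional unit,
     dd the entity it belongs to. The last dd has no dt, read as a quantity of 1. -->
<dl kind="marker">
  <dt>200 g</dt> <dd>flour</dd>
  <dt>2</dt> <dd>eggs</dd>
  <dd>pinch of salt</dd>
</dl>

<!-- 'conversation': dt names the speaker, dd holds what is said.
     The leading dd without a dt is an annotation; the final dt without a dd
     means that this speaker is speechless at this point. -->
<dl kind="conversation">
  <dd><aside>Recorded after the reading.</aside></dd>
  <dt>Interviewer</dt>
  <dd>Why do you write about flies?</dd>
  <dt>Poet</dt>
</dl>
```

Because kind defaults to 'def', an old browser would present all three lists as ordinary definition lists, which is the backwards compatibility argument made for this method.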
How to Markup Larger Structures of Poetry Containing Mainly Stanzas as Fine Structure?
Epic literature -- for example, Homer's Odyssey or Goethe's Faust -- provides examples of more complex poetic text structures containing mainly stanzas and no prose. In most cases the macro structure from prose can be reused anyway, for example section, header and footer. If the complete document is poetry, the author will often add a descriptive sub-title like "tragedy" or "epic" and may add meta data such as RDF or Dublin Core (DC) to define this for search engines.
The main problem occurs if a piece of poetry appears in a prose context or vice versa. This is somehow an "alien" scenario -- even if prose can be interpreted as a degenerate, less structured derivation from poetry at the dawn of literature, these are quite different types of text now. A poetry container joins together, for example, several stanzas into one poem, one artwork, and separates the poem from other content around it in the document, like prose interpretation or discussion of the poem, navigation, advertisement or other artworks. Typically artworks appear on a pedestal; such a container is a kind of pedestal that separates the artwork from the surrounding content.
Another problem occurs with blank verse, free verse or poetry containing even less obviously rhythmic content. An author may leave this to paragraphs, but when insisting that it is poetry, the author might want to use a poetry container to avoid misunderstandings in interpretation (even if the common user does not look at the source code, this can be quite effective for interpretation or a general scientific approach). The poetry container still covers poetry types not covered by the simple stanza/line fine structure elements and can already preserve some of the author's intentions.
HTML5 already introduces some elements specific to the text domain of prose but none for poetry. Therefore, to make poetry easier to identify and to group stanzas together, for example into a complete poem or song, a container element would be helpful. Typically this construction will contain a heading and maybe some information about the author too (in HTML5 the header and footer constructs can be reused for this).
BLOCKQUOTE is not usable if the content is not a quote or not intended as a quote. (The use of BLOCKQUOTE for formatting/presentational purposes was formally deprecated in HTML 4.01.)
For prose, HTML5 already provides an article element to separate independent prose text from surrounding prose or surrounding poetry.
The opposite direction is harder to accomplish.
Elements like object or div are possible, but neither gives any indication of what the content is.
A solution could be to use the more generic "prose" element instead of "article", and the "poetry" element for poetry, as containers. Another approach would be to use a generic text element for all of them, with an attribute like "kind" with the possible values "prose" and "poetry". Both types can have many subtypes; giving more detail is useful, but it is hard to offer all subtypes as predefined values. For prose, for example: article, report, letter, short-story, fiction, novel.
For poetry some examples are: poem, lyric, lyrics, song, epic, tragedy, drama. Some of the subtypes can belong either to poetry or to prose, depending on the content.
In former times tragedy and drama were more closely related to poetry; today they often move towards prose. Maybe there is no need to be too specific about which type a subtype belongs to. This is perhaps the point where one can benefit from RDF scheme(s) covering definitions of lists of subtypes of prose and poetry.
What is the functionality of such a container, whether text or (article, prose) and poetry? It groups together substructures that are closely related to each other and separates this group, as an independent piece of text, from the surrounding text. Reasons for this separation are: a completely different content structure, another author, or only a weak relation to the surrounding content. Typical HTML documents today have become quite complex, and it is getting quite difficult for human readers or robots to tell what is only jammed together and what really belongs together.
The possible content of such a container is the same as described for "section" or the current "article". There is no specific exclusion for poetry, because artists tend to get creative about the question of what poetry can be if there is some unnecessary restriction ;o)
Such separated text containers often have their own heading, which is not related to the heading cascade outside of the container. Either a specific unnumbered new heading element can be used in such a container to indicate that it is independent from the text fragment outside, or the usual cascade h1-h6 is used. In almost every case the top heading in such a container then has to be an h1 heading, representing the heading of the complete container. This also applies if the container itself is only a child of a section or area belonging to a heading of another rank, because the rank outside is not related to the rank of the headings inside the text container.
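The heading rule can be illustrated with a small sketch. The text element with its kind attribute and dl kind="strophe" are the constructs proposed in this document, not existing HTML; the outer heading and the prose sentence are invented for this example.

```html
<!-- Sketch only: text and kind are proposed markup, not part of any HTML specification. -->
<section>
  <h2>Poems about insects</h2> <!-- outer heading cascade, rank 2 (invented title) -->
  <p>The following poem was written in Hannover in 2007.</p>
  <text kind="poetry">
    <!-- The container restarts the cascade: its top heading is an h1,
         independent of the h2 of the surrounding section. -->
    <h1>Dream to fly</h1>
    <dl kind="strophe">
      <dt>I dreamed I was a fly</dt>
      <dt>buzzing through the sky</dt>
    </dl>
  </text>
</section>
```

With the alternative mentioned above, a generic unnumbered heading element would take the place of the inner h1.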
Approach for Text Container Default Presentation/Styling
For aural presentation a useful default presentation has a bigger benefit than for visual presentation. The reader could switch to a speaker with more advanced abilities for rhythmic and metric text types; otherwise the result will be painful for the audience. As a minimum requirement, a less advanced reader may issue a warning about the problem. For visual presentation the functionality of a text container can be emphasised (joining text fragments into one piece of art/literature and separating this part from other fragments around it). This can be realised, for example, with a margin and padding of about one or two em for the text element and a thin border or outline around it or at its beginning and end.
Currently there is no really common behaviour for separating different content from each other, but it is easy for authors to change the behaviour again with CSS if they need no separation (why do they use containers then?) or a more advanced separation. For text-only viewers the separation may be less detailed or simpler than for viewers with advanced rendering and styling capabilities.
CSS sample for a suggested default visual presentation/styling within an advanced viewer:
text {display:block;margin:1ex;padding:1ex; border: none}
text[kind] {border-style: double; border-width: medium} /* to indicate unspecified/unknown values */
text[kind='prose'] {border-style: dotted; border-width: thin}
text[kind='poetry'] {border-style: dashed; border-width: thin}
text[kind='poetry'] text[kind='poetry'] {border-style: none; border-width: thin}
text[kind='prose'] text[kind='prose'] {border-style: none; border-width: thin}
text[kind='poetry'] text[kind='prose'] {border-style: outset; border-width: thin}
text[kind='prose'] text[kind='poetry'] {border-style: inset; border-width: thin}
If two elements such as prose and poetry are used instead of something like a kind attribute, the selectors become simpler, of course.
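As a rough sketch of that simplification, the same default styling could look as follows if the hypothetical elements prose and poetry replaced text with a kind attribute. Both element names are assumptions taken from the discussion above, not existing HTML, and the rule for unspecified kind values drops out because there is no attribute left to carry an unknown value.

```html
<style type="text/css">
/* Sketch only: prose and poetry are hypothetical container elements. */
prose, poetry {display:block; margin:1ex; padding:1ex; border:none}
prose {border-style: dotted; border-width: thin}
poetry {border-style: dashed; border-width: thin}
/* A container nested in one of the same type gets no extra border. */
poetry poetry, prose prose {border-style: none; border-width: thin}
/* A container nested in one of the other type is set off visibly. */
poetry prose {border-style: outset; border-width: thin}
prose poetry {border-style: inset; border-width: thin}
</style>
```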
Complete Samples (Pseudo Code)
Example 1:
<text kind="poetry" role="poem:freeForm textTune:fun">
<style type="text/css"><![CDATA[ @import url("poem.css"); @import url("poemAural.css") aural; ]]></style>
<header>
<h1>Dream to fly</h1>
<aside role="text:dedication">to my buzzing spring love</aside>
</header>
<dl kind="strophe">
<dt>I dreamed I was a fly</dt>
<dt>buzzing through the sky</dt>
<dt>looking for some sweets</dt>
<dt>or some spoiling meats</dt>
</dl>
<dl kind="strophe">
<dt>I waked up in a cold sweat</dt>
<dt>last reminiscence was a swat</dt>
</dl>
<footer>
<address>Olaf, 2007-02-08, Hannover</address>
</footer>
</text>
Example 2. Using prose in poetry container as poetry:
<text kind="poetry" role="poem:freeForm poem:experimental textTune:fun">
<style type="text/css">
<![CDATA[
@import url("poem.css");
@import url("proseInPoem.css");
@import url("poemAural.css"), au