ITS WG Collaborative editing page


Status: Initial draft, i.e. please focus on technical content rather than wordsmithing at this stage.

Author: Yves Savourel

Handling of White Spaces

Summary

It must be possible to specify, for the content of a given element, how white space is to be handled (i.e. whether it must be preserved or may be collapsed).

Challenges

[[YS-- Here is a new attempt at describing the issues. ]]

Knowing whether the white space in a given element (especially line breaks) is collapsible is important for proper segmentation and matching when using computer-assisted translation (CAT) tools.

There are three main types of wrapped text:

1. Text wrapped for no particular reason:

<para>This is the first 
sentence of the paragraph. It's followed
by a second sentence.</para>

2. Text where line breaks can be segment breaks:

<data name="CMD_USAGE"> 
 <value>Usage: po2xliff input[ options[ output]]
Where options are:
   -trg : create target entries
   -fill: fill the target entries with the source text</value>
</data>

3. Text intentionally pre-formatted for display constraints without regard for the linguistic aspects:

<print width="75">Copyright (C) 2005 Okapi Framework Developers 

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this library; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA</print>
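To make the segmentation issue concrete, here is a minimal Python sketch. It assumes a deliberately naive, hypothetical segmenter: `collapse` and `segment` are illustrative names, not part of any CAT tool, and the sentence rule (split after `.`, `!` or `?` followed by a space) is a simplification. It contrasts treating white space as collapsible with treating line breaks as meaningful, using the first two examples above.

```python
import re


def collapse(text: str) -> str:
    """Collapse every run of white space (line breaks included) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str, mode: str) -> list[str]:
    """Split text into segments (hypothetical, naive segmenter).

    mode="collapse": white space is collapsible, so line breaks carry no
                     meaning; collapse first, then split after . ! or ?
    mode="lines":    line breaks are meaningful; each non-empty line becomes
                     its own segment, with leading spaces kept as-is.
    """
    if mode == "collapse":
        return re.split(r"(?<=[.!?]) ", collapse(text))
    if mode == "lines":
        return [line.rstrip() for line in text.split("\n") if line.strip()]
    raise ValueError(f"unknown mode: {mode}")


# Example 1: text wrapped for no particular reason.
para = "This is the first \nsentence of the paragraph. It's followed\nby a second sentence."

# Example 2: text whose line breaks are natural segment breaks.
usage = ("Usage: po2xliff input[ options[ output]]\n"
         "Where options are:\n"
         "   -trg : create target entries\n"
         "   -fill: fill the target entries with the source text")

print(segment(para, "collapse"))  # two complete sentences
print(segment(para, "lines"))     # three fragments that cut sentences in half
print(segment(usage, "lines"))    # one segment per usage line
```

The same text gives very different segments depending on the mode, which is why the tool needs to know, per element, whether the line breaks are collapsible.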

Notes

There are cases where white space handling can be overridden at the style sheet level alone, bypassing any information within the XML document itself: for example, CSS defines the 'white-space' property (see http://www.w3.org/TR/REC-CSS2/text.html#white-space-prop).
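As a rough illustration of such an override, the Python sketch below approximates only the collapsing behaviour of the three CSS2 'white-space' values ('normal', 'pre', 'nowrap'). The function name is hypothetical, and automatic line wrapping (allowed by 'normal', suppressed by 'nowrap') is not modelled.

```python
import re


def apply_css2_white_space(text: str, value: str) -> str:
    """Approximate how CSS2 'white-space' treats source white space.

    Only collapsing is modelled; line wrapping is ignored.
    """
    if value == "pre":
        # 'pre': sequences of white space are not collapsed and lines
        # break only at the newlines present in the source.
        return text
    if value in ("normal", "nowrap"):
        # 'normal' and 'nowrap' both collapse sequences of white space.
        return re.sub(r"[ \t\r\n]+", " ", text).strip()
    raise ValueError(f"not a CSS2 white-space value: {value}")


source = "Where options are:\n   -trg : create target entries"
print(apply_css2_white_space(source, "pre"))     # layout kept
print(apply_css2_white_space(source, "normal"))  # single line, spaces collapsed
```

The element content is identical in both calls; only the style sheet value decides whether the line break and indentation survive.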

[[CL I am not sure whether we need to say something about the white-space-related changes (NEL) in XML 1.1. ]]
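For reference, the difference lies in end-of-line handling (section 2.11 of both XML 1.0 and XML 1.1). The Python sketch below restates those rules; an XML processor performs this normalization itself before the application sees the text, and `normalize_eol` with its `xml_version` parameter is a hypothetical helper used only for illustration.

```python
import re

# XML 1.0: CR LF and any lone CR are translated to a single LF.
_EOL_10 = re.compile("\r\n|\r")

# XML 1.1: additionally CR NEL, NEL (U+0085) and LINE SEPARATOR (U+2028)
# become LF. Two-character sequences come first in the alternation so
# that each is consumed as one line break.
_EOL_11 = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def normalize_eol(text: str, xml_version: str = "1.0") -> str:
    """Apply the end-of-line translation of the given XML version."""
    pattern = _EOL_11 if xml_version == "1.1" else _EOL_10
    return pattern.sub("\n", text)


raw = "a\r\nb\u0085c\u2028d"
print(repr(normalize_eol(raw, "1.0")))
print(repr(normalize_eol(raw, "1.1")))
```

Under the 1.0 rules NEL and U+2028 survive as ordinary characters in the content; under the 1.1 rules they become line breaks, which a tool would then have to treat according to the element's white space handling.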

[[CL Should we possibly go for a general requirement (stated in the guidelines) along the lines of "canonicalize your XML" (see http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Example-WhitespaceInContent).]]
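One point to weigh here: canonical XML keeps white space in character content, as the linked example shows, so canonicalization alone does not tell a tool whether that white space is collapsible. The sketch below checks this with Python's `xml.etree.ElementTree.canonicalize` (Python 3.8+, which implements C14N 2.0 rather than the C14N 1.0 recommendation linked above); the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET  # canonicalize() requires Python 3.8+

doc = "<para>This is the first \nsentence of the paragraph.</para>"

# Canonicalization leaves the line break and the trailing space untouched.
c14n = ET.canonicalize(doc)
print(c14n == doc)

# Python's C14N 2.0 writer can strip leading/trailing white space in text,
# but only when asked to: the document itself says nothing either way.
print(ET.canonicalize("<a>  x  </a>", strip_text=True))
```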

Quick Guidelines

The xml:space="preserve" attribute may provide a solution for some of these requirements at the document instance level.
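Because xml:space applies to the element carrying it and to all its descendants unless a nested element overrides it, a tool has to look up the ancestor chain to find the value in force. The Python sketch below shows that lookup with `xml.etree.ElementTree`; `effective_xml_space` and the sample markup are hypothetical. When no xml:space is found it returns "default", leaving the interpretation to the application.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_xml_space(root: ET.Element, target: ET.Element) -> str:
    """Return the xml:space value in force for target.

    Walks from target up to root; the nearest xml:space wins.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get(XML_SPACE)
        if value is not None:
            return value
        node = parents.get(node)
    # Nothing specified: the application's default behaviour applies.
    return "default"


root = ET.fromstring(
    "<resources>"
    '<data name="CMD_USAGE" xml:space="preserve">'
    "<value>Usage: po2xliff input[ options[ output]]</value>"
    "</data>"
    "<para>This is the first sentence.</para>"
    "</resources>"
)
print(effective_xml_space(root, root.find("data/value")))  # inherited from <data>
print(effective_xml_space(root, root.find("para")))        # nothing specified
```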

[[YS-- Not sure whether this is worth noting, but xml:space defines only "preserve" and "default", and "default" does not necessarily mean "do-not-preserve". Do we have situations where "do-not-preserve" would be needed? ]]

The whiteSpace facet defined in XML Schema Part 2: Datatypes Second Edition may provide a solution for these requirements at the schema level.
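In a schema the facet appears inside a restriction, e.g. `<xs:whiteSpace value="collapse"/>`, and takes one of three values: preserve, replace or collapse. The Python sketch below restates the normalization each value prescribes; the function name is hypothetical, and a schema processor applies these rules itself.

```python
import re


def apply_whitespace_facet(value: str, facet: str) -> str:
    """Normalize a string as the XML Schema whiteSpace facet specifies."""
    if facet == "preserve":
        # preserve: no normalization at all.
        return value
    # replace: every #x9 (tab), #xA (LF) and #xD (CR) becomes #x20 (space).
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # collapse: after replace, runs of #x20 become a single #x20 and
        # leading/trailing #x20 are removed.
        return re.sub(" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {facet}")


sample = "  Usage:\tpo2xliff\n   -trg  "
for facet in ("preserve", "replace", "collapse"):
    print(facet, repr(apply_whitespace_facet(sample, facet)))
```

Note that only "preserve" keeps line breaks; "replace" keeps the number of characters but turns line breaks into spaces, so it would not suit text such as the second example above.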

its0505ReqWhiteSpaces (last edited 2005-09-24 10:18:19 by GoutamSaha)