Languages as RDF Resources

From W3C Wiki

Objectives

Make the languages that are implicitly defined by the tags and subtags of BCP 47 (RFC 4646 & RFC 4647) [1] and ISO 639 [2] available as proper RDF resources.

Use cases

  • Create a global common set of URIs representing languages for the Semantic Web.
  • Enable semantic declarations and queries based on the values of subtags.
  • Track changes of and relations between languages for semantic reasoning.
  • Enrich languages with additional information.
  • Map other language coding systems to BCP 47.
  • Formally define the Dublin Core language range, currently under discussion [3].

Background

For background discussions, see the threads [4] and [5] on the public-esw-thes list. Language codes are defined in BCP 47 [1], which is based on ISO 639 [2]. Codes are managed by the ISO 639-2 Registration Authority [6], the ISO 639-3 Registration Authority [7] and in the IANA Language Subtag Registry [8]. Some other systems of language tags exist or have existed (see [9]); they are not directly covered by this proposal. See [10] for more information on how to use BCP 47.

Issues identified so far

Management

Which authority should define, maintain and host the URIs? Since IANA/BCP 47 specifies the tag grammar and the subtag registry, it seems the best candidate. URIs for base language tags could also be specified by the ISO 639 Registration Authorities.

Technical

BCP 47 defines a grammar for constructing langtags from subtags. The list of subtag types and subtag values is authoritative, but the set of possible combinations that form langtags is open, and not all combinations make sense. It therefore seems difficult to define and maintain URIs for langtags, whereas URIs for subtags are easy to define. On the other hand, URIs for combinations could be generated dynamically, so that every possible combination has one.
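
Generating URIs dynamically might look like the following minimal sketch. The base URI http://example.org/lang/, the URI layout and the function names are assumptions made for illustration. The parser covers only a simplified subset of the BCP 47 grammar (language, optional script, optional region, variants) and ignores extended language subtags, extensions, private use and grandfathered tags.

```python
import re

# Hypothetical base URI: no authority has assigned one.
BASE = "http://example.org/lang/"

# Simplified subset of the BCP 47 langtag grammar:
#   language ["-" script] ["-" region] *("-" variant)
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def parse_langtag(tag):
    """Split a langtag into typed subtags, using conventional casing."""
    match = LANGTAG.match(tag)
    if match is None:
        raise ValueError(f"not a langtag in the supported subset: {tag!r}")
    parts = {"language": match.group("language").lower()}
    if match.group("script"):
        parts["script"] = match.group("script").title()
    if match.group("region"):
        parts["region"] = match.group("region").upper()
    variants = [v.lower() for v in match.group("variants").split("-") if v]
    if variants:
        parts["variant"] = variants
    return parts


def langtag_uri(tag):
    """Build the URI of a whole langtag on demand."""
    parts = parse_langtag(tag)
    ordered = [parts["language"]]
    ordered += [parts[k] for k in ("script", "region") if k in parts]
    ordered += parts.get("variant", [])
    return BASE + "tag/" + "-".join(ordered)


def subtag_uris(tag):
    """List the URIs of the subtags a langtag is built from."""
    uris = []
    for kind, value in parse_langtag(tag).items():
        values = value if isinstance(value, list) else [value]
        uris += [f"{BASE}subtag/{kind}/{v}" for v in values]
    return uris


print(langtag_uri("ZH-hant-tw"))
print(subtag_uris("sl-rozaj"))
```

Because the langtag URI is derived from the normalized subtags, differently cased spellings of the same tag resolve to a single URI. The sketch does not check subtag values against the registry, so it will also produce URIs for combinations that make no sense.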

Proposed approaches

Detailed ontology with subclasses

  • Define subtags as SKOS concepts
  • Define each type of subtag (language, script, region, variant, ...) as a specific concept subclass.
  • Define a class "Language" of which instances are not specified.
  • Use anonymous instances of Language to gather subtag properties.

The proposal [11] shows how this approach can be used to define Dublin Core metadata.
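
As a rough illustration of this approach (not taken from [11]), the sketch below prints Turtle for a small schema and for a document whose dc:language is an anonymous instance of Language. The lang: namespace, the class names such as lang:ScriptSubtag, the property names such as lang:hasScript and the subtag URIs such as lang:script-Hant are all hypothetical.

```python
# All terms in the lang: namespace are hypothetical names for illustration.
PREFIXES = """\
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix lang: <http://example.org/lang#> .
"""

SUBTAG_KINDS = ("language", "script", "region", "variant")


def schema_turtle():
    """One concept subclass and one property per subtag type."""
    lines = ["lang:Language a rdfs:Class ."]
    for kind in SUBTAG_KINDS:
        cls = f"lang:{kind.capitalize()}Subtag"
        lines.append(f"{cls} rdfs:subClassOf skos:Concept .")
        lines.append(
            f"lang:has{kind.capitalize()} a rdf:Property ; "
            f"rdfs:domain lang:Language ; rdfs:range {cls} ."
        )
    return "\n".join(lines)


def describe_document(doc_uri, **subtags):
    """State a document's language as a blank node gathering subtags."""
    unknown = set(subtags) - set(SUBTAG_KINDS)
    if unknown:
        raise ValueError(f"unknown subtag type(s): {sorted(unknown)}")
    lines = ["    a lang:Language"]
    for kind, value in subtags.items():
        lines.append(f"    lang:has{kind.capitalize()} lang:{kind}-{value}")
    return f"<{doc_uri}> dc:language [\n" + " ;\n".join(lines) + "\n] ."


print(PREFIXES)
print(schema_turtle())
print(describe_document("http://example.org/doc", language="zh", script="Hant"))
```

Since the Language instance is anonymous, no URI has to be minted for the combination itself. A query can still select documents by any single subtag, for example all documents whose language has the script Hant.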

Plain SKOS without subclasses

  • Define subtags as SKOS concepts (each type of subtag (language, script, region, variant, ...) is a simple SKOS concept)
  • Define languages as SKOS concepts, connected with SKOS broader/narrower to the subtags they are built from
  • Depending on the status of support for coordination, define languages as coordinations of the subtags they are built from
  • Define SKOS mappings for changes
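
The broader/narrower links in the second point can be sketched as plain triples. The base URI and the URI layout below are assumptions, and the subtags are passed in already split and typed. Coordination and mappings are left out.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/lang/"  # hypothetical base URI


def language_concept(subtags):
    """Return triples for a language concept and its subtag concepts.

    subtags is an ordered list of (type, value) pairs, for example
    [("language", "zh"), ("script", "Hant")].
    """
    tag = "-".join(value for _, value in subtags)
    tag_uri = f"{BASE}tag/{tag}"
    triples = [(tag_uri, RDF_TYPE, SKOS + "Concept")]
    for kind, value in subtags:
        subtag_uri = f"{BASE}subtag/{kind}/{value}"
        triples.append((subtag_uri, RDF_TYPE, SKOS + "Concept"))
        # The language is narrower than each subtag it is built from.
        triples.append((tag_uri, SKOS + "broader", subtag_uri))
        triples.append((subtag_uri, SKOS + "narrower", tag_uri))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(language_concept([("language", "zh"), ("script", "Hant")])))
```

Keeping the subtag type in the subtag URI avoids clashes between subtags of different types that share the same letters, such as the language de and the region DE.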

See IANA Language Subtag Registry in SKOS for a first script, based on this approach, that converts the IANA Language Subtag Registry to SKOS in RDF/XML.
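
The following is an independent minimal sketch of such a conversion, not the script referred to above. It reads records in the registry's text format (records separated by %% lines, one "Field: value" per line, continuation lines indented) and, for brevity, emits N-Triples rather than RDF/XML. The base URI is hypothetical, the sample records are hand-written in the registry's format, and using skos:exactMatch to link a deprecated subtag to its Preferred-Value is only one possible choice of mapping.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/lang/subtag/"  # hypothetical base URI

# Hand-written sample in the registry's record format.
SAMPLE = """\
File-Date: 2007-01-01
%%
Type: language
Subtag: he
Description: Hebrew
%%
Type: language
Subtag: iw
Description: Hebrew
Deprecated: 1989-01-01
Preferred-Value: he
"""


def parse_registry(text):
    """Parse registry text into a list of {field: [values]} records."""
    records, current, last_key = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":  # record separator
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[:1].isspace() and last_key:  # folded continuation line
            current[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    if current:
        records.append(current)
    return records


def literal(text):
    """Quote a plain literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def record_to_ntriples(record):
    """Convert one subtag record; other records yield no triples."""
    if "Type" not in record or "Subtag" not in record:
        return []  # File-Date header, grandfathered and redundant tags
    kind, subtag = record["Type"][0], record["Subtag"][0]
    if ".." in subtag:
        return []  # private-use ranges
    uri = f"<{BASE}{kind}/{subtag}>"
    lines = [
        f"{uri} <{RDF_TYPE}> <{SKOS}Concept> .",
        f"{uri} <{SKOS}notation> {literal(subtag)} .",
    ]
    for i, description in enumerate(record.get("Description", [])):
        label = "prefLabel" if i == 0 else "altLabel"
        lines.append(f"{uri} <{SKOS}{label}> {literal(description)} .")
    for preferred in record.get("Preferred-Value", []):
        # Assumption: the preferred value is a subtag of the same type.
        lines.append(f"{uri} <{SKOS}exactMatch> <{BASE}{kind}/{preferred}> .")
    return lines


def registry_to_ntriples(text):
    lines = []
    for record in parse_registry(text):
        lines += record_to_ntriples(record)
    return "\n".join(lines)


print(registry_to_ntriples(SAMPLE))
```

The full registry [8] could be passed to registry_to_ntriples in place of SAMPLE. Only the first Description becomes a skos:prefLabel and any further ones become skos:altLabel, so that each concept has a single preferred label.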

Bottom-up approach

http://www.lingvoj.org harvests URIs that are already used on the Web to declare languages as RDF resources, in order to make them available in the framework of the Linking Open Data project: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.

http://www.lexvo.org/ is a related project that defines URIs for languages and language families based on ISO 639-3 and ISO 639-5, and provides a variety of information about the languages as RDF data.

Open Issues

  • Languages change and there are deprecated codes.

References

  • [1] BCP 47
 http://www.ietf.org/rfc/bcp/bcp47.txt
  • [2] International Organization for Standardization, "ISO 639: Codes for the representation of names of languages"
      * ISO 639-1 Part 1: Alpha-2 code, 2002
      * ISO 639-2 Part 2: Alpha-3 code, 1998
  • [3] DC property domains and ranges
 http://au.dublincore.org/usageboardwiki/PropertyDomainsAndRanges.html
  • [4] Could ISO-639 languages be defined as skos concepts?
 http://lists.w3.org/Archives/Public/public-esw-thes/2006Dec/0017.html
  • [5] languages and scripts
 http://lists.w3.org/Archives/Public/public-esw-thes/2007Feb/0009.html
  • [6] Library of Congress, ISO 639-2 Registration Authority
 http://www.loc.gov/standards/iso639-2/
  • [7] SIL International, ISO 639-3 Registration Authority
 http://www.sil.org/iso639-3/
  • [8] The IANA Language Subtag Registry
 http://www.iana.org/assignments/language-subtag-registry
  • [9] Library of Congress, "MARC Code list for languages", 2003
 http://www.loc.gov/marc/languages/
  • [10] Language tags in HTML and XML (W3C i18n)
 http://www.w3.org/International/articles/language-tags/Overview.en.php
  • [11] Sample proposal RDF file
 http://perso.orange.fr/universimmedia/lang/bcp47_sample.rdf