Ontology Dowsing

From W3C Wiki
"Dowsing is a type of divination employed in attempts to locate ground water,
buried metals or ores, gemstones, oil, gravesites, and many other objects and
materials, as well as so-called currents of earth radiation, without the use of 
scientific apparatus."

--Wikipedia article on Dowsing, retrieved on 14th January 2010.


At the moment, the methods used in practice to locate an adequate vocabulary for describing one's data in RDF are more akin to dowsing than to an educated, technically-guided choice, supported by scientific tools and methodologies. While the situation is improving with the progress of Semantic Web search engines and better education, oftentimes data publishers still rely on informal criteria such as word-of-mouth, reputation or follow-your-nose strategies.

This page tries to identify methods, tools, applications, websites and communities that can help Linked Data publishers discover or build the vocabulary they need. The tools identified below are ordered from those that require the least time and effort on the publisher's side to those that require the most.


Lists of ontologies and services

  Main article: Lists of ontologies

There are several webpages that reference ontologies by simply matching a theme (e.g., People, Product) to a URI or listing tools to find ontologies. Examples:

This category requires minimal effort: if the publisher's data fall within the domains referenced in a list, the corresponding ontology can readily be used.

These lists raise the question of how to decide what belongs in them. Popularity is one aspect; quality may be another. What is a quality ontology? When does it become popular? Who decides?

Search engines

  Main article: Search engines

Semantic Web search engines are applications for finding ontologies that require reasonable effort: queries are usually written as natural language keywords and results are ranked. Some additional information is often provided. Examples:

  • FalconS has both term search and ontology search features;
  • Sindice is a generic Semantic Web document search engine;
  • Swoogle is the grandfather of Semantic Web search engines;
  • SWSE is an RDF entity search engine;
  • vocab.cc is an RDF term search service;
  • Watson is an ontology search engine.

The problem here is that it is still hard to choose between two matching ontologies. What should guide publishers to the right choice? Should these ontologies be reused at all? See also BuildOrBuyTerms.

Repositories

  Main article: Ontology repositories

Ontology repositories are usually more specific than Semantic Web search engines, and their navigation and search interfaces can vary greatly. They offer tools that may be specific to the type of application the repository was designed for. Examples:

  • Linked Open Vocabularies is a living, collaborative database of vocabularies, with rich metadata, interlinking and version history ("All you need is LOV!");
  • Prefix.cc is a namespace lookup service, which can be seen as a kind of vocabulary directory;
  • vocab.cc is an RDF vocabulary search and lookup service;
  • DERI Vocabularies is a repository that can also be used as an online ontology editor;
  • Knoodl was a repository and collaborative ontology management tool;
  • Ontology Design Patterns is a repository of design patterns and of ontology modules that follow those patterns;
  • OWL Seek was a repository of ontologies with additional metadata, such as funding organisation, submitter and submission dates, and the possibility to get a list sorted by various criteria.
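
As a rough illustration of how a namespace lookup service such as Prefix.cc might be used from a script, the Python sketch below builds a lookup URL and parses a JSON response. The URL pattern (http://prefix.cc/<prefix>.file.json) and the response shape ({"prefix": "namespace URI"}) are assumptions that should be checked against the service itself, and all function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL pattern for Prefix.cc's JSON output; verify against the service.
PREFIX_CC_PATTERN = "http://prefix.cc/{prefix}.file.json"


def lookup_url(prefix):
    """Build the (assumed) JSON lookup URL for a prefix such as 'foaf'."""
    return PREFIX_CC_PATTERN.format(prefix=quote(prefix.strip().lower(), safe=""))


def parse_lookup(body, prefix):
    """Extract the namespace URI from a JSON body shaped like {"prefix": "uri"}.

    Returns None when the prefix is not present in the response.
    """
    mapping = json.loads(body)
    return mapping.get(prefix.strip().lower())


def lookup_namespace(prefix, timeout=10):
    """Fetch and parse the namespace for a prefix (performs a network request)."""
    with urlopen(lookup_url(prefix), timeout=timeout) as response:
        return parse_lookup(response.read().decode("utf-8"), prefix)
```

Keeping URL construction and response parsing separate from the network call means the first two helpers can be exercised offline, for example with a saved response body.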

Mailing lists/online communities

  Main articles: Ontology-related mailing lists and Ontology-related online communities

If other tools are not sufficient to find an appropriate vocabulary, publishers can (and often do) rely on online communities by asking them directly. Examples:

This is a rather effortless solution that can be really efficient in some cases. However, repeated enquiries about vocabularies can easily pollute the traffic, so publishers should first try to find a solution on their own, e.g., by following the links and indications on this wiki page. See also MailingLists.

Ontology Editors

  Main article: Ontology editors

If data publishers cannot find a relevant vocabulary, or if existing vocabularies are not suitable for the use case, they can make their own ontology. They can be helped by editors, such as:

  • Protégé is a popular, pluggable ontology editor;
  • WebProtégé is the online version of Protégé;
  • NeOn Toolkit is another ontology editor with many plugins available. It is especially suited for heavy-weight projects (e.g., multi-modular ontologies, multi-lingual, ontology integration, etc.);
  • SWOOP is a small and simple ontology editor;
  • Neologism is an online vocabulary editor and publishing platform;
  • TopBraid Composer is a multipurpose Semantic Web editor;
  • Vitro is an Integrated Ontology Editor and Semantic Web Application;
  • Knoodl is a community-oriented ontology and knowledge base editor;
  • Ontofly is a web-based ontology editor;
  • Altova OWL editor is another ontology editor;
  • PoolParty is a thesaurus management system and a SKOS editor;
  • IBM Integrated Ontology Development Toolkit is an Eclipse-based ontology toolkit for storage, manipulation, query, and inference of ontologies and corresponding instances;
  • Anzo for Excel will generate an initial ontology based on spreadsheet data and structure;
  • Euler GUI is an editor for N3, RDF, OWL and various other things;
  • OWLGrEd is a graphical ontology editor for OWL;
  • Fluent Editor is a tool for editing, manipulating and querying complex ontologies written in OWL, RDF or SWRL;
  • Intelligent Topic Manager is a Web-based, ontology-driven application to collaboratively manage and maintain data models, multilingual vocabularies and rules. It helps centralize all terminology resources and expose them to business applications.

Building an ontology this way requires considerable effort and some guidelines.

Learning ontology design, best practices and evaluation

Guides to ontology design:

Here are best practices:

In addition to finding or making an ontology that contains the terms that are needed for the dataset, publishers may like to assess the quality of the ontologies, especially when they have the choice between several of them. Some possible factors:

  • The ontology is fully documented;
  • It is used by independent data publishers;
  • There exist tools that support the vocabulary specifically;
  • The ontology is highly ranked by users in a voting system;
  • All terms are dereferenceable;
  • The ontology covers just the right domain (it is not an upper-level "ontology of everything");
  • It is expressive enough: the ontology has axioms that enable valuable inferences;
  • It is not too expressive: the ontology does not define axioms that have limited utility and would make reasoning costly.
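
One hypothetical way to compare candidate ontologies against the factors above is to treat them as a checklist and count how many each candidate satisfies. The Python sketch below does just that. The factor keys, the equal weighting and the example candidates are illustrative assumptions, not an established quality metric.

```python
from dataclasses import dataclass, field

# Hypothetical factor keys mirroring the list above; all are weighted equally.
FACTORS = (
    "documented",
    "independently_used",
    "tool_support",
    "highly_rated",
    "dereferenceable",
    "right_scope",
    "expressive_enough",
    "not_too_expressive",
)


@dataclass
class Candidate:
    name: str
    satisfied: set = field(default_factory=set)  # factors judged to hold

    def score(self):
        """Fraction of the checklist this candidate satisfies (0.0 to 1.0)."""
        return len(self.satisfied & set(FACTORS)) / len(FACTORS)


def rank(candidates):
    """Sort candidates from most to fewest satisfied factors."""
    return sorted(candidates, key=lambda c: c.score(), reverse=True)


# Made-up candidates, for illustration only.
a = Candidate("vocab-a", {"documented", "dereferenceable", "tool_support", "right_scope"})
b = Candidate("vocab-b", {"documented", "expressive_enough"})
ranked = [c.name for c in rank([a, b])]
```

In practice the factors are unlikely to matter equally (a publisher may care far more about documentation than about ratings), so a weighted variant would probably be more realistic.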

Tools:

  • OWL 2 Validator determines whether an ontology is in OWL 2 DL, OWL 2 EL, OWL 2 QL, OWL 2 RL or OWL 2 Full;
  • OWL 1 Validator determines whether an ontology is in OWL 1 DL, OWL 1 Lite or OWL 1 Full;
  • RDF Validator is the official W3C validator for RDF/XML syntax;
  • rdf:alerts is a tool for finding potential problems in linked data;
  • ...

Interlinking ontologies

Interlinking ontologies can mean reusing existing ontologies in a modular way, or aligning ontologies. There are quite a lot of tools for ontology matching, most of which are prototypes from research projects. The following are well-maintained tools:

Useful information is available at http://ontologymatching.org/, including references to existing tools and a very comprehensive list of scientific publications on the topic (500+ references).
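
To make the idea of aligning ontologies more concrete, here is a deliberately naive Python sketch that proposes candidate correspondences between two vocabularies by comparing term labels as strings. It is only an illustration of the simplest kind of lexical matching, not a description of how any particular matching tool works. The term lists and the 0.8 threshold are made up.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case a label and drop separators so 'first_name' matches 'firstName'."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def similarity(a, b):
    """String similarity in [0, 1] between two normalised labels."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def align(source_terms, target_terms, threshold=0.8):
    """Pair each source term with its best-scoring target term.

    Only pairs scoring at or above the threshold are returned,
    as (source, target, score) tuples.
    """
    matches = []
    for s in source_terms:
        best = max(target_terms, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches


# Made-up term labels, for illustration only.
pairs = align(["firstName", "homepage", "birth_date"],
              ["first_name", "homePage", "dateOfDeath"])
```

Label similarity alone says nothing about whether two terms mean the same thing, so any pair proposed this way would still need to be reviewed by a person or checked against the ontologies' structure.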

Related Events, Projects, etc.

A significant amount of research is under way to solve parts of the problem of guiding publishers to the right vocabulary:

  • EKAW 2010 Workshop on Ontology Quality;
  • ISWC 2010 Workshop on Semantic Repositories for Web, SERES 2010;
  • SEALS is a European project on evaluating semantic applications, including ontologies;
  • Semantic Web Journal is an academic journal that encourages the publication of ontology descriptions (=> peer reviewed ontologies => good ontologies, in principle).