SIOC/IdeasAndThoughts

From W3C Wiki

SIOC Ideas and Thoughts

This is a page for ideas, thoughts and discussion about SIOC.

Goal

What is our main goal?

1. To interconnect communities on the Internet.
2. To facilitate finding related information: by searching on one site, a user should be able to locate related and relevant information on other SIOC "friendly" sites through the SIOC ontology and interface (see problem P1 below).

Scope

Scope of SIOC includes:

1. An ontology and system architecture for facilitating finding related information on interconnected community sites.
2. Community sites including forums, blogs and other discussion primitives.

Scope does not currently include:

1. Dumping out or exporting _all_ of a community site's configuration information.
2. Describing user permission information.

All the information in the ontology should be relevant to the main goal: identifying related pieces of information.

Existing Related Work

* Threaded Description Language (TDL)
* Google
* RSS
* Trackbacks
* SKOS
* Subject Maps
* Dialog Maps

Solving Problems in Current Technologies

...

Problems / Questions

P1

If we assume that many sites (including thousands of blog sites) have adopted SIOC, how will that help us to find the relevant information? This boils down to: how will it scale?

If we want thousands of sites to adopt SIOC and use it, how will that look in the real world? Connecting a couple of sites is not a problem: you can query them and sort the information. But with each new site added, the load grows much faster than the number of sites: if each of n sites queries the other (n-1), the total number of queries grows roughly with the square of n. This does not scale well at all.

With each site added we have two problems arising:

1. More sites to query: takes more time and more resources.  Furthermore, it works the other way round as well: if you are being queried by (n-1) sites at once, you might go down.
2. More potential relevant results: information overload.  How do you determine the most relevant results and how do you do that in real time?
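As a rough illustration of the first problem, the following sketch counts queries under the simplifying assumption that every site queries every other site exactly once per round. The function names are made up for this example and are not part of SIOC.

```python
def queries_per_round(n_sites: int) -> int:
    """Total site-to-site queries if each site queries every other site once."""
    return n_sites * (n_sites - 1)


def incoming_queries(n_sites: int) -> int:
    """Queries a single site has to answer in one round (the n-1 case)."""
    return n_sites - 1


if __name__ == "__main__":
    # Print how the totals change as the network grows.
    for n in (2, 10, 100, 1000):
        print(n, queries_per_round(n), incoming_queries(n))
```

Under this assumption, going from 10 to 1000 sites multiplies the number of sites by 100 but the total number of queries by roughly 10,000.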

We need to look at a SIOC-enabled system from the point of view of the overall SIOC architecture: how will the whole interconnected system of communities work? (Currently we are concentrating on how a single system will interact with others.)

Issues of scaling Semantic Web P2P systems are discussed in the paper about RDFGrowth.

Solutions

S1 - Potential Solutions for Problem P1

S1-1 - Limit the Number of Sites to Query

We can limit the number of sites to query (for example, to the five best sites), but then we discard many other blogs and forums, one of which might be the ultimate site containing very valuable information on the topic. (Googling would be better than relying on SIOC in this case.)

This can work, but we have to limit the scope of community sites that SIOC promises to interconnect, and (Uldis says) we are just deferring the scaling problem: we have not solved it.
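A minimal sketch of S1-1, assuming each known site already has some numeric quality score. How such a score would be computed is not defined on this page, and the site names and `select_sites` function are hypothetical.

```python
import heapq


def select_sites(scores: dict[str, float], k: int = 5) -> list[str]:
    """Return the k highest-scoring sites, best first.

    `scores` maps a site identifier to an assumed relevance/quality score.
    Every site outside the top k is simply never queried.
    """
    return heapq.nlargest(k, scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical sites and scores, for illustration only.
    example_scores = {
        "forum-a.example": 0.91,
        "blog-b.example": 0.40,
        "blog-c.example": 0.75,
    }
    print(select_sites(example_scores, k=2))
```

The cut-off keeps the query load per search bounded by k, but it also shows the drawback noted above: a site ranked just below the cut-off is never consulted, however valuable its content.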

S1-2 - Change When to Query

1. Query at article creation time
2. Re-query at regular intervals
3. Query dynamically (most load on all systems)
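The three options can be sketched with a small cache of related links. This is only an illustration: `RelatedLinkCache`, `fetch` and `max_age` are names invented for this example, and `fetch` stands in for whatever mechanism actually queries the other sites.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RelatedLinkCache:
    # Hypothetical callback that queries remote sites for a topic.
    fetch: Callable[[str], list[str]]
    # Seconds before cached results are considered stale.
    max_age: float = 3600.0
    _store: dict[str, tuple[float, list[str]]] = field(default_factory=dict)

    def on_create(self, topic: str, now: float) -> None:
        """Option 1: query once, when the article is created."""
        self._store[topic] = (now, self.fetch(topic))

    def get(self, topic: str, now: float) -> list[str]:
        """Option 2: re-query only when the cached result is too old.

        With max_age=0 every call re-queries, which corresponds to
        option 3 (dynamic querying, the highest load on all systems).
        """
        cached = self._store.get(topic)
        if cached is None or now - cached[0] >= self.max_age:
            self.on_create(topic, now)
        return self._store[topic][1]
```

Passing the current time as `now` keeps the sketch deterministic; a real system would more likely read a clock and refresh stale entries in the background.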

Challenges

C1 - Sites in Different Languages

Interconnecting Sites in Different Languages

This is more interesting than connecting English-language sites, because information in other languages may contain details that have not passed through the "language barrier". But how can it be done?

* Force community sites to use one language (and then interconnect them).
 * Will work, but the site probably will not be popular!
* Connect using a common categorisation.
 * Should work to reflect the related posts, but what use will that be to the reader?

Still, users seeking information should be able to choose whether related links in other languages are displayed or hidden.

In fact, meta information (language, relevance strength index) should be kept along with each related link, so that users can tailor the related information they see by language and by the relevance or kind of the link.
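One possible shape for such a link record and the user-side filter is sketched below. The field names, the 0.0 to 1.0 relevance scale and the `kind` values are assumptions made for this example; they are not defined by the SIOC ontology.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RelatedLink:
    url: str
    language: str     # language code such as "en" or "lv" (assumed format)
    relevance: float  # assumed relevance strength index, 0.0 to 1.0
    kind: str         # hypothetical link type, e.g. "reply" or "same-topic"


def visible_links(
    links: Iterable[RelatedLink],
    languages: set[str],
    min_relevance: float = 0.0,
    kinds: Optional[set[str]] = None,
) -> list[RelatedLink]:
    """Keep only the links matching the user's settings, most relevant first."""
    selected = [
        link
        for link in links
        if link.language in languages
        and link.relevance >= min_relevance
        and (kinds is None or link.kind in kinds)
    ]
    return sorted(selected, key=lambda link: link.relevance, reverse=True)
```

A reader who only wants strong English or Latvian links could then call `visible_links(links, {"en", "lv"}, min_relevance=0.5)`, while leaving `kinds` unset accepts every kind of link.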