Smushing or aggregating RDF

The term "smushing" is often used for the process of aggregating resources based on inverse functional properties (IFPs). If two resources have the same value for an inverse functional property, they are owl:sameAs and their other properties can be intermixed, hence smushed.
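
As a rough illustration of the idea, the following sketch groups subjects that share an IFP value and rewrites all triples onto one representative per group. It is not taken from any of the implementations discussed on this page. Triples are plain Python tuples of strings, the `IFPS` set, the `rank` preference (named resources before blank nodes, then alphabetical order) and all function names are assumptions made for the example.

```python
# Assumption: the set of inverse functional properties is known up front.
IFPS = {"foaf:mbox"}

def rank(node):
    # Assumption: prefer named resources over blank nodes ("_:" prefix),
    # then break ties alphabetically. Other policies are equally valid.
    return (node.startswith("_:"), node)

def find(parent, x):
    # Union-find lookup with path compression.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def smush_groups(triples, ifps=IFPS):
    """Map every subject to the representative of its sameAs group."""
    parent = {}
    first_seen = {}  # (predicate, object) -> first subject seen with it
    for s, p, o in triples:
        parent.setdefault(s, s)
        if p not in ifps:
            continue
        key = (p, o)
        if key not in first_seen:
            first_seen[key] = s
            continue
        ra, rb = find(parent, s), find(parent, first_seen[key])
        if ra != rb:
            keep, drop = sorted((ra, rb), key=rank)
            parent[drop] = keep  # the preferred node stays the root
    return {s: find(parent, s) for s in parent}

def smush(triples, ifps=IFPS):
    """Rewrite subjects (and objects naming a merged subject) onto representatives."""
    triples = list(triples)
    rep = smush_groups(triples, ifps)
    # Caveat: a literal that happens to equal a subject id would be rewritten too.
    return {(rep[s], p, rep.get(o, o)) for s, p, o in triples}
```

With this sketch, a blank node and a named resource that share the same `foaf:mbox` value end up as a single subject carrying the properties of both.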

* More on smushing in the rdfweb foaf wiki

Smushing Implementations

Smushing Algorithm

A typical smushing algorithm would be (described in Leo Sauermann's blog):

I am putting together more about smushing, which will be a key factor in the global semantic web: to connect annotations that were made by different people.

A typical smushing algorithm would be:

The problem, when you have a set of triples TxIy with several subjects that should be the same (as defined by an IFP), is to choose which subject is the "canonical" one that should now be filled with the triples.

There are different approaches to find the canonical resource:

Another question is what to do with the smushing. There are different approaches:

  1. store the smushing in an extra graph
  2. delete the old triples, add the smushing
  3. add the smushing in addition to the old triples (tricky)

Each has obvious advantages and disadvantages. For gnowsis I would prefer (1), to smush into an extra graph, which is similar to (3) but separates the data.
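
To make the three options concrete, here is a minimal sketch in which a store is just a dict mapping graph names to sets of triples. The function name, the mode names and the graph names `default` and `smushed` are invented for this illustration and do not describe how gnowsis stores its data.

```python
def store_smushing(graphs, smushed, mode, source="default", target="smushed"):
    """Store smushed triples according to one of the three approaches."""
    smushed = set(smushed)
    if mode == "extra_graph":
        # (1) keep the original graph untouched, put the result next to it
        graphs[target] = smushed
    elif mode == "replace":
        # (2) the old triples are gone afterwards; the original subjects are lost
        graphs[source] = smushed
    elif mode == "add":
        # (3) old and smushed triples live side by side in the same graph
        graphs[source] = graphs[source] | smushed
    else:
        raise ValueError("unknown mode: " + mode)
    return graphs
```

In this picture, option (1) can be thrown away and recomputed at any time, while with option (3) there is no simple way left to tell which triples were original and which were produced by smushing.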

In gnowsis we have the problem of incremental smushing: we crawl thousands of emails per day and would then like to smush the persons in the addresses, but only those from the new messages.
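
One possible way to approach incremental smushing is to keep the IFP index and the equivalence groups between runs, so that each new batch is only compared against that index rather than against all old triples. The sketch below assumes this design. The class name `IncrementalSmusher` and its methods are hypothetical, triples are plain tuples of strings, and this is not how gnowsis actually does it.

```python
class IncrementalSmusher:
    """Keeps IFP state across batches and smushes only the new triples."""

    def __init__(self, ifps):
        self.ifps = set(ifps)
        self.parent = {}    # union-find over every subject seen so far
        self.by_value = {}  # (predicate, object) -> first subject seen with it

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def add_batch(self, triples):
        triples = list(triples)
        # First pass: update the equivalences using only the new triples.
        for s, p, o in triples:
            root = self._find(s)
            if p in self.ifps:
                known = self.by_value.setdefault((p, o), s)
                other = self._find(known)
                if other != root:
                    # The group of the subject first seen with this value wins.
                    self.parent[root] = other
        # Second pass: rewrite the subjects of the new triples only.
        return {(self._find(s), p, o) for s, p, o in triples}
```

A caveat of this sketch: if a later batch connects two groups that were separate before, triples returned for earlier batches may still carry the representative that lost, so they would need to be rewritten or looked up through the index again.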

~~~

I can see why you'd want to replace bnodes with a URI if possible, and if all the subjects are bnodes then they'll smush down to a single bnode. But beyond that the need for a 'canonical' subject sounds like it's app-specific.

Ok if:

where A,B,C,X,Y,Z are named resources, following the approach suggested, with A as the preferred resource, you'll infer -

which is probably useful if A is a person and you want to pull out a vCard representation from all their attributes (but then if you're using FOAF they won't have a URI anyhow...). But what is there to suggest you won't have a use for:

I'm not sure, but without good justification it seems premature to ignore that last statement.

-- DannyAyers

~~~

Once you're working on a smushed model, you need to canonicalize your subjects. I.e. you should be looking for statements of the form

-- AlexStewart

~~~

RdfSmushing (last edited 2008-04-07 10:03:30 by MortenFrederiksen)