SPARQL/Extensions/Aggregates

From W3C Wiki

SQL provides aggregate functions that compute a single summary value (such as a count, sum, or maximum) over multiple result values, typically after grouping query solutions by some key. The SPARQL specification contains no machinery for dealing with aggregates, though some aggregates, such as MIN or MAX, can be emulated in standard SPARQL, for example by combining ORDER BY with LIMIT 1.
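As a rough illustration of the ORDER BY/LIMIT workaround, the sketch below models a SPARQL solution sequence as a list of Python dicts and mimics a query like `SELECT ?price WHERE { ?book ex:price ?price } ORDER BY ?price LIMIT 1`. The data, the `ex:` names and the helper `order_by_limit` are hypothetical; this is not any engine's implementation.

```python
# A solution sequence modeled as a list of dicts mapping variable names
# (without the leading "?") to values. The data below is hypothetical.
solutions = [
    {"book": "ex:b1", "price": 42},
    {"book": "ex:b2", "price": 17},
    {"book": "ex:b3", "price": 23},
]


def order_by_limit(solutions, var, descending=False, limit=1):
    """Hypothetical helper mimicking `ORDER BY ?var LIMIT n`
    (or `ORDER BY DESC(?var) LIMIT n` when descending=True).

    Solutions where `var` is unbound are dropped first. In real SPARQL,
    unbound values sort lowest, so a query would need the triple pattern
    (or FILTER(bound(?var))) to guarantee ?var is bound.
    """
    bound = [s for s in solutions if var in s]
    return sorted(bound, key=lambda s: s[var], reverse=descending)[:limit]


# MIN(?price) emulated as ORDER BY ?price LIMIT 1
cheapest = order_by_limit(solutions, "price")
# MAX(?price) emulated as ORDER BY DESC(?price) LIMIT 1
dearest = order_by_limit(solutions, "price", descending=True)

print(cheapest[0]["price"], dearest[0]["price"])  # 17 42
```

The trick only yields one overall minimum or maximum; it does not give per-group results, which is where real aggregate support becomes necessary.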

Several SPARQL implementations support aggregates:

  • OpenLink Virtuoso supports COUNT, COUNT DISTINCT, MAX, MIN and AVG in queries and subqueries. Virtuoso does not implement an explicit GROUP BY clause; instead, it implicitly groups solutions by all projected variables that do not appear in aggregate functions. Virtuoso does not implement a HAVING clause, but its effect can be emulated with subqueries. Virtuoso allows aggregate functions to be used as arguments to other projected expressions, and allows the arguments of aggregate functions to be arbitrary expressions.
  • ARQ supports COUNT and COUNT DISTINCT. ARQ implements a GROUP BY clause that can act on either variables or expressions. Expressions in a GROUP BY can be named and then selected from the query, providing a way of selecting arbitrary expressions. If GROUP BY is omitted, then ARQ groups on all variables in the query pattern. ARQ implements a HAVING clause that can filter the result set after grouping.
  • ARC supports COUNT, MAX, MIN, AVG, and SUM. ARC requires that aggregate functions in a query's projection be named with the AS keyword. ARC implements a GROUP BY clause that must be present if anything other than a single aggregate is selected. ARC allows only variables (not expressions) in aggregate functions or GROUP BY conditions.
  • Glitter, part of Open Anzo, supports COUNT and COUNT DISTINCT. Glitter implements a GROUP BY clause that can only contain variables.
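The GROUP BY, COUNT, COUNT DISTINCT and HAVING features listed above can be sketched over a list of solution dicts as follows. This is a minimal model under stated assumptions: the data, the `ex:` names, the output names `n` and `n_distinct`, and all helper functions are hypothetical, and none of it reflects how Virtuoso, ARQ, ARC or Glitter actually evaluate queries.

```python
from collections import defaultdict

# Hypothetical solution sequence for a pattern like { ?person ex:knows ?friend }.
solutions = [
    {"person": "ex:alice", "friend": "ex:bob"},
    {"person": "ex:alice", "friend": "ex:carol"},
    {"person": "ex:alice", "friend": "ex:bob"},  # duplicate solution
    {"person": "ex:dave", "friend": "ex:erin"},
]


def group_by(solutions, keys):
    """Partition solutions by the values of the grouping variables.
    An unbound key variable maps to None, so such solutions group together."""
    groups = defaultdict(list)
    for s in solutions:
        groups[tuple(s.get(k) for k in keys)].append(s)
    return groups


def count(group, var):
    """COUNT(?var): number of solutions in the group where var is bound."""
    return sum(1 for s in group if var in s)


def count_distinct(group, var):
    """COUNT(DISTINCT ?var): number of distinct bound values of var."""
    return len({s[var] for s in group if var in s})


def aggregate(solutions, keys, having=None):
    """GROUP BY keys, compute counts of ?friend, then apply an optional HAVING."""
    rows = []
    for key, group in group_by(solutions, keys).items():
        row = dict(zip(keys, key))
        row["n"] = count(group, "friend")
        row["n_distinct"] = count_distinct(group, "friend")
        rows.append(row)
    # HAVING filters whole groups *after* aggregation. An engine without
    # HAVING can get the same effect by filtering the output of a subquery.
    return [r for r in rows if having is None or having(r)]


for row in aggregate(solutions, ["person"], having=lambda r: r["n_distinct"] > 1):
    print(row)
# {'person': 'ex:alice', 'n': 3, 'n_distinct': 2}
```

An engine that groups implicitly, rather than through an explicit GROUP BY clause, would in this model derive `keys` from the projected variables that are not inside aggregate functions.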

A paper on RAP's SPARQL DB engine discusses aggregates. Open question: does RAP implement aggregates?

Design Questions

What happens when aggregate functions are applied to results with unbound values or mixed data types?

Does anyone have an answer?
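To make the question concrete, here is one possible policy, borrowed by analogy from SQL (whose AVG skips NULLs and returns NULL for an empty input): skip unbound values and raise an error on values that cannot be combined. This is a hypothetical sketch, not an answer any implementation is known to give; the data, `sql_like_avg` and `AggregateTypeError` are illustrative names only.

```python
from numbers import Number

# Hypothetical solutions: one has ?age unbound, one binds it to a string.
solutions = [{"age": 30}, {"age": 40}, {}, {"age": "unknown"}]


class AggregateTypeError(Exception):
    """Raised when an aggregate meets values it cannot combine."""


def sql_like_avg(solutions, var):
    """AVG(?var) under an assumed SQL-like policy:

    * unbound values are skipped, as SQL's AVG skips NULLs;
    * non-numeric values raise an error instead of being coerced
      (bools are excluded because Python treats them as ints);
    * a group with no bound values yields None, as SQL yields NULL.
    """
    values = [s[var] for s in solutions if var in s]
    bad = [v for v in values if not isinstance(v, Number) or isinstance(v, bool)]
    if bad:
        raise AggregateTypeError(f"cannot average non-numeric values: {bad!r}")
    return sum(values) / len(values) if values else None


print(sql_like_avg(solutions[:3], "age"))  # 35.0 (the unbound row is skipped)
try:
    sql_like_avg(solutions, "age")
except AggregateTypeError as e:
    print("error:", e)
```

Other policies are just as plausible, for example silently ignoring ill-typed values or leaving the whole aggregate unbound; choosing among them is exactly what the question leaves open.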

Fundamentals

SQL dates from the 1970s, and the meaning of all but the simplest SQL aggregation statements is obscured by the notation and is also highly implementation-dependent.

      Chimezie -- If you are interested, I can supply an example SQL query on which 
                  Oracle and MySQL return different results, both of which are
                  intuitively wrong to most people.  The different results are 
                  concrete evidence of a shortcoming.  The general problem is that
                  there is no model theory or other implementation independent standard
                  that specifies what the results of any query should be.  -- Adrian
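The following is not the example offered in the note above. It is a hypothetical sketch of one well-known source of divergence: selecting a column that is neither grouped nor aggregated. Older MySQL versions by default return a value from an indeterminate row in that case, while Oracle rejects the query. The table, data and function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows for a query shaped like:
#   SELECT dept, name, MAX(salary) FROM emp GROUP BY dept
# where `name` is neither grouped nor aggregated.
rows = [
    ("sales", "ann", 50),
    ("sales", "bob", 70),
    ("eng", "cat", 90),
]


def run(rows, pick_name):
    """Evaluate the query; `pick_name` decides which `name` a group reports."""
    groups = defaultdict(list)
    for dept, name, salary in rows:
        groups[dept].append((name, salary))
    return {dept: (pick_name(members), max(s for _, s in members))
            for dept, members in groups.items()}


def first_seen(members):
    # One possible policy: report whichever row the engine met first.
    return members[0][0]


def strict(members):
    # Another policy: reject the query when the value is ambiguous.
    names = {n for n, _ in members}
    if len(names) > 1:
        raise ValueError(f"'name' is not determined by the group: {sorted(names)}")
    return names.pop()


print(run(rows, first_seen))  # {'sales': ('ann', 70), 'eng': ('cat', 90)}
# 'ann' is reported next to bob's salary, which most readers would call wrong.
```

Both policies are internally consistent, yet they disagree on the same query, which is the kind of gap an implementation-independent semantics would close.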

This raises the question: why stick with 1970s-style, SQL-like syntax for SPARQL aggregation?

adriandwalker-at-gmail-dot-com suggests that, instead of the 1970s-style SQL aggregation notation, SPARQL would benefit from a rule-based notation similar to the examples in

          www.reengineeringllc.com/demo_agents/Aggregation.agent