Requested enhancement: Aggregate over paths in the graph by preserving search order.
In SPARQL 1.1, GROUP_CONCAT aggregates do not preserve any order of elements. In particular, the result of an embedded subquery that provides elements in the correct order may be reordered when the outer query applies a GROUP_CONCAT.
A related requirement is to preserve search order. This could be emulated once an order can be reliably asserted for aggregates (e.g., by using yet another COUNT aggregate to approximate the length of the search path, and then ordering by the counts), but that would be very inefficient compared to simply preserving search order. It would also be incorrect whenever multiple different search paths exist.
From the user perspective, maintaining search order would be fully backward-compatible with the current behavior. From the implementation perspective, certain optimizations may no longer be applicable. If that is the case, I suggest introducing a new aggregate ORDERED_GROUP_CONCAT that otherwise behaves like GROUP_CONCAT.
Example 1: Assume you have an RDF description of states and state transitions (workflows, finite state automata, ...), with every state associated with a particular value. For a given initial state, return all sequences of values associated with the subsequent states in their order of occurrence.
(See here for a [partially successful, but implementation-specific] attempt to model that in SPARQL 1.1.)
Example 2: Search and aggregate over linguistic annotations, e.g., those provided by the NLP Interchange Format.
It is often necessary to return the concatenated string value for a span of words. An approximate solution (that works in many [but not all] cases in Apache Jena) is to iterate over the span using a property path that starts with the first word:
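SELECT ?w ?myspan
WHERE {
  { SELECT ?w (GROUP_CONCAT(?word; separator=" ") AS ?myspan)
    WHERE {
      ?w a nif:Word.
      # ?first is a word in the subtree headed by ?w ...
      ?first conll:HEAD* ?w.
      # ... that is not preceded by any other word of that subtree
      MINUS { [conll:HEAD* ?w] nif:nextWord+ ?first }
      # walk forward from ?first and collect the words of the subtree
      ?first nif:nextWord* [ conll:HEAD* ?w; conll:WORD ?word ]
    } GROUP BY ?w
  }
  # some stuff in the outer query
}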
The SPARQL 1.1 spec does not guarantee that this works, it does not work reliably in Apache Jena, and it is very slow.
Suggestion:
In the inner query, return a concatenated list of strings and a concatenated list of integers (say, from conll:ID).
Provide a custom function that takes the string concatenation and the integer concatenation as arguments and returns a modified string ordered by the integers (with duplicates removed).
Example:
SELECT ?w ?orderedSpan
WHERE {
  { SELECT ?w (GROUP_CONCAT(?word; separator=" ") AS ?myspan)
              (GROUP_CONCAT(?id; separator=" ") AS ?mykeys)
    WHERE {
      ?w a nif:Word.
      [ conll:HEAD* ?w; conll:WORD ?word; conll:ID ?id ]
    } GROUP BY ?w
  }
  BIND(conll_fn:get-ordered-span(?myspan, ?mykeys) AS ?orderedSpan)
  # some stuff in the outer query
}
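To make the intended behavior of the custom function concrete, here is a minimal Python sketch of the reordering logic only. It is not an actual Jena extension (that would presumably be written in Java against ARQ's function API). The name get_ordered_span merely mirrors the hypothetical conll_fn:get-ordered-span above, and the separator parameter is an assumption. The sketch also assumes that both GROUP_CONCAT results list their elements in the same relative order, which the suggestion implicitly relies on and which the spec may not guarantee either.

```python
def get_ordered_span(span, keys, separator=" "):
    """Reorder the words in `span` by the integer keys in `keys`.

    `span` and `keys` are the two concatenated strings from the inner
    query; the i-th word is assumed to belong to the i-th key.
    Words whose key was already seen are dropped as duplicates.
    """
    words = span.split(separator) if span else []
    ids = keys.split(separator) if keys else []
    if len(words) != len(ids):
        raise ValueError("span and keys must have the same number of elements")

    # Keep the first word seen for each key, so repeated solutions collapse.
    by_key = {}
    for word, key in zip(words, ids):
        by_key.setdefault(int(key), word)

    # Emit the words in ascending key order.
    return separator.join(by_key[k] for k in sorted(by_key))


print(get_ordered_span("sat cat The sat", "3 2 1 3"))  # The cat sat
```

Note that this breaks if a word itself contains the separator, so a real implementation would need a separator that cannot occur in the data.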
Related: for asserting an order on aggregates, see w3c/sparql-dev#9.