What do we mean when we say "regular paths"? A regular path is a path, in a graph, were the ordered list of edges along that path satisfy a given path expression. The path expression grammar we use is similar to common regular expression pattern languages, thus the term "regular paths". An example of a path expression, in our syntax, is [part]* indicating paths with zero or more part edges.
As many have mentioned before me, it is a bit surprising that SparQL is the W3C recommended RDF query language, yet lacks support for graph path traversal patterns. RDF is, after all, a graph data model. Path operations, such as transitive closure, seem as though they should be part of the minimal requirement of a graph query language. However, this is not the case with SparQL.
The work presented here is a minimally invasive way of adding regular path support within ARQ, the SparQL query processor used by the Jena RDF framework.
Ideally we would like to augment SparQL query processing, adding support for regular path processing, without deviating from the SparQL specification. Strictly speaking, we have not achieved this. While the SparQL specification allows for some user-defined functions, it appears to allow them only as filter functions. ARQ, on the other hand, supports an additional extensible function type, property functions. GLEEN is a Java library (jar file) which adds support for processing of some regular path expressions. GLEEN is implemented as ARQ property functions. As such, while it deviates to some degree from the SparQL standard, it diverges only where ARQ diverges.
Note, some examples on this page use the prefix "fma", a shorthand for the Foundational Model of Anatomy. This is used for illustrative purposes, but the GLEEN library is not restricted to a particular ontology.
In ARQ property functions, the property URI uniquely identifies a custom triple matching function rather than an RDF property. These function are allowed to bind variables programatically (unlike filter functions). The subject or object arguments of a property function triple may be lists as well as atomic values.
GLEEN's Closure function expects a list in the object position where the first argument of the object list must be the URI of an ontology property. The second argument may be a URI or a variable. Either this argument or the subject must be bound. An example triple clause containing the Closure function is as follows ("gleen" is a prefix for the functions "java:" URI <java:edu.washington.sig.gleen.>):
fma:Heart gleen:Closure ( fma:part ?heart_part ) .
The above example would bind to the variable ?heart_part all of the resources that stand in the fma:part closure (Kleene star) of the resource fma:Heart. The pattern
?whole gleen:Closure ( fma:part fma:Right_atrium ) .
would bind, to the variable ?whole, all resources that have fma:Right_atrium in their fma:part closure (such as fma:Heart). Other configurations are allowed, such as the substitution of a bound variable in the position of "fma:Heart" or "fma:Right_atrium".
OnPath supports path expressions that are more complicated than simple transitive closures. It supports operators similar to those found in regular expression grammars, like '?' (zero or one), '*' (zero or more), '+' (one or more), '|' (alternation), and '/' (concatenation).
Unlike the Closure function, the first argument in the object list cannot be a property URI. Now it is instead a string containing a path expression. Square brackets are used both as delimiters and as grouping operators. An example is the following:
fma:Heart gleen:OnPath ( "[fma:part]*/[fma:contains]" ?heart_containment ) .
The above example binds to the variable ?heart_containment, all resources that can be found on paths from fma:Heart by traversing zero or more fma:part properties followed by a single fma:contains property (in the FMA fma:Blood_in_right_atrium, for example, is connected to the fma:Heart because it is contained in one of its parts, fma:Right_atrium).
A more complex example, below, binds to the variable ?bar all resources identified as equivalent to example:foo (where classes in an equivalency list may themselves have equivalents, and so on).
example:foo gleen:OnPath ( "[[owl:equivalentClass]/[owl:intersectionOf]/[rdf:rest]*/[rdf:first]]+" ?bar ) .
For a more complete specification of the OnPath path expression grammar, see PathExpression grammar spec.
All that is required to make the GLEEN library accessible by a functioning Jena/ARQ SparQL processor is to place the GLEEN jar (and its supporting libraries) on the classpath. Then, any query that will call the GLEEN property functions should include a prefix such as:
PREFIX gleen:<java:edu.washington.sig.gleen.>
Note, this prefix is not strictly required, but using it makes queries much more legible. Note also that GLEEN requires Java version 1.5.
See Known Issues below for a code addition required when using GLEEN with the current release version of ARQ. This workaround will not be required with the next ARQ release.
If you study the PathExpression grammar closely you will see that it limits the amount of nesting of path operations allowed (basically they can be nested 2 deep). Further experiment will be required, exploring the boundary between expressivity and tractability. If users require more expressive expressions, and if they are not computationally prohibitive, then the grammar may be extended in the future. Additionally, while the present functions allow users to discover the resources reachable by a given path pattern, they do not allow users to see the exact path traversed. So, given a path expression like
fma:Heart gleen:OnPath ( "[fma:part]*/[fma:contains]" ?heart_containment ) .
and one ?heart_containment = fma:Blood_in_right_atrium, the OnPath function does not reveal how many part links, nor to what parts, the OnPath function traversed to get this result. We are considering writing a function similar to OnPath, that actually returns the entire subgraph traversed to form the result set.
As mentioned in the limitations section, the current path functions only work for discovering resource nodes connected to other resource nodes by a particular path pattern, they cannot presently be used to discover nodes connected to literal values. It is possible that we may wish to extend GLEEN in the future to allow for this sort of pattern.
Also, of course, users will guide future work through their input. This work is still in its prototype stage and feedback is encouraged.
This library was developed by Todd Detwiler from the University of Washington Structural Informatics Group. Please direct questions or comments to det@u.washington.edu
GLEEN is implemented as an extension library for ARQ, the SparQL query processor within the Jena RDF framework. The ARQ/Jena framework was heavily leveraged in this work. The path expression parser inside of GLEEN was auto-generated using the ANTLR Parser Generator. This project was supported by NIH grant "Realizing the potential of reference ontologies for the semantic web" (1R01HL087706-01).