  1. Blazegraph (by SYSTAP)
  2. BLZG-224

Translate OFFSET/LIMIT into native "slice".



    • Type: New Feature
    • Status: Done
    • Resolution: Done
    • Affects Version/s: TERMS_REFACTOR_BRANCH
    • Fix Version/s: None
    • Component/s: Bigdata SAIL


      Bigdata natively supports "slice" (OFFSET/LIMIT) operations on its rules. However, because the Sesame framework often imposes higher-level filters, we do not currently translate these features from SPARQL into the native slice for Sesame. This is an area where we know we need to optimize further by capturing more of the SPARQL query semantics in the native rule execution layer. When "slice" queries are executed natively, performance is quite good. When the "slice" is imposed by Sesame, the rule execution layer tends to overgenerate results until the result iterator is closed, at which point rule execution is interrupted. As a result, we do more work than necessary when the "slice" is imposed by Sesame, and query latency is higher.

      The amount of "extra" effort can be reduced by changing the query "chunk size". For example, if the default chunk size is 1000, then 1000 solutions will be generated before the first chunk of solutions is materialized and the Sesame "slice" filter can decide that it is satisfied and close the result iterator, thereby interrupting the query. With a smaller chunk size, the query does less work before the result iterator is closed. However, smaller chunk sizes are less efficient for high-volume queries, so this parameter should be adjusted per query rather than globally for the triple store instance.
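
      The effect of chunk size on wasted work can be illustrated with a small simulation. This is only a sketch, not Bigdata code: `chunked_rule` and `run_slice` are hypothetical names, the "rule" simply yields integers, and the slice is imposed from outside by closing the iterator, much as the Sesame filter does.

```python
from itertools import islice

def chunked_rule(total, chunk_size, counter):
    """Hypothetical stand-in for native rule execution: solutions are
    generated a whole chunk at a time before any of them is handed out."""
    produced = 0
    while produced < total:
        n = min(chunk_size, total - produced)
        chunk = list(range(produced, produced + n))  # the whole chunk is computed
        produced += n
        counter["generated"] = produced
        yield from chunk

def run_slice(total, chunk_size, offset, limit):
    """Impose OFFSET/LIMIT downstream, then close the iterator to
    interrupt the producer. Returns the page and the work done."""
    counter = {"generated": 0}
    it = chunked_rule(total, chunk_size, counter)
    page = list(islice(it, offset, offset + limit))
    it.close()  # interrupts the producer once the slice is satisfied
    return page, counter["generated"]

for size in (1000, 100, 10):
    page, generated = run_slice(100_000, size, offset=0, limit=10)
    print(f"chunk_size={size}: returned {len(page)}, generated {generated}")
```

      In this model a LIMIT of 10 costs 1000 generated solutions at a chunk size of 1000, but only 10 at a chunk size of 10. The simulation does not capture the per-chunk overhead that makes small chunks less attractive for high-volume queries.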

      Slice semantics impose additional constraints on scale-out if you want the results to be stable from query to query so that you can page through the result set. Stable evaluation requires that we disable most sources of concurrency so that the results are produced in a fixed order; the next "page" of results is then obtained by setting OFFSET(t) := OFFSET(t-1) + LIMIT(t-1). If the goal is only to obtain the "TOP 10", then stable evaluation is not required and the query can be evaluated with better parallelism.
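
      The paging rule can be written out as a short helper. This is a minimal sketch: `next_offset`, `page_windows`, and `paged_query` are hypothetical names, and the pages are disjoint and gap-free only if the store evaluates the query in a stable order.

```python
def next_offset(prev_offset, prev_limit):
    """OFFSET(t) := OFFSET(t-1) + LIMIT(t-1)."""
    return prev_offset + prev_limit

def page_windows(limit, pages, start=0):
    """Return the (offset, limit) pairs for consecutive pages."""
    windows = []
    offset = start
    for _ in range(pages):
        windows.append((offset, limit))
        offset = next_offset(offset, limit)
    return windows

def paged_query(base_query, offset, limit):
    """Append the slice modifiers to a SPARQL SELECT query string."""
    return f"{base_query} LIMIT {limit} OFFSET {offset}"

for offset, limit in page_windows(limit=10, pages=3):
    print(paged_query("SELECT ?s WHERE { ?s ?p ?o }", offset, limit))
```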

      This is the sort of thing where having query hints in SPARQL would be great. Something along the lines of embedding property/value pairs and boolean flags in comments for a SPARQL query would be all that is required.
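
      One way such comment-embedded hints might be read is sketched below. The `#@` marker, the hint names `chunkSize` and `stable`, and the `parse_hints` function are all assumptions made for illustration; they are not an existing Bigdata or Sesame syntax.

```python
import re

# Hypothetical hint syntax: a comment line starting with "#@" that holds
# either "key=value" (a property) or a bare "key" (a boolean flag).
HINT_LINE = re.compile(r"^\s*#@\s*(?P<key>[\w.]+)\s*(?:=\s*(?P<value>\S+))?\s*$")

def parse_hints(query):
    """Collect hints from SPARQL comment lines into a dict.
    Properties map to their string value; bare flags map to True."""
    hints = {}
    for line in query.splitlines():
        match = HINT_LINE.match(line)
        if match:
            value = match.group("value")
            hints[match.group("key")] = True if value is None else value
    return hints

query = """\
#@ chunkSize=100
#@ stable
SELECT ?s WHERE { ?s ?p ?o } LIMIT 10
"""
print(parse_hints(query))
```

      Because the hints live in comments, a SPARQL processor that does not understand them still sees a valid query.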




            mikepersonick mikepersonick
            bryanthompson bryanthompson