  Blazegraph (by SYSTAP) / BLZG-1379

Join reordering strictly according to W3C semantics

    Details

    • Type: Improvement
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_RELEASE_1_5_2
    • Component/s: None
    • Labels:
      None

      Description

      In some cases, the join reordering in Blazegraph 1.5.1 deviated from the W3C standard. It has been thoroughly refactored, which resolved various issues related to proper join ordering. In particular, the refactoring (i) changes the point at which BIND clauses are evaluated, (ii) fixes the reordering of triple patterns in the presence of interleaved OPTIONAL or MINUS nodes, and (iii) ensures proper ordering of FILTER (NOT) EXISTS clauses. Below is a list of tickets that were directly or indirectly associated with this refactoring:

      • BLZG-48 Query fails to project subquery variables
      • BLZG-50 Queries with multiple VALUES clauses
      • BLZG-876 BIND not executed before SERVICE call
      • BLZG-1021 optimizer = None and FILTER EXISTS
      • BLZG-1256 Service call with values clauses create a cross product
      • BLZG-1299 duplicates in VALUES get replicated
      • BLZG-1281 FILTER FILTER != not working
      • BLZG-1284 optional / filter ! bound interaction malfunction
      • BLZG-1296 named subquery and VALUES expression
      • BLZG-1315 ASTJoinOrderOptimizerByType refactoring
      • BLZG-1358 SERVICE node placement issues
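The effect of BIND placement (point (i) above) can be illustrated with a minimal Python sketch of left-to-right group evaluation following the SPARQL 1.1 algebra (Join and Extend over solution mappings). This is not Blazegraph code: the toy data, the predicate :age, and the helper names are hypothetical, and an unbound variable is modelled as a KeyError that Extend treats as an evaluation error.

```python
# Hypothetical illustration, not Blazegraph code: solution mappings are dicts
# from variable name to value, evaluated left to right as in SPARQL 1.1.


def compatible(m1, m2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    """SPARQL Join: merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def extend(solutions, var, expr):
    """SPARQL Extend (BIND): if expr raises an error, var stays unbound."""
    out = []
    for m in solutions:
        try:
            out.append({**m, var: expr(m)})
        except KeyError:  # reading an unbound variable is an evaluation error
            out.append(dict(m))
    return out


def age_plus_one(m):
    return m["?age"] + 1


# Hypothetical matches of the triple pattern  ?s :age ?age
age_matches = [{"?s": ":alice", "?age": 30}, {"?s": ":bob", "?age": 40}]
start = [{}]  # every group starts from the single empty mapping

# { ?s :age ?age . BIND(?age + 1 AS ?next) }
pattern_first = extend(join(start, age_matches), "?next", age_plus_one)

# { BIND(?age + 1 AS ?next) . ?s :age ?age }
bind_first = join(extend(start, "?next", age_plus_one), age_matches)

print(pattern_first)  # ?next is bound (31 and 41)
print(bind_first)     # ?next is unbound in both solutions
```

Because the group is evaluated left to right, the BIND in the second form sees only the empty mapping, so moving it behind the triple pattern would change the result rather than merely the execution plan.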


          Activity

          michaelschmidt added a comment:

          The following should be added to the release note as an important notice:

          • Join ordering in join groups has now been aligned with the official W3C semantics (see ticket BLZG-1379 for details). Prior versions of Blazegraph did not always strictly follow the sequential execution order within join groups implied by the official SPARQL standard. The new version fixes these issues. Note that, as a consequence, certain complex queries may return different results than in previous versions; the new results are the standards-compliant ones.
          michaelschmidt added a comment:

          We’ve recently run into some “regressions” when switching from 1.5.1 to 1.5.2RC1 on our test systems, caused by the semantics changes described in https://jira.blazegraph.com/browse/BLZG-1379. In all cases where we encountered problems, the queries gave different results than in 1.5.1 because they were simply “wrong” according to the official semantics, and the problems could be solved just by reordering at the triple pattern level.

          To quickly summarize what has happened: the join reordering we implemented now strictly follows the official SPARQL semantics, whereas before there were some implicit (and wrong) assumptions that led to inconsistent behaviour. The main (though not the only) assumption we dropped is that statement patterns can be freely rearranged across OPTIONAL boundaries. More precisely, the old Blazegraph approach always moved all simple statement patterns in front of all OPTIONALs, which is invalid in general. The new approach does so only when it is valid, i.e., when the reordered query is guaranteed to give the same result as sequential execution (which is often, though not always, the case). To illustrate the problem, here is a query we recently ran into problems with:

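          PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?subject ?image ?label WHERE {
            ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?label .
            ?subject rdf:type foaf:Person .
            OPTIONAL { ?subject <http://schema.org/thumbnail> ?image }
          }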

          This query returns all persons together with their label and, if one exists, an associated thumbnail. Written this way, it works fine in both 1.5.1 and 1.5.2RC1. However, what we had actually specified was:

          PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
          PREFIX foaf: <http://xmlns.com/foaf/0.1/>
          SELECT ?subject ?image ?label WHERE {
            OPTIONAL { ?subject <http://schema.org/thumbnail> ?image }
            ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?label .
            ?subject rdf:type foaf:Person
          }

          In the previous release, Blazegraph rewrote the second query into the first one. The new release no longer does, because the two queries are in fact different. Query evaluation starts from the universal mapping (the single empty solution). In the second query, the OPTIONAL applied to that mapping yields all ?subject/?image pairs in the database if at least one such pair exists, and otherwise leaves the empty mapping unchanged. This result is then joined with each of the two triple patterns in turn. Consequently, in the final result ?image is either bound in every solution (if one or more triples with predicate <http://schema.org/thumbnail> exist) or in none (if no such triple exists), and persons without a thumbnail are dropped as soon as any thumbnail exists. In the first query, by contrast, ?image is bound only for those persons that have a thumbnail. In short: the two queries are not equivalent.
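The difference can be reproduced with a minimal Python sketch of the SPARQL algebra (Join and LeftJoin over solution mappings) on a hypothetical two-person dataset in which only Alice has a thumbnail; IRIs are shortened to plain strings such as "schema:thumbnail". The helper can_hoist is also an assumption: a simplified, conservative condition for moving a pattern in front of an OPTIONAL, not the actual rule implemented in Blazegraph's optimizer.

```python
# Hypothetical toy evaluation of the two queries above; not Blazegraph code.
# Mappings are dicts from variable name to value; IRIs are shortened strings.


def compatible(m1, m2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(left, right):
    """SPARQL Join: merge every compatible pair of mappings."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


def left_join(left, right):
    """SPARQL LeftJoin (OPTIONAL without FILTER): keep unmatched left rows."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(matches if matches else [a])
    return out


def match(triples, pattern):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        m = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if m.setdefault(term, value) != value:
                    break  # repeated variable bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(m)
    return results


def can_hoist(pattern_vars, optional_vars, bound_before):
    """Simplified, conservative check (an assumption, not Blazegraph's rule):
    a pattern after an OPTIONAL may move in front of it if every variable it
    shares with the OPTIONAL is definitely bound before the OPTIONAL."""
    return (pattern_vars & optional_vars) <= bound_before


triples = [
    (":alice", "rdfs:label", "Alice"),
    (":alice", "rdf:type", "foaf:Person"),
    (":alice", "schema:thumbnail", "alice.png"),
    (":bob", "rdfs:label", "Bob"),
    (":bob", "rdf:type", "foaf:Person"),
]
label = match(triples, ("?subject", "rdfs:label", "?label"))
person = match(triples, ("?subject", "rdf:type", "foaf:Person"))
thumb = match(triples, ("?subject", "schema:thumbnail", "?image"))
start = [{}]  # the universal (empty) mapping each group starts from

# First query: the two patterns, then the OPTIONAL.
q1 = left_join(join(join(start, label), person), thumb)
# Second query as written: the OPTIONAL first, then the two patterns.
q2 = join(join(left_join(start, thumb), label), person)

print(q1)  # Alice with ?image, Bob without ?image
print(q2)  # only Alice: Bob is dropped because he has no thumbnail
# Nothing precedes the OPTIONAL in the second query, so ?subject is not yet
# bound and moving the label pattern in front of the OPTIONAL is unsafe.
print(can_hoist({"?subject", "?label"}, {"?subject", "?image"}, set()))  # False
```

The conservative condition is sufficient because every variable shared by the moved pattern and the OPTIONAL already carries a fixed value in each row before the OPTIONAL is evaluated, so both orders pair up exactly the same bindings. A group that started with ?subject rdf:type foaf:Person before the OPTIONAL would pass the check (bound_before = {"?subject"}).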


            People

            • Assignee: michaelschmidt
            • Reporter: michaelschmidt
            • Votes: 0
            • Watchers: 1

              Dates

              • Created:
                Updated:
                Resolved: