  1. Blazegraph (by SYSTAP)
  2. BLZG-535

Optimize hash joins when there are no source solutions (or only the exogenous bindings)

    Details

      Description

      Join variables are currently computed from the variables bound on entry to the subgroup and the variables definitely bound within the subgroup. When a subgroup runs first in the query, it has no incoming bound variables, so it builds a hash index with no join variables. This leads to hash joins without join variables, which are VERY expensive.
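
      To make the failure mode concrete, here is a minimal sketch of how the join variables collapse to the empty set. It assumes the join variables are the intersection of the two variable sets named above; the class and method names are hypothetical and are not taken from the Blazegraph code base.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the Blazegraph implementation.
final class JoinVarsSketch {

    /**
     * Assumed rule: a variable is a join variable when it is bound on
     * entry to the subgroup AND definitely bound inside the subgroup.
     */
    static Set<String> joinVars(Set<String> incomingBound,
                                Set<String> definitelyBoundInSubgroup) {
        Set<String> vars = new LinkedHashSet<>(incomingBound);
        vars.retainAll(definitelyBoundInSubgroup); // set intersection
        return vars;
    }
}
```

      Under this assumption, a subgroup that runs first is called with an empty `incomingBound` set, so the result is empty no matter how many variables the subgroup definitely binds.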

      We cannot simply use the definitely bound variables of the subgroup as the join variables, because the hash code of the exogenous/empty solution will typically be undefined: that solution has no bindings for those join variables.
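
      The sketch below illustrates why the hash code is undefined. It models a solution as a map from variable name to value and computes a key over the join variables only; this representation and the name `hashOnJoinVars` are assumptions made for illustration, not the actual hash index code.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical illustration only; not the Blazegraph implementation.
final class JoinKeySketch {

    /**
     * Hash a solution on the given join variables. The result is empty
     * when any join variable is unbound in the solution, which is the
     * "undefined hash code" case described above.
     */
    static OptionalInt hashOnJoinVars(Map<String, String> solution,
                                      List<String> joinVars) {
        int h = 1;
        for (String v : joinVars) {
            String value = solution.get(v);
            if (value == null) {
                return OptionalInt.empty(); // no binding, so no usable key
            }
            h = 31 * h + value.hashCode();
        }
        return OptionalInt.of(h);
    }
}
```

      With an empty list of join variables every solution hashes to the same key, so all solutions land in a single bucket; with non-empty join variables the empty source solution has no key at all.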

      We need to recognize this case and handle it differently. Rather than reducing the join variables to an empty set, the join variables should be all definitely bound variables and we should INCLUDE the solution set into the parent using a different operator.

      Basically, all the operator needs to do is drain the solutions from the hash index, pushing them into the pipeline (its sink). It cannot do a hash join against the source solution (either empty or containing just the exogenous bindings) because there is no guarantee that the source solution will share the join variables (in fact, it nearly always will NOT share the join variables). However, the source solution set has very low cardinality (ONE).

      Therefore, for each solution in the hash index, it attempts a join with each source solution in turn. While this is conceptually a cross product unconstrained by the presence of the join variables, in fact there is only one source solution, so this amounts to a single scan of the hash index in which we possibly pick up and/or filter based on the exogenous bindings.
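
      The scan described above could be sketched as follows. Solutions are again modeled as maps from variable name to non-null value, and two solutions join when they agree on every variable they share; the names `joinOrNull` and `drain` are hypothetical, and the returned list stands in for the operator's sink.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Blazegraph implementation.
final class ScanJoinSketch {

    /** Merge two solutions, or return null when a shared variable conflicts. */
    static Map<String, String> joinOrNull(Map<String, String> indexed,
                                          Map<String, String> source) {
        Map<String, String> out = new LinkedHashMap<>(indexed);
        for (Map.Entry<String, String> e : source.entrySet()) {
            String existing = out.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                return null; // filtered out by an exogenous binding
            }
        }
        return out;
    }

    /** One pass over the indexed solutions against the single source solution. */
    static List<Map<String, String>> drain(Iterable<Map<String, String>> indexed,
                                           Map<String, String> source) {
        List<Map<String, String>> sink = new ArrayList<>();
        for (Map<String, String> s : indexed) {
            Map<String, String> joined = joinOrNull(s, source);
            if (joined != null) {
                sink.add(joined);
            }
        }
        return sink;
    }
}
```

      With an empty source solution every indexed solution passes through unchanged. With exogenous bindings, a solution either picks up the extra bindings or is dropped when it disagrees with them. The cost is one pass over the M indexed solutions.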

      This operation is currently executed by a (JVM|HTree)SolutionSetHashJoin. Perhaps the easiest thing would be to add an annotation to that hash join indicating that it should IGNORE the join variables and do a full (1 x M) cross product.
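
      As a rough sketch of what such an annotation could mean, the toy hash index below takes a boolean flag on its join method. The flag name `ignoreJoinVars` and the class itself are hypothetical; they are not existing (JVM|HTree)SolutionSetHashJoin annotations, and the index is a plain in-memory map rather than either real index structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Blazegraph implementation.
final class FlaggedHashJoinSketch {
    private final List<String> joinVars;
    private final Map<List<String>, List<Map<String, String>>> buckets = new HashMap<>();

    FlaggedHashJoinSketch(List<String> joinVars) {
        this.joinVars = joinVars;
    }

    /** Key a solution by its values for the join variables (null if unbound). */
    private List<String> keyOf(Map<String, String> solution) {
        List<String> key = new ArrayList<>();
        for (String v : joinVars) {
            key.add(solution.get(v));
        }
        return key;
    }

    void add(Map<String, String> solution) {
        buckets.computeIfAbsent(keyOf(solution), k -> new ArrayList<>()).add(solution);
    }

    /** ignoreJoinVars=true scans every bucket instead of probing one. */
    List<Map<String, String>> join(Map<String, String> source, boolean ignoreJoinVars) {
        List<Map<String, String>> candidates = new ArrayList<>();
        if (ignoreJoinVars) {
            buckets.values().forEach(candidates::addAll);
        } else {
            candidates.addAll(buckets.getOrDefault(keyOf(source), List.of()));
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> c : candidates) {
            boolean compatible = true;
            for (Map.Entry<String, String> e : source.entrySet()) {
                String bound = c.get(e.getKey());
                if (bound != null && !bound.equals(e.getValue())) {
                    compatible = false;
                    break;
                }
            }
            if (compatible) {
                Map<String, String> merged = new HashMap<>(c);
                merged.putAll(source);
                out.add(merged);
            }
        }
        return out;
    }
}
```

      In this toy version, probing with an empty source solution finds no bucket because the source has no values for the join variables, while the flagged path scans all M indexed solutions once and still applies any exogenous bindings as a filter.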

        Activity

        beebs Brad Bebee created issue -
        Field Original Value New Value
        beebs Brad Bebee made changes -
        Labels Issue_patch_20150625
        beebs Brad Bebee made changes -
        Status Closed - Won't Fix [ 6 ] Open [ 1 ]
        beebs Brad Bebee made changes -
        Status Open [ 1 ] Accepted [ 10101 ]
        beebs Brad Bebee made changes -
        Status Accepted [ 10101 ] In Progress [ 3 ]
        beebs Brad Bebee made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        beebs Brad Bebee made changes -
        Status Resolved [ 5 ] In Review [ 10100 ]
        beebs Brad Bebee made changes -
        Resolution Fixed [ 1 ] Done [ 10000 ]
        Status In Review [ 10100 ] Done [ 10000 ]

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson