  1. Blazegraph (by SYSTAP)
  2. BLZG-1304

Symmetric hash join operator (non-blocking)

    Details

      Description

      As we're making more and more use of hash joins (e.g. when projecting in DISTINCT variables into subgroups), it might be nice to have a pipelined version of the hash join operator.
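As an illustration of what "pipelined" means here, the following is a minimal sketch of a symmetric hash join, written for this note and not taken from the Blazegraph code base. The class name `SymmetricHashJoinSketch`, the representation of solutions as `Map<String, String>`, and the single join variable are all assumptions. Each arriving solution is inserted into the hash table for its own side and immediately probes the other side's table, so results are emitted without waiting for either input to complete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a symmetric (non-blocking) hash join; not Blazegraph code. */
final class SymmetricHashJoinSketch {
    private final String joinVar;
    // One hash table per input, keyed by the value bound to the join variable.
    private final Map<String, List<Map<String, String>>> leftTable = new HashMap<>();
    private final Map<String, List<Map<String, String>>> rightTable = new HashMap<>();

    SymmetricHashJoinSketch(String joinVar) {
        this.joinVar = joinVar;
    }

    /** Accepts one solution from the left input and returns the joins it produces. */
    List<Map<String, String>> acceptLeft(Map<String, String> solution) {
        return accept(solution, leftTable, rightTable);
    }

    /** Accepts one solution from the right input and returns the joins it produces. */
    List<Map<String, String>> acceptRight(Map<String, String> solution) {
        return accept(solution, rightTable, leftTable);
    }

    private List<Map<String, String>> accept(
            Map<String, String> solution,
            Map<String, List<Map<String, String>>> own,
            Map<String, List<Map<String, String>>> other) {
        List<Map<String, String>> out = new ArrayList<>();
        String key = solution.get(joinVar);
        if (key == null) {
            return out; // assumption: solutions without the join variable are ignored
        }
        // Build step: remember this solution for future arrivals on the other side.
        own.computeIfAbsent(key, k -> new ArrayList<>()).add(solution);
        // Probe step: join against everything seen so far on the other side.
        for (Map<String, String> match : other.getOrDefault(key, Collections.emptyList())) {
            // Assumption: the join variable is the only shared variable, so merging cannot conflict.
            Map<String, String> merged = new HashMap<>(match);
            merged.putAll(solution);
            out.add(merged);
        }
        return out;
    }
}
```

The sketch is single-threaded and keeps both tables in memory for the lifetime of the join; a real operator would additionally need the synchronization and memory management discussed in this ticket.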

      Candidates might be, for instance,

      We need to take a closer look at what is possible here and what the specific requirements are; there may also be non-hash-based alternatives. A merge join might be a better choice: when computing the DISTINCT projection we could sort right away, pass the distinct projection in sorted as well, and take care not to destroy the order.
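The merge join alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `MergeJoinSketch`, its method names, and the use of string keys with `Map.Entry` solutions are hypothetical and not part of Blazegraph. The distinct projection is sorted while it is computed, and the join then advances two cursors over the sorted inputs, preserving the order of the matching solutions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of a merge join against a sorted DISTINCT projection. */
final class MergeJoinSketch {

    /** Computes the DISTINCT projection and sorts it in one step (a TreeSet does both). */
    static List<String> sortedDistinct(Collection<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    /**
     * Joins sorted distinct keys with solutions sorted by key.
     * Assumption: both inputs use the natural String ordering.
     */
    static List<Map.Entry<String, String>> mergeJoin(
            List<String> distinctKeys,
            List<Map.Entry<String, String>> sortedSolutions) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < distinctKeys.size() && j < sortedSolutions.size()) {
            Map.Entry<String, String> solution = sortedSolutions.get(j);
            int cmp = distinctKeys.get(i).compareTo(solution.getKey());
            if (cmp < 0) {
                i++; // key has no (further) matching solutions
            } else if (cmp > 0) {
                j++; // solution's key is not in the projection
            } else {
                out.add(solution); // match; output order follows the input order
                j++; // stay on the same key, more solutions may share it
            }
        }
        return out;
    }
}
```

Unlike the hash-based variant, this needs no hash index at all, but it only works if every operator between the sort and the join preserves the order.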

      One problem in implementing this strategy is concurrency of such collections. Obviously there need to be synchronization barriers around mutation. The HTree is only thread-safe for concurrent readers. If there is a writer, then there must not be a concurrent reader. The current solution set hash join checkpoints the HTree once all solutions have been accepted. Once the index has been checkpointed the readers obtain a read-only view from the checkpoint.
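The checkpoint-then-read pattern described above can be sketched as follows. This is not the HTree API: `CheckpointedIndexSketch` and its methods are hypothetical names, and a plain `HashMap` stands in for the index. It only shows the barrier, namely that writes are accepted until the checkpoint and readers only ever see the immutable view published by it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "build, checkpoint, then read-only views"; not the HTree API. */
final class CheckpointedIndexSketch {
    // Mutable index, touched only under the object's lock.
    private final Map<String, List<String>> building = new HashMap<>();
    // Immutable view published by checkpoint(); null while solutions are still being accepted.
    private volatile Map<String, List<String>> readOnlyView;

    /** Writer side: accepts a solution until the index has been checkpointed. */
    synchronized void accept(String key, String solution) {
        if (readOnlyView != null) {
            throw new IllegalStateException("already checkpointed");
        }
        building.computeIfAbsent(key, k -> new ArrayList<>()).add(solution);
    }

    /** Barrier: copies the index into an immutable snapshot and publishes it to readers. */
    synchronized void checkpoint() {
        Map<String, List<String>> snapshot = new HashMap<>();
        for (Map.Entry<String, List<String>> e : building.entrySet()) {
            snapshot.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        readOnlyView = Collections.unmodifiableMap(snapshot);
    }

    /** Reader side: safe for concurrent readers because the view never changes. */
    List<String> probe(String key) {
        Map<String, List<String>> view = readOnlyView;
        if (view == null) {
            throw new IllegalStateException("not yet checkpointed");
        }
        return view.getOrDefault(key, Collections.emptyList());
    }
}
```

The sketch also makes the limitation visible: no probe can run before the checkpoint, which is exactly the blocking behavior a pipelined operator would have to avoid.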

        Issue Links

          Activity

          michaelschmidt michaelschmidt created issue -
          michaelschmidt michaelschmidt added a comment -

          See also BLZG-657, as well as the paper on eddies at http://db.cs.berkeley.edu/papers/sigmod00-eddy.pdf.

          beebs Brad Bebee made changes -
          Field Original Value New Value
          Workflow Trac Import v2 [ 13140 ] Trac Import v3 [ 13315 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v3 [ 13315 ] Trac Import v4 [ 14644 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v4 [ 14644 ] Trac Import v5 [ 15995 ]
          bryanthompson bryanthompson made changes -
          Link This issue is duplicated by BLZG-1355 [ BLZG-1355 ]
          bryanthompson bryanthompson made changes -
          Link This issue relates to BLZG-657 [ BLZG-657 ]
          bryanthompson bryanthompson made changes -
          Summary Pipelined hash join operator Symmetric hash join operator (non-blocking)
          bryanthompson bryanthompson made changes -
          Priority Medium [ 3 ] High [ 2 ]
          bryanthompson bryanthompson made changes -
          Fix Version/s BLAZEGRAPH_RELEASE_1_5_3 [ 10165 ]
          bryanthompson bryanthompson made changes -
          Issue Type Bug [ 1 ] Improvement [ 4 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v5 [ 15995 ] Trac Import v6 [ 17456 ]
          bryanthompson bryanthompson made changes -
          Link This issue relates to BLZG-1364 [ BLZG-1364 ]
          bryanthompson bryanthompson made changes -
          Link This issue relates to BLZG-1357 [ BLZG-1357 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v6 [ 17456 ] Trac Import v7 [ 18846 ]
          beebs Brad Bebee made changes -
          Fix Version/s BLAZEGRAPH_RELEASE_1_6_0 [ 10200 ]
          Fix Version/s BLAZEGRAPH_RELEASE_1_5_3 [ 10165 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v7 [ 18846 ] Trac Import v8 [ 20391 ]
          michaelschmidt michaelschmidt added a comment - - edited

          What's missing:

          • Test cases -> done
          • Function to decide when to use pipelined version -> done
          • Query hint to enable selectively -> done
          • Analytic mode version of operator -> created dedicated ticket for this, see BLZG-1608
          • Minor code cleanup (revert unneeded changes) -> done
          • Benchmarking
            • Verify that it helps -> done
            • Come up with a few benchmark queries that capture the benefits (SP2B extensions) -> done
            • Assess general performance of pipelined hash join vs. old hash join -> in progress (benchmark running)
          bryanthompson bryanthompson made changes -
          Assignee bryanthompson [ bryanthompson ] michaelschmidt [ michaelschmidt ]
          bryanthompson bryanthompson made changes -
          Status Open [ 1 ] Accepted [ 10101 ]
          bryanthompson bryanthompson made changes -
          Status Accepted [ 10101 ] In Progress [ 3 ]
          michaelschmidt michaelschmidt added a comment -

          All done, merged in. See Google Sheets for benchmark results.

          michaelschmidt michaelschmidt made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          michaelschmidt michaelschmidt made changes -
          Status Resolved [ 5 ] In Review [ 10100 ]
          michaelschmidt michaelschmidt made changes -
          Resolution Done [ 10000 ]
          Status In Review [ 10100 ] Done [ 10000 ]
          michaelschmidt michaelschmidt made changes -
          Component/s Query Plan Generator [ 10014 ]

            People

            • Assignee:
              michaelschmidt michaelschmidt
              Reporter:
              michaelschmidt michaelschmidt