Blazegraph (by SYSTAP) / BLZG-1304

Symmetric hash join operator (non-blocking)

    Details

      Description

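      As we are making more and more use of hash joins (e.g., when projecting in DISTINCT variables into subgroups), it would be useful to have a pipelined (non-blocking) version of the hash join operator, i.e., one that can emit join results as solutions arrive instead of waiting until all solutions on the build side have been accepted.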
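      The classic symmetric hash join is one way to get such a pipelined operator. It keeps one hash table per input: each arriving solution first probes the other input's table, then is inserted into its own table, so every match is emitted exactly once, as soon as the second of the two matching solutions arrives. The sketch below is a minimal in-memory illustration under stated assumptions: solutions are modeled as a String join key plus a String payload, and the class and method names are hypothetical, not Blazegraph's actual operator or solution classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal symmetric (pipelined) hash join.
// Assumption: a solution is reduced to (joinKey, payload) strings.
final class SymmetricHashJoin {
    // One hash table per input: join key -> payloads seen so far on that side.
    private final Map<String, List<String>> leftTable = new HashMap<>();
    private final Map<String, List<String>> rightTable = new HashMap<>();

    // A left solution arrives: probe the right table, then remember it on the left.
    List<String> pushLeft(String key, String payload) {
        List<String> out = new ArrayList<>();
        for (String r : rightTable.getOrDefault(key, List.of())) {
            out.add(payload + "|" + r);
        }
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        return out;
    }

    // A right solution arrives: probe the left table, then remember it on the right.
    List<String> pushRight(String key, String payload) {
        List<String> out = new ArrayList<>();
        for (String l : leftTable.getOrDefault(key, List.of())) {
            out.add(l + "|" + payload);
        }
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        return out;
    }

    public static void main(String[] args) {
        SymmetricHashJoin join = new SymmetricHashJoin();
        System.out.println(join.pushLeft("a", "L1"));  // []
        System.out.println(join.pushRight("a", "R1")); // [L1|R1]
        System.out.println(join.pushLeft("a", "L2"));  // [L2|R1]
        System.out.println(join.pushRight("b", "R2")); // []
    }
}
```

      The price of being non-blocking is that both tables keep growing for the lifetime of the join, and that probes and inserts are interleaved on both tables, which is exactly the kind of mixed read/write access the HTree does not allow.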

      Candidates might be, for instance,

      We need to take a closer look at what is possible here and what the specific requirements are; there may also be non-hash-based alternatives. A merge join might be a better choice: when computing the DISTINCT projection we could sort right away, pass the distinct projection in already sorted, and take care not to destroy that order.
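      For comparison, a merge join over two inputs that are already sorted on the join key needs no hash table at all: it advances a cursor on each side and emits the cross product of each run of equal keys. The sketch below assumes both inputs arrive sorted ascending by key and again models a solution as a hypothetical (key, payload) pair; it is an illustration of the technique, not Blazegraph code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sort-merge join over two inputs pre-sorted by join key.
final class SortMergeJoin {
    // Assumption: a solution is reduced to a join key plus a payload.
    record Row(String key, String payload) {}

    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key().compareTo(right.get(j).key());
            if (cmp < 0) {
                i++;                      // left key too small: advance left
            } else if (cmp > 0) {
                j++;                      // right key too small: advance right
            } else {
                String key = left.get(i).key();
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key().equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key().equals(key)) jEnd++;
                // Emit the cross product of the two runs; output stays sorted by key.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(left.get(a).payload() + "|" + right.get(b).payload());
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> left = List.of(new Row("a", "L1"), new Row("b", "L2"), new Row("b", "L3"));
        List<Row> right = List.of(new Row("b", "R1"), new Row("c", "R2"));
        System.out.println(join(left, right)); // [L2|R1, L3|R1]
    }
}
```

      Because the output is produced in key order, a downstream operator can rely on the same sort order, which is the "not destroying order" requirement mentioned above.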

      One problem in implementing this strategy is concurrent access to such collections: obviously there need to be synchronization barriers around mutation. The HTree is thread-safe only for concurrent readers; if there is a writer, there must not be any concurrent reader. The current solution set hash join checkpoints the HTree once all solutions have been accepted. Once the index has been checkpointed, the readers obtain a read-only view from the checkpoint.
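      The "build, checkpoint, then read" protocol described above can be sketched as follows. This is a minimal illustration under an explicit assumption: a plain HashMap stands in for the HTree, and the class and method names are hypothetical (the real HTree checkpoint API is different). A single writer fills the map; checkpoint() publishes an immutable snapshot through a volatile field, after which any number of readers may share it and further writes are rejected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a checkpointed HTree: HashMap instead of HTree.
final class CheckpointedHashIndex {
    // Mutable state, touched only by the single writer thread.
    private final Map<String, List<String>> building = new HashMap<>();
    // Immutable snapshot; volatile so readers on other threads see it fully built.
    private volatile Map<String, List<String>> readOnlyView;

    // Single writer only: accept a solution while the index is still mutable.
    void add(String key, String payload) {
        if (readOnlyView != null) {
            throw new IllegalStateException("index already checkpointed");
        }
        building.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
    }

    // Called by the writer once all solutions have been accepted.
    void checkpoint() {
        Map<String, List<String>> snapshot = new HashMap<>();
        building.forEach((k, v) -> snapshot.put(k, List.copyOf(v)));
        readOnlyView = Map.copyOf(snapshot); // publish the read-only view
    }

    // Safe for any number of concurrent readers after checkpoint().
    Map<String, List<String>> view() {
        Map<String, List<String>> v = readOnlyView;
        if (v == null) {
            throw new IllegalStateException("not yet checkpointed");
        }
        return v;
    }

    public static void main(String[] args) {
        CheckpointedHashIndex index = new CheckpointedHashIndex();
        index.add("a", "x");
        index.add("a", "y");
        index.checkpoint();
        System.out.println(index.view().get("a")); // [x, y]
    }
}
```

      The sketch also shows why a symmetric hash join does not fit this protocol directly: it never reaches a point where "all solutions have been accepted" before it must start probing.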


          Activity

          michaelschmidt added a comment:

          See also BLZG-657, as well as the paper on eddies at http://db.cs.berkeley.edu/papers/sigmod00-eddy.pdf.

          michaelschmidt added a comment (edited):

          What's missing:

          • Test cases -> done
          • Function to decide when to use pipelined version -> done
          • Query hint to enable selectively -> done
          • Analytic mode version of operator -> created dedicated ticket for this, see BLZG-1608
          • Minor code cleanup (revert unneeded changes) -> done
          • Benchmarking
            • Verify that it helps -> done
            • Come up with a few benchmark queries that capture the benefits (SP2B extensions) -> done
            • Assess general performance of pipelined hash join vs. old hash join -> in progress (benchmark running)
          michaelschmidt added a comment:

          All done, merged in. See Google Sheets for benchmark results.


            People

            • Assignee: michaelschmidt
              Reporter: michaelschmidt
            • Votes: 0
              Watchers: 2
