Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-641 Improve load performance
  3. BLZG-1692

Concurrent writers on B+Tree (and possibly HTree)



    • Type: Sub-task
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      If we push down a set of keys over which we map an operation into the BTree, then we could make progress against resolving pages for those keys any time we need to do an IO to read a page for a writer. This would give us N thread read-ahead. In fact, that read-ahead code already exists in BTree.loadChild().

      The general approach is to traverse the index using a read-only thread (thread does not do any mutation). Once we reach the leaf where a potential mutation will be applied, we make the decision whether or not to modify the leaf (this is exactly the same decision that drives copy-on-write). If we will modify the leaf, then we take a write lock for the leaf. If the modification would split or join the parent, then we escalate to the write lock for the parent recursively until we reach the lower ancestor where the mutation would not cause a structural change. (Note that we might need a smaller scoped lock to maintain the #of spanned tuples in the nodes.)

      This policy should allow maximum parallelism in mutations except when the updates would be clustered into the same leaf. In such scenarios, which would include the common case of pure append and as well as the case of an index with good locality such as SPO(C), we would like to use the same thread for all mutations entering the same leaf.

      Some related logic was already introduced in the parallel read-ahead code. It examines the indexOf the keys to make thread assignment decisions. Other strategy would be to assign the actual mutations onto a striped queue. Each queue would be a mutation thread, so the operation would be handed off from a read-only thread to a mutation thread at the time that we decide to actually modify the index. The mutation threads would be chosen to have good locality and could even priority heap sort their work.


          Issue Links



              bryanthompson bryanthompson
              bryanthompson bryanthompson
              0 Vote for this issue
              1 Start watching this issue