Blazegraph (by SYSTAP) / BLZG-641 Improve load performance / BLZG-1663

Reduce commit latency by parallelizing delete block processing




      The aggregate commit time breakdown for ~4B triples in a spatial data set is below.

      • We spend about 3x as much time flushing the indices into the write cache (notifyCommitters = 115s) as we do flushing the write cache to the disk (flushWriteSet = 38s).
      • About 78 seconds go to just syncing the disk (simpleCommitSecs; this includes RWStore postCommit(), but that should not touch the disk).
      • About 126 seconds go to writing the commit record (this also processes the delete blocks, and that is probably where most of the time is going).

      Takeaways:

      1. The index flush to the write cache is slower than the write cache flush to the disk, so index eviction is a bottleneck here. We might be able to do something more intelligent: gather the dirty pages to be flushed, organize them by index locality, and perhaps evict by B+Tree level in parallel (we have to evict from the leaves up, but we could evict all leaves in parallel).

      2. Parallelizing the delete block processing would be a big win.
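The level-by-level eviction idea from takeaway 1 can be sketched as below. This is only an illustration of the scheduling constraint (leaves first, each level in parallel, a barrier between levels), not Blazegraph code: `DirtyPage`, `LevelParallelEviction`, and the way pages are grouped by level are hypothetical names and assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LevelParallelEviction {

    /** Hypothetical stand-in for a dirty B+Tree page; not a Blazegraph type. */
    interface DirtyPage {
        void evict() throws Exception; // e.g. write the page into the write cache
    }

    /**
     * Evicts pages level by level. levels.get(0) holds the leaves and the
     * last entry holds the root. All pages of one level run in parallel;
     * the next level starts only after the level below has completed.
     */
    static void evict(List<List<DirtyPage>> levels, ExecutorService pool) throws Exception {
        for (List<DirtyPage> level : levels) {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (DirtyPage p : level) {
                tasks.add(() -> { p.evict(); return null; });
            }
            // invokeAll blocks until every task of this level is done.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any eviction failure
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<DirtyPage> leaves = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                final int id = i;
                leaves.add(() -> System.out.println("evict leaf " + id));
            }
            List<DirtyPage> root = new ArrayList<>();
            root.add(() -> System.out.println("evict root"));
            List<List<DirtyPage>> levels = new ArrayList<>();
            levels.add(leaves);
            levels.add(root);
            evict(levels, pool);
        } finally {
            pool.shutdown();
        }
    }
}
```

The barrier between levels is what preserves the "leaves up" ordering; within a level the order is unspecified, so `evict()` would need to be safe to call concurrently for sibling pages.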

      0.00 / Journal / commit / commit2PhaseSecs
      38.13 / Journal / commit / flushWriteSetSecs
      0.00 / Journal / commit / gatherSecs
      115.42 / Journal / commit / notifyCommittersSecs
      0.00 / Journal / commit / prepare2PhaseSecs
      77.65 / Journal / commit / simpleCommitSecs
      356.81 / Journal / commit / totalCommitSecs
      125.61 / Journal / commit / writeCommitRecordSecs
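As a quick sanity check on the counters above, the four non-zero phases can be summed and expressed as shares of the total. The sketch below only redoes that arithmetic with the reported values; the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommitBreakdown {

    /** Non-zero phase timings in seconds, copied from the reported counters. */
    static Map<String, Double> phases() {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("flushWriteSetSecs", 38.13);
        m.put("notifyCommittersSecs", 115.42);
        m.put("simpleCommitSecs", 77.65);
        m.put("writeCommitRecordSecs", 125.61);
        return m;
    }

    /** Sum of all phase timings. */
    static double sum(Map<String, Double> m) {
        double s = 0;
        for (double v : m.values()) {
            s += v;
        }
        return s;
    }

    /** Share of the total, in percent. */
    static double share(double part, double total) {
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        Map<String, Double> m = phases();
        double total = sum(m);
        for (Map.Entry<String, Double> e : m.entrySet()) {
            System.out.printf("%-24s %7.2fs %5.1f%%%n",
                    e.getKey(), e.getValue(), share(e.getValue(), total));
        }
        System.out.printf("%-24s %7.2fs%n", "sum of phases", total);
    }
}
```

The four phases add up to the reported totalCommitSecs (356.81), so nothing is left unaccounted for. writeCommitRecord is roughly 35% of the commit, notifyCommitters roughly 32%, simpleCommit roughly 22%, and flushWriteSet roughly 11%.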

      private static class CommitCounters implements ICounterSetAccess {

          /**
           * Elapsed nanoseconds for the {@link ICommitter#handleCommit(long)}
           * (flushing dirty pages from the indices into the write cache service).
           */
          private final CAT elapsedNotifyCommittersNanos = new CAT();

          /**
           * Elapsed nanoseconds for {@link CommitState#writeCommitRecord()}.
           *
           * Note: This is also responsible for recycling the deferred frees for
           * {@link IHistoryManager}.
           */
          private final CAT elapsedWriteCommitRecordNanos = new CAT();

          /**
           * Elapsed nanoseconds for flushing the write set from the write cache
           * service to the backing store (this is the bulk of the disk IO unless
           * the write cache service fills up during a long running commit, in
           * which case there is also incremental eviction).
           */
          private final CAT elapsedFlushWriteSetNanos = new CAT();

          /**
           * Elapsed nanoseconds for the simple atomic commit (non-HA). This
           * consists of sync'ing the disk (iff double-sync is enabled), writing
           * the root block, and then sync'ing the disk.
           */
          private final CAT elapsedSimpleCommitNanos = new CAT();

          /**
           * Elapsed nanoseconds for the entire commit protocol.
           */
          private final CAT elapsedTotalCommitNanos = new CAT();

          // HA counters

          /**
           * Elapsed nanoseconds for GATHER (consensus release time protocol: HA
           * only).
           */
          private final CAT elapsedGatherNanos = new CAT();

          /**
           * Elapsed nanoseconds for PREPARE (2-phase commit: HA only).
           */
          private final CAT elapsedPrepare2PhaseNanos = new CAT();

          /**
           * Elapsed nanoseconds for COMMIT2PHASE (2-phase commit: HA only).
           */
          private final CAT elapsedCommit2PhaseNanos = new CAT();

          // ... (rest of class not shown)
      }
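Since the writeCommitRecord timing also covers recycling the deferred frees, the parallelization this issue proposes could look roughly like the sketch below. It assumes that delete blocks can be read independently of each other and that freeing an address is thread-safe; neither assumption is confirmed here. `DeleteBlock`, `readAddresses()`, and `process()` are hypothetical names, not the RWStore API.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;

final class ParallelDeleteBlocks {

    /** Hypothetical stand-in for one deferred-free block; not the RWStore format. */
    interface DeleteBlock {
        long[] readAddresses(); // e.g. one read from the backing store
    }

    /**
     * Reads the delete blocks in parallel and hands every address to free.
     * The free callback is invoked concurrently, so it must be thread-safe.
     *
     * @return the number of addresses handed to free
     */
    static long process(List<DeleteBlock> blocks, LongConsumer free) {
        LongAdder freed = new LongAdder();
        blocks.parallelStream().forEach(block -> {
            for (long addr : block.readAddresses()) {
                free.accept(addr);
                freed.increment();
            }
        });
        return freed.sum();
    }
}
```

A parallel stream is used here only for brevity; it runs on the common fork-join pool, and a dedicated executor would likely be a better fit if reading a block involves blocking disk IO.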




            martyncutcher
            bryanthompson