Flushing the index to the write set is slower than flushing the write cache to disk, so index eviction is the bottleneck here. We could be more intelligent about gathering the dirty pages to be flushed: organize them by index locality and even perform evictions by B+Tree level in parallel (we must evict from the leaves up, but all pages at the same level can be evicted in parallel).
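The level-by-level approach above can be sketched as follows. This is a hypothetical illustration, not the actual Blazegraph eviction code: `DirtyPage`, `level()`, and `evict()` stand in for the real B+Tree node types and their serialization path.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: evict dirty pages level by level, leaves first, with all pages
// at a given level evicted in parallel. A level acts as a barrier because
// a parent may not be evicted before its children.
final class LevelParallelEviction {

    interface DirtyPage {
        int level();  // 0 == leaf, increasing toward the root
        void evict(); // serialize the page and write it to the write set
    }

    static void evictAll(Collection<? extends DirtyPage> dirty, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Group dirty pages by level so each level can be flushed in one batch.
        final SortedMap<Integer, List<DirtyPage>> byLevel = new TreeMap<>();
        for (DirtyPage p : dirty) {
            byLevel.computeIfAbsent(p.level(), k -> new ArrayList<>()).add(p);
        }
        // Evict from the leaves (level 0) up; within a level there are no
        // parent/child dependencies, so the batch can run in parallel.
        for (List<DirtyPage> level : byLevel.values()) {
            final List<Future<?>> futures = new ArrayList<>();
            for (DirtyPage p : level) {
                futures.add(pool.submit(p::evict));
            }
            for (Future<?> f : futures) {
                f.get(); // barrier: a level must finish before its parents start
            }
        }
    }
}
```

The `f.get()` barrier is the simplest way to enforce the leaves-up ordering; a finer-grained scheme could release a parent as soon as its own dirty children are done.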
Note: It might not be possible to parallelize this operation for the HTree in quite the same fashion, since the HTree has a different concept of "depth" or "level" than the B+Tree. What is required is a means of identifying dirty pages that are independent of one another: first all leaf buckets, then all parents of those leaf buckets, and so on recursively until we reach the top directory page.
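One way to get that independence property without a global "level" is to compute, for each dirty page, its height above its deepest dirty descendant: pages with equal height never have a parent/child relationship and can be evicted together. The sketch below is an assumption about how this could work, with a toy `Page` type rather than the real HTree directory/bucket classes.

```java
import java.util.*;

// Sketch: group dirty pages into batches that are mutually independent.
// Batch 0 is safe to evict first, then batch 1, and so on up to the
// top directory page.
final class DirtyPageOrdering {

    static final class Page {
        final List<Page> children = new ArrayList<>();
        final boolean dirty;
        Page(boolean dirty) { this.dirty = dirty; }
    }

    /** Returns dirty pages grouped into batches: batch 0 first, then 1, ... */
    static List<List<Page>> batches(Page root) {
        final Map<Page, Integer> height = new HashMap<>();
        computeHeight(root, height);
        final SortedMap<Integer, List<Page>> byHeight = new TreeMap<>();
        for (Map.Entry<Page, Integer> e : height.entrySet()) {
            byHeight.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return new ArrayList<>(byHeight.values());
    }

    // Height of a dirty page = 1 + max height over its dirty descendants;
    // a dirty page with no dirty descendants (e.g., a leaf bucket) is 0.
    private static int computeHeight(Page p, Map<Page, Integer> out) {
        int maxChild = -1;
        for (Page c : p.children) {
            maxChild = Math.max(maxChild, computeHeight(c, out));
        }
        if (!p.dirty) {
            return maxChild; // clean pages just propagate descendant heights
        }
        final int h = maxChild + 1;
        out.put(p, h);
        return h;
    }
}
```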
Note: The strategy described in BLZG-1664 is unlikely to reduce the commit bottleneck. It focuses instead on reducing the average latency of AbstractBTree.touch() during writes, since touch() drives evictions.
TODO This requires support for concurrent NodeSerializers; the existing class supports only a single writer. The code currently allocates throwaway NodeSerializer instances to support the parallelism and discards them after each level, which is likely to cause significant short-term heap growth. That cost should be amortized, either by maintaining a pool for the life of the mutable index or by some other technique, such as using multiple threads only when a large number of pages (e.g., 10 or more) are being evicted at the same time.
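The pooling idea could look something like the following. This is a minimal sketch, assuming a generic free-list pool kept alive with the mutable index; it is not the actual NodeSerializer API, and each worker thread is expected to borrow its own instance since the serializer is single-writer.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: a simple free-list pool so serializer instances survive across
// eviction passes instead of being allocated and dropped per level.
final class SerializerPool<S> {

    interface Factory<S> { S create(); }

    private final ConcurrentLinkedQueue<S> free = new ConcurrentLinkedQueue<>();
    private final Factory<S> factory;

    SerializerPool(Factory<S> factory) { this.factory = factory; }

    /** Borrow a serializer, allocating only when the pool is empty. */
    S acquire() {
        final S s = free.poll();
        return s != null ? s : factory.create();
    }

    /** Return a serializer for reuse by later eviction passes. */
    void release(S s) { free.offer(s); }
}
```

The same pool would also support the threshold heuristic: below some page count (e.g., 10) the caller acquires a single serializer and evicts serially, avoiding both thread handoff and extra allocations.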
TODO The maximum parallelism should be configurable. It is currently hackable from the environment, but that is not a long-term solution; see the AbstractHTree and AbstractBTree constructors (the bottom of those methods).
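As a sketch of the long-term solution, the constructors could read a documented property once instead of consulting the environment. The property name and default below are illustrative assumptions, not actual Blazegraph option names.

```java
import java.util.Properties;

// Sketch: a documented, validated configuration option for eviction
// parallelism, read once when the index object is constructed.
final class EvictionOptions {

    static final String MAX_PARALLEL_EVICT = "index.maxParallelEvict"; // assumed name
    static final int DEFAULT_MAX_PARALLEL_EVICT = 1; // serial unless opted in

    /** Parse and sanity-check the configured eviction parallelism. */
    static int maxParallelEvict(Properties properties) {
        final String s = properties.getProperty(
                MAX_PARALLEL_EVICT, Integer.toString(DEFAULT_MAX_PARALLEL_EVICT));
        final int n = Integer.parseInt(s);
        if (n < 1) {
            throw new IllegalArgumentException(MAX_PARALLEL_EVICT + " must be >= 1: " + n);
        }
        // Clamp to the hardware so a bad value cannot oversubscribe the host.
        return Math.min(n, Runtime.getRuntime().availableProcessors());
    }
}
```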
TODO This also needs to be done for the HTree.
See BLZG-1663 for more background on this.