- We should use one thread per index to reduce the total warm-up time.
- We should support warming up either all indices or only those associated with a given bigdata namespace (e.g., kb.*).
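The namespace-scoped variant could be a simple prefix filter over the index names. A minimal sketch, assuming the index names are available as strings; `selectIndices` is a hypothetical helper, not part of the bigdata API:

```java
import java.util.List;
import java.util.stream.Collectors;

public class WarmupSelection {

    /**
     * Select the indices to warm up. A null namespace means all indices;
     * otherwise only indices whose names start with "namespace." are chosen
     * (e.g. "kb." for the kb.* case above).
     */
    static List<String> selectIndices(final List<String> allIndexNames,
            final String namespace) {
        if (namespace == null)
            return allIndexNames;
        final String prefix = namespace + ".";
        return allIndexNames.stream()
                .filter(name -> name.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        final List<String> names = List.of("kb.spo.SPO", "kb.lex.TERM2ID",
                "other.ndx");
        System.out.println(selectIndices(names, "kb"));   // only the kb.* names
        System.out.println(selectIndices(names, null));   // all names
    }
}
```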
The method that I intend to reuse is the following. See DumpJournal for an example of its use. However, this method reads all pages of an index, not just the non-leaf nodes; it should be parameterized to support reading only the non-leaf nodes.
final BaseIndexStats stats = ndx.dumpPages(dumpPages);
If we pass in the current level during the index scan then we can conditionally recurse to the next level of a B+Tree iff we know that the next level will have non-leaf nodes. This is knowable for the B+Tree because it is a balanced tree: the leaves always appear at the same depth along any path from the root (for a given snapshot of the tree).
// normal read following the node hierarchy, using cache, etc.
final AbstractNode<?> child = ((Node) node).getChild(i);
// recursive dump
dumpPages(ndx, child, stats);
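The level-based cutoff can be illustrated on a toy balanced tree. This is a self-contained sketch: `ToyNode`, `build`, and `warmUpNonLeaves` are hypothetical stand-ins for the B+Tree node classes and the parameterized dumpPages, not the actual bigdata code:

```java
import java.util.ArrayList;
import java.util.List;

public class NonLeafWarmup {

    /** Hypothetical stand-in for a B+Tree node; a leaf has no children. */
    static final class ToyNode {
        final List<ToyNode> children = new ArrayList<>();
        boolean visited = false;
    }

    /** Build a balanced toy tree; depth 0 produces a leaf. */
    static ToyNode build(final int depth, final int fanout) {
        final ToyNode n = new ToyNode();
        if (depth > 0)
            for (int i = 0; i < fanout; i++)
                n.children.add(build(depth - 1, fanout));
        return n;
    }

    /**
     * Read only the non-leaf pages. Because the tree is balanced, the depth
     * of the leaves is the same along every path, so we can stop recursing
     * one level above the leaves without inspecting the children at all.
     */
    static void warmUpNonLeaves(final ToyNode node, final int level,
            final int leafLevel) {
        node.visited = true; // simulate reading this page into cache
        if (level + 1 >= leafLevel)
            return; // the children are leaves; do not read them
        for (final ToyNode child : node.children)
            warmUpNonLeaves(child, level + 1, leafLevel);
    }

    public static void main(String[] args) {
        final ToyNode root = build(3, 2); // leaves at level 3
        warmUpNonLeaves(root, 0, 3);
        System.out.println(root.visited);                 // true
        System.out.println(root.children.get(0).visited); // true
        System.out.println(root.children.get(0).children.get(0)
                .children.get(0).visited);                // false: leaf skipped
    }
}
```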
However, this is not true of the HTree. The HTree is not a balanced tree, and we do not believe there is any mechanism to tell whether a child is a leaf based on inspection of the parent directory page. In fact, we are not using durable HTree indices for the triple/quad store at this time (just for the analytic query mode), so this is not really a pressing concern. And even if we had to warm up HTree indices we could always just scan the whole thing.
Another approach to a warm-up protocol would be to do breadth-first reads of the index pages. So, we read the root. Then we use a pool of threads to read all immediate (non-leaf) children of the root. Then we recursively descend in breadth-first steps. The thread pool would obviously need to be shared and bounded. This approach could be used for a single index or for all indices. But I think that the simple approach of using one thread per index and scanning the indices in parallel is probably sufficient. As long as we do not force the B+Tree leaf reads, it should do a very good job of warming up the indices. There will still be IO latency involved when we do queries, but no more than 1 IO per leaf on average.
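The one-thread-per-index approach can be sketched with a bounded executor. This is illustrative only: the per-index task body stands in for the non-leaf page scan, and `warmUpAll` is a hypothetical helper, not the bigdata API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelWarmup {

    /**
     * Warm up each index on its own thread, scanning the indices in
     * parallel on a bounded pool. Each task stands in for the non-leaf
     * page scan of one index.
     */
    static Map<String, Boolean> warmUpAll(final List<String> indexNames,
            final int maxThreads) throws InterruptedException {
        final ExecutorService pool = Executors.newFixedThreadPool(
                Math.max(1, Math.min(maxThreads, indexNames.size())));
        final Map<String, Boolean> done = new ConcurrentHashMap<>();
        try {
            final List<Callable<Void>> tasks = new ArrayList<>();
            for (final String name : indexNames) {
                tasks.add(() -> {
                    // here: scan the non-leaf pages of the index 'name'
                    done.put(name, true);
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every warm-up task is done
        } finally {
            pool.shutdown();
        }
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        final Map<String, Boolean> done = warmUpAll(
                List.of("kb.spo.SPO", "kb.spo.POS", "kb.lex.TERM2ID"), 4);
        System.out.println(done.size()); // 3
    }
}
```

Bounding the pool keeps concurrent disk reads under control even when there are many indices, while still letting the scans overlap their IO latency.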