Changes to PipelineJoin, IBindingSetAccessPath, IHashJoinUtility,
JVMHashJoinUtility, HTreeHashJoinUtility, ServiceCallOp, etc. intended
to reduce re-chunking during vectored query evaluation. This change
provides correct accounting for chunksIn and unitsIn for the solution
set hash join and improved memory utilization through elimination of
some unnecessary dechunking and rechunking in the hash join API.
Changes were also made to the query hint infrastructure in order to
ensure that hint:chunkSize was correctly applied to HTree
operators. This is important now that the HTreeHashJoinUtility no
longer rechunks to a hard-coded chunkSize of 1000.
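A minimal sketch of the idea, not the actual bigdata code: the consumer takes chunks as they arrive and counts chunksIn and unitsIn at that point, instead of flattening the chunks into single solutions and regrouping them at a fixed size. StatsSketch and ChunkPassThroughSketch are hypothetical names used only for illustration (the real statistics class is BOpStats).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for per-operator statistics (not the real BOpStats). */
final class StatsSketch {
    long chunksIn; // number of chunks consumed
    long unitsIn;  // number of solutions consumed
}

final class ChunkPassThroughSketch {

    /**
     * Consume the source one chunk at a time, preserving the chunk
     * boundaries chosen upstream, and account for each chunk as it is read.
     */
    static <E> List<E> consume(final Iterator<E[]> source, final StatsSketch stats) {
        final List<E> out = new ArrayList<>();
        while (source.hasNext()) {
            final E[] chunk = source.next();
            stats.chunksIn++;
            stats.unitsIn += chunk.length;
            // Stand-in for the real work (e.g., probing a hash index).
            out.addAll(Arrays.asList(chunk));
        }
        return out;
    }
}
```

With this shape the chunk size seen by the consumer is whatever the upstream operator produced, which is why the hint controlling that size needs to reach the operator.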
While working on this, I noticed that hint:chunkSize was not making it
onto the SliceOp, onto the Predicate associated with a PipelineJoin
(where it controls the vectoring for reading on the access path),
etc. This was due to both the types of nodes to which the IQueryHint
implementations were willing to apply themselves and the types of
nodes to which the ASTQueryHintOptimizer was willing to apply query
hints. I have made both more expansive.
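As a rough illustration of the widened applicability test (all class names below are hypothetical and much simpler than the real AST types): a hint is attached to any node that can carry hints, such as a slice, and is skipped only for value expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base type for anything that may carry query hints. */
abstract class HintableNodeSketch {
    final Map<String, String> hints = new LinkedHashMap<>();
}

/** Hypothetical stand-ins for a join group and a slice. */
final class JoinGroupSketch extends HintableNodeSketch { }
final class SliceSketch extends HintableNodeSketch { }

/** Hypothetical stand-in for a value expression, which never takes hints. */
final class ValueExpressionSketch { }

final class HintApplierSketch {

    /**
     * Attach the hint to the node if the node can carry hints.
     *
     * @return true iff the hint was applied.
     */
    static boolean apply(final Object node, final String name, final String value) {
        if (node instanceof HintableNodeSketch) {
            ((HintableNodeSketch) node).hints.put(name, value);
            return true;
        }
        return false; // e.g., a value expression
    }
}
```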
To help analyze the issue with query hints (which was necessary to
assess the performance impact of my change to vectoring in the
HTreeHashJoinUtility) and to help analyze the problem with the
stochastic behavior of the HTree, I also added significantly more
information into the predSummary and added columns (in the detailed
explain mode) for the PipelineOp and Predicate annotations.
I also touched a lot of classes, adding @Override and final
annotations. A number of these were in the striterator package (when I
added a CloseableChunkedIteratorWrapperConverter to address the
rechunking pattern in HTreeHashJoinUtility, IBindingSetAccessPath, and
BOpContext#solutions()) and in the ast package. Several of the join
operators were also touched, either to support the fix of the
rechunking pattern or to eliminate some old (and commented out) code.
I have added more unit tests of the query hints mechanisms. I found
and fixed several places where query hints were not being applied when
generating pipeline operators (AST2BOpUtility) and where query hints
were not being copied from one AST node to another when making
structural changes to the AST, e.g., ASTBottomUpOptimizer and
ASTSparql11SubqueryOptimizer both create a NamedSubqueryRoot and a
NamedSubqueryInclude. However, they were not copying across the query
hints from the parent join group (ASTBottomUpOptimizer) and the
original SubqueryRoot (ASTSparql11SubqueryOptimizer).
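The copy-across fix can be sketched as follows. This is an assumption-laden illustration, not the optimizer code: AstNodeSketch and LiftOutSketch are hypothetical, and the hints are modeled as a plain java.util.Properties object.

```java
import java.util.Properties;

/** Hypothetical AST node that carries a label and its query hints. */
final class AstNodeSketch {
    final String label;
    final Properties queryHints = new Properties();

    AstNodeSketch(final String label) {
        this.label = label;
    }
}

final class LiftOutSketch {

    /**
     * Lift a subquery out into a named subquery plus an include, copying
     * the original node's hints onto both of the new nodes.
     *
     * @return a two element array: [named subquery, include].
     */
    static AstNodeSketch[] liftOut(final AstNodeSketch original, final String name) {
        final AstNodeSketch namedSubquery = new AstNodeSketch("NamedSubqueryRoot:" + name);
        final AstNodeSketch include = new AstNodeSketch("NamedSubqueryInclude:" + name);
        // Without these two lines the hints are silently lost by the rewrite.
        namedSubquery.queryHints.putAll(original.queryHints);
        include.queryHints.putAll(original.queryHints);
        return new AstNodeSketch[] { namedSubquery, include };
    }
}
```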
- ASTBase: javadoc on Annotations.QUERY_HINTS. final and @Override annotations.
- AssignmentNode: final and @Override annotations.
- GraphPatternGroup: final and @Override annotations.
- GraphNodeGroup: license header, final and @Override annotations, toString(int) changes.
- JoinGroupNode: license header. Made the OPTIMIZER property explicit. Added getQueryOptimizer() method. final and @Override annotations.
- QueryBase: added query hints into toString(int).
- QueryHints: referenced ticket BLZG-162 (clean up query hints).
- QueryOptimizerEnum: removed dead code.
- QueryRoot: final, @Override, and javadoc correction.
- SliceNode: javadoc on annotations, toString(int) now shows the query hints. This was done to have visibility into vectoring.
- StatementPatternNode: license header, final and @Override, javadoc, dead code elimination.
- ValueExpressionListBaseNode: @Override
- CloseableChunkedIteratorWrapperConverter: new class converts from IChunkedIterator<E> to ICloseableIterator visiting E.
- TestCloseableChunkedIteratorWrapperConverter: new test suite.
- TestAll: include new test class.
- AbstractChunkedResolverator: final, @Override.
- ChunkedArrayIterator: final, @Override.
- ChunkedArraysIterator: javadoc, final, @Override.
- ChunkedConvertingIterator: final, @Override.
- ChunkedResolvingIterator: javadoc, final, @Override.
- ChunkedWrappedIterator: final, @Override.
- Chunkerator: close() now tests for ICloseable rather than ICloseableIterator.
- CloseableIteratorWrapper: final, @Override.
- Dechunkerator: javadoc, final, @Override, close() now tests for ICloseable rather than ICloseableIterator.
- DelegateChunkedIterator: final, @Override.
- GenericChunkedStriterator: removed unnecessary @SuppressWarnings.
- IChunkedIterator: @Override for methods declared by Iterator.
- IChunkedStriterator: @Override for methods in base interface.
- MergeFilter: final, @Override.
- PushbackIterator: final, @Override.
- Resolver: final, @Override.
- Striterator: final, @Override.
- TestPipelineJoin: @Override and super.setUp() / super.tearDown().
- AbstractHashJoinUtilityTestCase: Modified how we execute the hash join for the IHashJoinUtility API change.
- HashIndexOp: @Override
- HashJoinOp: @Override, javadoc, API change for IHashJoinUtility.hashJoin().
- HTreeHashIndexOp: @Override, final, dead code eliminated.
- HTreeHashJoinUtility: removed static chunkSize field. Vectoring is now controlled by hint:chunkSize. Pushed down the logic to track chunksIn and unitsIn for hashJoin2 (they were not being tracked).
- IHashJoinUtility: API change to remove rechunking pattern.
- JoinVariableNotBoundException: final annotations.
- JVMHashIndex: Slight efficiency change in makeKey(). @Override
- JVMHashIndexOp: final annotation.
- JVMHashJoinUtility: API change for hashJoin2(): now accepts ICloseableIterator<IBindingSet> and BOpStats and tracks unitsIn and chunksIn.
- PipelineJoin: IBindingSetAccessPath now returns an ICloseableIterator<IBindingSet>. Modified the AccessPathTask to handle chunks of IBindingSets (it previously handled individual IBindingSets).
- SolutionSetHashJoin: Modified to pass context.getSource() and BOpStats into the IHashJoinUtility rather than dechunking.
- HTreeNamedSubqueryOp, JVMNamedSubqueryOp, and INamedSubqueryOp: Added INamedSubqueryOp as a marker interface for the two implementation classes so we can identify those operators when they appear in a query plan.
- ServiceCallJoin: Modified to pass ICloseableIterator<IBindingSet> into IHashJoinUtility. This should be pushed down further. There is a TODO to do that when we address vectoring in/out of the SERVICE operator.
- QueryLog: Significantly expanded and improved performance counter reporting for Explain, especially in the "detail" mode.
- AbstractAccessPathOp: removed unused methods. This is part of the query hints cleanup.
- BOpContext#solutions() was modified to return an ICloseableIterator visiting IBindingSets. This is part of the rechunking change for IHashJoinUtility.
- BOpUtility: Added getOnly() method used to obtain the only instance of a BOp from a query plan or AST. This is used by unit tests.
- IBindingSetAccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSets.
- AccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSets.
- Extensive changes to clean up query hints. Many query hints now apply to IQueryNode rather than IJoinNode. javadoc. Test cases have been expanded.
- ASTBottomUpOptimizer: modified to pass through query hints from the parent join group to the lifted out named subquery and the INCLUDE.
- ASTSparql11SubqueryOptimizer: modified to pass through query hints from the original subquery when it is lifted out into a named subquery. The hints are applied to both the new named subquery and the new named subquery include.
- ASTStaticJoinOptimizer: changes to isStaticOptimizer() to use JoinGroupNode.getQueryOptimizer().
- ASTQueryHintOptimizer: extensive changes. Modified how the optimizer identifies the nodes for which it will delegate to a query hint. It used to only do this for QueryNodeBase, which was too restrictive. It now does this for everything but value expressions. javadoc clarifying the intention of the class. Removed some dead code.
Note: ASTQueryHintOptimizer no longer adds all query hints with Scope:=Query to AST2BOpContext.queryHints. I have clarified this in documentation in both classes.
- AST2BOpBase: Changed the pattern for applyQueryHints() to use both the AST node's query hints and the AST2BOpUtility global defaults whenever possible.
- AST2BOpUtility: Modified to more systematically pass along query hints from the AST node to the constructed PipelineOps. Modified to pass through BufferAnnotations and IPredicate annotations to the Predicate created from a StatementPatternNode. This allows us to control vectoring for AccessPath reads.
- AST2BOpFilters: Partially addressed pass through of query hints, but only those in the global scope. We need to change the method interfaces to pass through the bounding AST node or the queryHints for that AST node.
- AST2BOpContext: license header, javadoc.
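The close() change noted above for Chunkerator and Dechunkerator can be illustrated with a small sketch. The names here are hypothetical (CloseableSketch plays the role of ICloseable), and the stated benefit is an assumption: testing for the broader interface means a source that is closeable but does not implement the combined closeable-iterator interface is still closed.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for a "can be closed" interface. */
interface CloseableSketch {
    void close();
}

/** A source that is an Iterator and closeable, declared separately. */
final class CloseableSourceSketch implements Iterator<String>, CloseableSketch {
    private final Iterator<String> delegate = Arrays.asList("a", "b").iterator();
    boolean closed = false;

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public String next() {
        return delegate.next();
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Wrapper whose close() propagates to any closeable source. */
final class WrapperSketch<E> implements Iterator<E>, CloseableSketch {
    private final Iterator<E> src;

    WrapperSketch(final Iterator<E> src) {
        this.src = src;
    }

    @Override
    public boolean hasNext() {
        return src.hasNext();
    }

    @Override
    public E next() {
        return src.next();
    }

    @Override
    public void close() {
        // Test for the broad interface, not a narrower iterator subtype.
        if (src instanceof CloseableSketch) {
            ((CloseableSketch) src).close();
        }
    }
}
```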
I have run through all of the bop, AST, SPARQL, and NSS test suites
and everything is green.
- (*) I have not yet resolved the HTree stochastic behavior. I will
continue to look at that once this checkpoint is committed. See
- (*) Query hints for the materialization pipeline
(ChunkedMaterializationOp, ConditionalRoutingOp) are not being
applied correctly because the caller's AST node (or its query
hints) are not being passed down. I am going to defer fixing this
for a moment while I look at the RTO integration. (The RTO needs
to be able to use AST2BOpJoins#join(), which is the main entry
point into the materialization pipeline code.) See BLZG-162.
- AST2BOpBase: Once we fix this, we really do not need to pass in
the AST2BOpContext's query hints into applyQueryHints() any
more. The impact will be achieved by passing down the query
hints from the appropriate bounding AST node. The
ASTQueryHintOptimizer is responsible for making sure that the
query hints are applied to those AST nodes.
- (*) Check query performance on LUBM U50, BSBM 100M, and govtrack.
The changes in the HTree vectoring could hit govtrack, but we
should be able to override hint:chunkSize if necessary to correct
for this (or automatically increase the chunkSize for the analytic
query mode, or use dynamic rechunking, etc). The changes in the
ASTQueryHintOptimizer could hit all queries, but I was pretty
careful and do not expect to see any performance regressions.
See http://sourceforge.net/apps/trac/bigdata/ticket/483 (Eliminate unnecessary dechunking and rechunking)
See http://sourceforge.net/apps/trac/bigdata/ticket/791 (Clean up query hints)
See http://sourceforge.net/apps/trac/bigdata/ticket/763 (Stochastic results with Analytic Query Mode)
Numerous cleanups to query hint processing and the Explain mode were made to support analysis of this issue. The HTreeHashJoinUtility was modified to avoid re-chunking of intermediate solution sets. The root cause of the stochastic behavior has not yet been resolved. I will add more about this soon. See
Committed revision r7712.