Details

      Description

      Query hints are partly broken and need to be more thoroughly tested. Query hints evolved rather late and have partly replaced a previous mechanism that allowed magic predicates to be made visible without the QueryHintRegistry. The code therefore handles query hints correctly in some cases, but mainly in those that have received enough attention.

      The test suites for query hints are incomplete in the following ways:

      - the hints covered (not all query hints are tested);
      - the nesting scopes of the SPARQL queries to which they are applied and the scope in which they are given;
      - the execution semantics, specifically making sure that:
        - the query hints are transferred onto the PipelineOp and Predicate data structures;
        - the actual execution of the relevant PipelineOp and the AccessPaths reading from those Predicates correctly reflects the intended semantics of the query hint;
      - verifying the correct annotation of AST2BOpContext.queryHints. For some query hints (such as the ChunkSizeHint), the localName of the query hint as given is not the same value that is actually recorded on AST2BOpContext.queryHints.

      In particular, problems have been identified with:


      - ChunkSizeHint: failure to set the correct name in the global scope on AST2BOpContext.queryHints.
      - Query hints for BufferAnnotation were not being set on the Predicate and thus were not being respected by the AccessPath, just by the PipelineOp.

        Activity

        bryanthompson bryanthompson added a comment -

        Changes to PipelineJoin, IBindingSetAccessPath, IHashJoinUtility,
        JVMHashJoinUtility, HTreeHashJoinUtility, ServiceCallOp, etc. intended
        to reduce re-chunking during vectored query evaluation. This change
        provides correct accounting for chunksIn and unitsIn for the solution
        set hash join and improved memory utilization through elimination of
        some unnecessary dechunking and rechunking in the hash join API.
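
The accounting described above can be illustrated with a minimal sketch. It assumes a hypothetical `ChunkStats` holder standing in for the real statistics object and a generic `Iterator<E[]>` standing in for the chunked source; neither is the actual bigdata API. The point is only that the consumer reads whole chunks and updates both counters once per chunk instead of dechunking first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Sketch only: counts chunks and units while consuming a chunked source. */
final class ChunkAccounting {

    /** Consumes chunks as-is (no dechunking) and updates the counters once per chunk. */
    static <E> void consume(final Iterator<E[]> source, final ChunkStats stats) {
        while (source.hasNext()) {
            final E[] chunk = source.next();
            stats.chunksIn++;              // one more chunk seen
            stats.unitsIn += chunk.length; // one unit per solution in the chunk
            // A real hash join would probe its index with the whole chunk here.
        }
    }

    public static void main(final String[] args) {
        final List<String[]> chunks = Arrays.asList(
                new String[] {"s1", "s2", "s3"},
                new String[] {"s4"});
        final ChunkStats stats = new ChunkStats();
        consume(chunks.iterator(), stats);
        System.out.println("chunksIn=" + stats.chunksIn + ", unitsIn=" + stats.unitsIn);
    }
}

/** Hypothetical stand-in for the operator statistics; not the bigdata BOpStats class. */
final class ChunkStats {
    long chunksIn; // number of chunks (arrays) consumed
    long unitsIn;  // number of individual solutions consumed
}
```

With the two chunks in `main`, the sketch reports two chunks and four units.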

        Changes were also made to the query hint infrastructure in order to
        ensure that hint:chunkSize was correctly applied to HTree
        operators. This is important now that the HTreeHashJoinUtility no
        longer rechunks to a hard-coded chunkSize of 1000.

        While working on this, I noticed that hint:chunkSize was not making it
        onto the SliceOp, onto the Predicate associated with a PipelineJoin
        (where it controls the vectoring for reading on the access path),
        etc. This was due to both the types of nodes to which the IQueryHint
        implementations were willing to apply themselves and the types of
        nodes to which the ASTQueryHintOptimizer was willing to apply query
        hints. I have made both more expansive.

        To help analyze the issue with query hints (which was necessary to
        assess the performance impact of my change to vectoring in the
        HTreeHashJoinUtility) and to help analyze the problem with the
        stochastic behavior of the HTree, I also added significantly more
        information into the predSummary and added columns (in the detailed
        explain mode) for the PipelineOp and Predicate annotations.

        I also touched a lot of classes, adding @Override and final
        annotations. A number of these were in the striterator package (when I
        added a CloseableChunkedIteratorWrapperConverter to address the
        rechunking pattern in HTreeHashJoinUtility, IBindingSetAccessPath, and
        BOpContext#solutions()) and in the ast package. Several of the join
        operators were also touched, either to support the fix of the
        rechunking pattern or to eliminate some old (and commented out) code.

        I have added more unit tests of the query hints mechanisms. I found
        and fixed several places where query hints were not being applied when
        generating pipeline operators (AST2BOpUtility) and where query hints
        were not being copied from one AST node to another when making
        structural changes to the AST, e.g., ASTBottomUpOptimizer and
        ASTSparql11SubqueryOptimizer both create a NamedSubqueryRoot and a
        NamedSubqueryInclude. However, they were not copying across the query
        hints from the parent join group (ASTBottomUpOptimizer) and the
        original SubqueryRoot (ASTSparql11SubqueryOptimizer).
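
The hint-copying fix can be sketched as follows. The `Node` class and the `liftOut()` method are hypothetical illustrations, not the AST classes or optimizer code from the commit; the sketch only shows the idea that both nodes created by the rewrite inherit the hints of the node they replace.

```java
import java.util.Properties;

/** Sketch only: propagate query hints when an AST rewrite creates new nodes. */
final class HintCopySketch {

    /** Hypothetical AST node that carries its own query hints. */
    static final class Node {
        final String name;
        final Properties queryHints = new Properties();

        Node(final String name) {
            this.name = name;
        }
    }

    /**
     * Replaces a subquery with a named subquery plus an include and copies
     * the original hints onto both new nodes.
     */
    static Node[] liftOut(final Node subquery) {
        final Node named = new Node("NamedSubqueryRoot");
        final Node include = new Node("NamedSubqueryInclude");
        named.queryHints.putAll(subquery.queryHints);
        include.queryHints.putAll(subquery.queryHints);
        return new Node[] {named, include};
    }

    public static void main(final String[] args) {
        final Node subquery = new Node("SubqueryRoot");
        subquery.queryHints.setProperty("chunkSize", "500");
        for (final Node n : liftOut(subquery)) {
            System.out.println(n.name + " chunkSize=" + n.queryHints.getProperty("chunkSize"));
        }
    }
}
```

Without the two `putAll()` calls the new nodes would silently fall back to defaults, which is the failure mode described above.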

        com.bigdata.rdf.sparql.ast::
        - ASTBase: javadoc on Annotations.QUERY_HINTS. final and @Override annotations.
        - AssignmentNode: final and @Override annotations.
        - GraphPatternGroup: final and @Override annotations.
        - GraphNodeGroup: license header, final and @Override annotations, toString(int) changes.
        - JoinGroupNode: license header. Made the OPTIMIZER property explicit. Added getQueryOptimizer() method. final and @Override annotations.
        - QueryBase: added query hints into toString(int).
        - QueryHints: referenced ticket BLZG-162 (clean up query hints).
        - QueryOptimizerEnum: removed dead code.
        - QueryRoot: final, @Override, and javadoc correction.
        - SliceNode: javadoc on annotations, toString(int) now shows the query hints. This was done to have visibility into vectoring.
        - StatementPatternNode: license header, final and @Override, javadoc, dead code elimination.
        - ValueExpressionListBaseNode: @Override

        com.bigdata.striterator::
        - CloseableChunkedIteratorWrapperConverter: new class converts from IChunkedIterator<E> to ICloseableIterator visiting E[].
        - TestCloseableChunkedIteratorWrapperConverter: new test suite.
        - TestAll: include new test class.
        - AbstractChunkedResolverator: final, @Override.
        - ChunkedArrayIterator: final, @Override.
        - ChunkedArraysIterator: javadoc, final, @Override.
        - ChunkedConvertingIterator: final, @Override.
        - ChunkedResolvingIterator: javadoc, final, @Override.
        - ChunkedWrappedIterator: final, @Override.
        - Chunkerator: close() now tests for ICloseable rather than ICloseableIterator.
        - CloseableIteratorWrapper: final, @Override.
        - Dechunkerator: javadoc, final, @Override, close() now tests for ICloseable rather than ICloseableIterator.
        - DelegateChunkedIterator: final, @Override.
        - GenericChunkedStriterator: removed unnecessary @SuppressWarnings.
        - IChunkedIterator: @Override for methods declared by Iterator.
        - IChunkedStriterator: @Override for methods in base interface.
        - MergeFilter: final, @Override.
        - PushbackIterator: final, @Override.
        - Resolver: final, @Override.
        - Striterator: final, @Override.
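
As an illustration of the wrapper-converter idea listed above, here is a minimal sketch of an iterator that visits chunks instead of single elements. It is not the CloseableChunkedIteratorWrapperConverter source: the class name `ChunkingIterator` is hypothetical, chunks are returned as `List<E>` rather than `E[]` to avoid generic array creation, and `AutoCloseable` stands in for the project's own closeable interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: wraps an element-at-a-time iterator and visits chunks. */
final class ChunkingIterator<E> implements Iterator<List<E>>, AutoCloseable {

    private final Iterator<E> src;
    private final int chunkSize;
    private boolean open = true;

    ChunkingIterator(final Iterator<E> src, final int chunkSize) {
        if (chunkSize <= 0)
            throw new IllegalArgumentException("chunkSize must be positive");
        this.src = src;
        this.chunkSize = chunkSize;
    }

    @Override
    public boolean hasNext() {
        return open && src.hasNext();
    }

    @Override
    public List<E> next() {
        if (!hasNext())
            throw new NoSuchElementException();
        final List<E> chunk = new ArrayList<>();
        // Fill the chunk up to chunkSize; the last chunk may be shorter.
        while (chunk.size() < chunkSize && src.hasNext()) {
            chunk.add(src.next());
        }
        return chunk;
    }

    @Override
    public void close() {
        open = false;
        // Propagate close() to the source only if it is closeable.
        if (src instanceof AutoCloseable) {
            try {
                ((AutoCloseable) src).close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(final String[] args) {
        final ChunkingIterator<Integer> it =
                new ChunkingIterator<>(Arrays.asList(1, 2, 3, 4, 5).iterator(), 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        it.close();
    }
}
```

The `close()` method checks for the broadest closeable type on the source, which mirrors the Chunkerator and Dechunkerator change noted in the list.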

        com.bigdata.bop.join::
        - TestPipelineJoin: @Override and super.setUp() / super.tearDown().
        - AbstractHashJoinUtilityTestCase: Modified how we execute the hash join for the IHashJoinUtility API change.
        - HashIndexOp: @Override
        - HashJoinOp: @Override, javadoc, API change for IHashJoinUtility.hashJoin().
        - HTreeHashIndexOp: @Override, final, dead code eliminated.
        - HTreeHashJoinUtility: removed static chunkSize field. Vectoring is now controlled by hint:chunkSize. Pushed down the logic to track chunksIn and unitsIn for hashJoin2 (they were not being tracked).
        - IHashJoinUtility: API change to remove rechunking pattern.
        - JoinVariableNotBoundException: final annotations.
        - JVMHashIndex: Slight efficiency change in makeKey(). @Override
        - JVMHashIndexOp: final annotation.
        - JVMHashJoinUtility: API change for hashJoin2(): now accepts ICloseableIterator<IBindingSet[]> and BOpStats and tracks unitsIn and chunksIn.
        - PipelineJoin: IBindingSetAccessPath now returns an ICloseableIterator<IBindingSet[]>. Modified the AccessPathTask to handle the IBindingSet[] chunks. Used to be just IBindingSets.
        - SolutionSetHashJoin: Modified to pass context.getSource() and BOpStats into the IHashJoinUtility rather than dechunking.

        com.bigdata.bop.controller::
        - HTreeNamedSubqueryOp, JVMNamedSubqueryOp, and INamedSubqueryOp: Added INamedSubqueryOp as a marker interface for the two implementation classes so we can identify those operators when they appear in a query plan.
        - ServiceCallJoin: Modified to pass ICloseableIterator<IBindingSet[]> into IHashJoinUtility. This should be pushed down further. There is a TODO to do that when we address vectoring in/out of the SERVICE operator.

        com.bigdata.bop.QueryEngine:
        - QueryLog: Significantly expanded and improved performance counter reporting for Explain, especially in the "detail" mode.

        com.bigdata.bop::
        - AbstractAccessPathOp: removed unused methods. This is part of the query hints cleanup.
        - BOpContext#solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s. This is part of the rechunking change for IHashJoinUtility.
        - BOpUtility: Added getOnly() method used to obtain the only instance of a BOp from a query plan or AST. This is used by unit tests.
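
A helper of the getOnly() kind might look like the sketch below. The `Op`, `SketchJoin`, and `SketchSlice` classes are hypothetical, and the behavior when zero or several matches are found (throwing an exception) is an assumption for illustration; the actual BOpUtility method may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: find the single operator of a given type in an operator tree. */
final class GetOnlySketch {

    /** Hypothetical operator node with child operators. */
    static class Op {
        final List<Op> args;

        Op(final Op... args) {
            this.args = Arrays.asList(args);
        }
    }

    static final class SketchJoin extends Op {
        SketchJoin(final Op... args) {
            super(args);
        }
    }

    static final class SketchSlice extends Op {
        SketchSlice(final Op... args) {
            super(args);
        }
    }

    /** Returns the only instance of cls in the tree; throws otherwise (assumed behavior). */
    static <T extends Op> T getOnly(final Op root, final Class<T> cls) {
        final List<T> found = new ArrayList<>();
        collect(root, cls, found);
        if (found.isEmpty())
            throw new NoSuchElementException("no " + cls.getSimpleName());
        if (found.size() > 1)
            throw new IllegalStateException("more than one " + cls.getSimpleName());
        return found.get(0);
    }

    /** Pre-order traversal that collects every node assignable to cls. */
    private static <T extends Op> void collect(final Op op, final Class<T> cls, final List<T> out) {
        if (cls.isInstance(op))
            out.add(cls.cast(op));
        for (final Op child : op.args)
            collect(child, cls, out);
    }

    public static void main(final String[] args) {
        final Op plan = new SketchJoin(new SketchJoin(), new SketchSlice());
        System.out.println(getOnly(plan, SketchSlice.class).getClass().getSimpleName());
    }
}
```

In a unit test this lets an assertion target, say, the one slice operator in a plan and inspect the annotations a query hint should have set on it.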

        com.bigdata.relation.accesspath::
        - IBindingSetAccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s.
        - AccessPath: solutions() was modified to return an ICloseableIterator visiting IBindingSet[]s.

        com.bigdata.rdf.sparql.hints::
        - Extensive changes to clean up query hints. Many query hints now apply to IQueryNode rather than IJoinNode. javadoc. Test cases have been expanded.

        com.bigdata.rdf.sparql.ast::
        - ASTBottomUpOptimizer: modified to pass through query hints from the parent join group to the lifted out named subquery and the INCLUDE.
        - ASTSparql11SubqueryOptimizer: modified to pass through query hints from the original subquery when it is lifted out into a named subquery. The hints are applied to both the new named subquery and the new named subquery include.
        - ASTStaticJoinOptimizer: changes to isStaticOptimizer() to use JoinGroupNode.getQueryOptimizer().
        - ASTQueryHintOptimizer: extensive changes. Modified how the optimizer identifies the nodes for which it will delegate to a query hint. It used to only do this for QueryNodeBase, which was too restrictive. It now does this for everything but value expressions. javadoc clarifying the intention of the class. Removed some dead code.

        Note: ASTQueryHintOptimizer no longer adds all query hints with Scope:=Query to AST2BOpContext.queryHints. I have clarified this in documentation in both classes.

        com.bigdata.rdf.sparql.eval::
        - AST2BOpBase: Changed the pattern for applyQueryHints() to use both the AST node's query hints and the AST2BOpUtility global defaults whenever possible.
        - AST2BOpUtility: Modified to more systematically pass along query hints from the AST node to the constructed PipelineOps. Modified to pass through BufferAnnotations and IPredicate annotations to the Predicate created from a StatementPatternNode. This allows us to control vectoring for AccessPath reads.
        - AST2BOpFilters: Partially addressed pass through of query hints, but only those in the global scope. We need to change the method interfaces to pass through the bounding AST node or the queryHints for that AST node.
        - AST2BOpContext: license header, javadoc.
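
The combination of node-level hints and global defaults described for applyQueryHints() can be sketched as a simple lookup. The `resolve()` method is hypothetical, and the precedence shown (hints on the AST node win over the global ones, which win over a built-in default) is an assumption made for illustration rather than something stated in this comment.

```java
import java.util.Properties;

/** Sketch only: resolve a hint from node-level hints, then global hints, then a default. */
final class HintResolutionSketch {

    /**
     * Looks up a hint by name. Assumed precedence: AST node hints first,
     * then the global (query scope) hints, then the supplied default.
     */
    static String resolve(final Properties globalHints, final Properties nodeHints,
            final String name, final String dflt) {
        String value = nodeHints == null ? null : nodeHints.getProperty(name);
        if (value == null && globalHints != null) {
            value = globalHints.getProperty(name);
        }
        return value != null ? value : dflt;
    }

    public static void main(final String[] args) {
        final Properties global = new Properties();
        global.setProperty("chunkSize", "100");
        final Properties node = new Properties();
        node.setProperty("chunkSize", "500");

        // The node-level hint is found first under the assumed precedence.
        System.out.println(resolve(global, node, "chunkSize", "1000"));
        // No node-level hint: fall back to the global hint.
        System.out.println(resolve(global, new Properties(), "chunkSize", "1000"));
    }
}
```

Passing the bounding AST node's hints into every construction step, as proposed for AST2BOpFilters, would make the global argument unnecessary in a lookup like this.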

        I have run through all of the bop, AST, SPARQL, and NSS test suites
        and everything is green.

        TODO:

          - (*) I have not yet resolved the HTree stochastic behavior.  I will
            continue to look at that once this checkpoint is committed. See
            BLZG-848.
        
          - (*) Query hints for the materialization pipeline
            (ChunkedMaterializationOp, ConditionalRoutingOp) are not being
            applied correctly because the caller's AST node (or its query
            hints) are not being passed down.  I am going to defer fixing this
            for a moment while I look at the RTO integration.  (The RTO needs
            to be able to use AST2BOpJoins#join(), which is the main entry
            point into the materialization pipeline code.)  See BLZG-162.
        
            - AST2BOpBase: Once we fix this, we really do not need to pass in
              the AST2BOpContext's query hints into applyQueryHints() any
              more. The impact will be achieved by passing down the query
              hints from the appropriate bounding AST node.  The
              ASTQueryHintOptimizer is responsible for making sure that the
              query hints are applied to those AST nodes.
        
          - (*) Check query performance on LUBM U50, BSBM 100M, and govtrack.
            The changes in the HTree vectoring could hit govtrack, but we
            should be able to override hint:chunkSize if necessary to correct
            for this (or automatically increase the chunkSize for the analytic
            query mode, or use dynamic rechunking, etc).  The changes in the
            ASTQueryHintOptimizer could hit all queries, but I was pretty
            careful and do not expect to see any performance regressions.
        

        See http://sourceforge.net/apps/trac/bigdata/ticket/483 (Eliminate unnecessary dechunking and rechunking)
        See http://sourceforge.net/apps/trac/bigdata/ticket/791 (Clean up query hints)
        See http://sourceforge.net/apps/trac/bigdata/ticket/763 (Stochastic results with Analytic Query Mode)

        Numerous cleanups to query hint processing and the Explain mode were made to support analysis of this issue. The HTreeHashJoinUtility was modified to avoid re-chunking of intermediate solution sets. The root cause of the stochastic behavior has not yet been resolved. I will add more about this soon. See BLZG-162.

        Committed revision r7712.

        bryanthompson bryanthompson added a comment -

        Performance is fine

        LUBM U50 (converged performance on bigdata11)

             [java] BIGDATA_SPARQL_ENDPOINT	#trials=10	#parallel=1
             [java] query	Time	Result#
             [java] query1	47	4
             [java] query3	31	6
             [java] query4	41	34
             [java] query5	36	719
             [java] query7	38	61
             [java] query8	178	6463
             [java] query10	29	0
             [java] query11	38	0
             [java] query12	36	0
             [java] query13	37	0
             [java] query14	1822	393730
             [java] query6	2039	430114
             [java] query2	617	130
             [java] query9	3970	8627
             [java] Total	8959	
        

        BSBM 100M (in HA mode)

        root@bigdata17:~/workspace/bsbmtools/trunk# for a in `seq 120`; do ./testdriver -o mt_16.xml -seed $RANDOM -w 50 -mt 16 -idir td_100m/td_data http://localhost:8090/sparql|grep QMpH;done;
        QMpH:                   22454.66 query mixes per hour
        QMpH:                   33612.80 query mixes per hour
        QMpH:                   39661.59 query mixes per hour
        QMpH:                   42356.65 query mixes per hour
        QMpH:                   43515.20 query mixes per hour
        QMpH:                   44649.31 query mixes per hour
        QMpH:                   44420.18 query mixes per hour
        QMpH:                   44927.01 query mixes per hour
        QMpH:                   44748.12 query mixes per hour
        QMpH:                   45699.67 query mixes per hour
        QMpH:                   45192.64 query mixes per hour
        QMpH:                   44484.85 query mixes per hour
        QMpH:                   44698.11 query mixes per hour
        QMpH:                   45655.74 query mixes per hour
        QMpH:                   44434.85 query mixes per hour
        QMpH:                   45143.41 query mixes per hour
        QMpH:                   46018.46 query mixes per hour
        QMpH:                   44760.61 query mixes per hour
        QMpH:                   46194.94 query mixes per hour
        QMpH:                   44919.06 query mixes per hour
        QMpH:                   46244.72 query mixes per hour
        QMpH:                   45388.53 query mixes per hour
        QMpH:                   45591.16 query mixes per hour
        QMpH:                   47103.84 query mixes per hour
        QMpH:                   44417.90 query mixes per hour
        QMpH:                   46441.07 query mixes per hour
        QMpH:                   47097.69 query mixes per hour
        QMpH:                   44416.09 query mixes per hour
        QMpH:                   46964.94 query mixes per hour
        QMpH:                   47295.34 query mixes per hour
        QMpH:                   44142.27 query mixes per hour
        QMpH:                   46736.92 query mixes per hour
        QMpH:                   47149.26 query mixes per hour
        QMpH:                   45604.62 query mixes per hour
        QMpH:                   46610.90 query mixes per hour
        QMpH:                   46859.56 query mixes per hour
        QMpH:                   47410.68 query mixes per hour
        QMpH:                   44976.61 query mixes per hour
        QMpH:                   46668.83 query mixes per hour
        QMpH:                   47331.05 query mixes per hour
        QMpH:                   47227.43 query mixes per hour
        QMpH:                   45224.23 query mixes per hour
        QMpH:                   46667.62 query mixes per hour
        QMpH:                   47277.58 query mixes per hour
        QMpH:                   47664.54 query mixes per hour
        QMpH:                   47716.29 query mixes per hour
        QMpH:                   44755.30 query mixes per hour
        QMpH:                   46851.73 query mixes per hour
        QMpH:                   47284.18 query mixes per hour
        QMpH:                   47403.51 query mixes per hour
        QMpH:                   47789.71 query mixes per hour
        QMpH:                   45210.70 query mixes per hour
        QMpH:                   46861.26 query mixes per hour
        QMpH:                   47541.81 query mixes per hour
        QMpH:                   47519.94 query mixes per hour
        QMpH:                   47984.05 query mixes per hour
        QMpH:                   48024.54 query mixes per hour
        QMpH:                   44478.36 query mixes per hour
        QMpH:                   46966.30 query mixes per hour
        QMpH:                   47474.17 query mixes per hour
        QMpH:                   47963.12 query mixes per hour
        QMpH:                   47981.22 query mixes per hour
        QMpH:                   47713.46 query mixes per hour
        QMpH:                   44200.51 query mixes per hour
        QMpH:                   46982.73 query mixes per hour
        QMpH:                   47398.14 query mixes per hour
        QMpH:                   47969.01 query mixes per hour
        QMpH:                   47772.97 query mixes per hour
        QMpH:                   48435.47 query mixes per hour
        QMpH:                   45130.71 query mixes per hour
        QMpH:                   46190.81 query mixes per hour
        QMpH:                   47602.85 query mixes per hour
        QMpH:                   47980.98 query mixes per hour
        QMpH:                   47721.26 query mixes per hour
        QMpH:                   48092.61 query mixes per hour
        QMpH:                   48100.53 query mixes per hour
        QMpH:                   48368.80 query mixes per hour
        QMpH:                   44103.57 query mixes per hour
        QMpH:                   47472.45 query mixes per hour
        QMpH:                   47792.85 query mixes per hour
        QMpH:                   47476.21 query mixes per hour
        QMpH:                   48359.27 query mixes per hour
        QMpH:                   48298.61 query mixes per hour
        QMpH:                   48178.89 query mixes per hour
        QMpH:                   47923.06 query mixes per hour
        QMpH:                   46257.95 query mixes per hour
        QMpH:                   47393.16 query mixes per hour
        QMpH:                   47392.17 query mixes per hour
        QMpH:                   47504.34 query mixes per hour
        QMpH:                   48040.61 query mixes per hour
        QMpH:                   47918.65 query mixes per hour
        QMpH:                   47853.98 query mixes per hour
        QMpH:                   47959.57 query mixes per hour
        QMpH:                   44085.25 query mixes per hour
        QMpH:                   46923.22 query mixes per hour
        QMpH:                   47408.39 query mixes per hour
        QMpH:                   47722.41 query mixes per hour
        QMpH:                   48169.48 query mixes per hour
        QMpH:                   48156.63 query mixes per hour
        QMpH:                   47907.23 query mixes per hour
        QMpH:                   48449.04 query mixes per hour
        QMpH:                   48411.64 query mixes per hour
        QMpH:                   44038.60 query mixes per hour
        QMpH:                   46976.02 query mixes per hour
        QMpH:                   47783.20 query mixes per hour
        QMpH:                   47851.05 query mixes per hour
        QMpH:                   47835.32 query mixes per hour
        QMpH:                   48016.69 query mixes per hour
        QMpH:                   48305.92 query mixes per hour
        QMpH:                   48360.17 query mixes per hour
        QMpH:                   48561.19 query mixes per hour
        QMpH:                   48476.29 query mixes per hour
        QMpH:                   43878.29 query mixes per hour
        QMpH:                   46979.88 query mixes per hour
        QMpH:                   47526.03 query mixes per hour
        QMpH:                   47769.46 query mixes per hour
        QMpH:                   47718.32 query mixes per hour
        QMpH:                   48035.97 query mixes per hour
        QMpH:                   48361.66 query mixes per hour
        QMpH:                   48435.06 query mixes per hour
        

        Still waiting on the govtrack results. That has to do with the change to eliminate rechunking and would only impact the analytic query mode.

        bryanthompson bryanthompson added a comment -

        There is a minor performance variation for govtrack with the commit above:

        before: 64.8958 minutes.
        after : 66.1127 minutes.

        This is a difference of about 1.9% (roughly 1 part in 53) and could be under the noise threshold. It is certainly not significant. People who encounter performance impacts from this change can include the following query hint to vector all operators in the query at 1000 solutions per chunk. This value (or a larger value) is probably appropriate for the analytic query mode, especially if using the G1 garbage collector.

        hint:Query hint:chunkSize "1000".
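
        For readers who build queries programmatically, here is a minimal sketch of placing the hint above inside a query string. The class name, the helper method, and the `hint:` namespace URI are assumptions for illustration and should be checked against the deployment in use; only the `hint:Query hint:chunkSize` triple itself comes from the comment above.

```java
public class ChunkSizeHintExample {

    // Assumed namespace for the "hint:" prefix; verify against your deployment.
    static final String HINT_NS = "http://www.bigdata.com/queryHints#";

    /**
     * Hypothetical helper: wraps a graph pattern in a SELECT query that
     * carries the query-scope chunkSize hint as its first triple.
     */
    static String withChunkSize(String graphPattern, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return "PREFIX hint: <" + HINT_NS + ">\n"
             + "SELECT * WHERE {\n"
             + "  hint:Query hint:chunkSize \"" + chunkSize + "\" .\n"
             + "  " + graphPattern + "\n"
             + "}";
    }

    public static void main(String[] args) {
        // Prints a query that asks for 1000 solutions per chunk.
        System.out.println(withChunkSize("?s ?p ?o .", 1000));
    }
}
```

        The hint is written as an ordinary triple pattern, so the rest of the query stays unchanged.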
        
        bryanthompson bryanthompson added a comment -

        TODO: There are a number of places where we lift out named subqueries. These all need to be reviewed and the query hints correctly brought across for both the NamedSubqueryRoot and the NamedSubqueryInclude. Failure to address this prevents hints such as hint:atOnce from correctly being applied to all operators in the query plan.

        bryanthompson bryanthompson added a comment -

        Refactored AST2BOpUtility, AST2BOpFilters, and AST2BOpJoins to pass down the query hints as a Properties object from the dominating AST node. This causes query hints to be propagated to conditional routing operators, chunked materialization operators, etc. It also prepares the code for reuse by the RTO.
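
        The idea of handing hints down as a Properties object from the dominating node can be sketched roughly as follows. This is not the actual AST2BOpUtility code: the `Node` class and the `describeOp` method are hypothetical stand-ins, and the use of `java.util.Properties` defaults for inheritance is an assumption made for illustration.

```java
import java.util.Properties;

public class HintPropagationSketch {

    /** Hypothetical stand-in for an AST node that carries its own query hints. */
    static final class Node {
        final Properties hints;

        Node(Properties inherited) {
            // Properties(defaults): lookups that miss here fall back to the
            // hints of the dominating node.
            this.hints = new Properties(inherited);
        }
    }

    /**
     * Hypothetical stand-in for building an operator (join, conditional
     * routing, materialization): it receives the hints explicitly.
     */
    static String describeOp(String opName, Properties queryHints) {
        return opName
             + "[chunkSize=" + queryHints.getProperty("chunkSize", "default")
             + ", atOnce=" + queryHints.getProperty("atOnce", "false") + "]";
    }

    public static void main(String[] args) {
        Properties queryScope = new Properties();
        queryScope.setProperty("chunkSize", "1000");

        Node root = new Node(queryScope);
        Node group = new Node(root.hints);
        group.hints.setProperty("atOnce", "true"); // set on the nested group only

        System.out.println(describeOp("join", group.hints));       // join[chunkSize=1000, atOnce=true]
        System.out.println(describeOp("materialize", root.hints)); // materialize[chunkSize=1000, atOnce=false]
    }
}
```

        Passing the hints as an argument, instead of reading them from a shared context, lets every operator built for a node see the same hints as that node.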

        I relabeled the "addMaterialization()" methods as 1, 2, and 3. This makes it significantly easier to identify the recursion patterns in the call hierarchy.

        I reorganized the method signatures to be more consistent in terms of where the AST2BOpContext and query hints appear in the list of arguments.

        Changed QueryLog to set pred==null for the summary line, which gets rid of unwanted details.

        The AST2BOpContext.queryHints field is now correctly ignored by AST2BOpBase.applyQueryHints(). The semantics of the AST2BOpContext.queryHints have already been applied to the AST nodes by the ASTQueryHintOptimizer. They do not need to be reapplied in applyQueryHints().

        The test suites for the AST and SPARQL are green.

        I have confirmed that the atOnce and chunkSize query hints are still being propagated correctly.

        TODO: The only remaining issue that I am aware of for query hints is that there are a number of places where we lift out named subqueries. These all need to be reviewed and the query hints correctly brought across for both the NamedSubqueryRoot and the NamedSubqueryInclude. Failure to address this prevents hints such as hint:atOnce from correctly being applied to all operators in the query plan.

        Committed revision r7732.

        bryanthompson bryanthompson added a comment -

        Per above, this is mostly done.


          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson