Details

    • Type: New Feature
    • Status: Accepted
    • Resolution: Unresolved
    • Affects Version/s: BIGDATA_RELEASE_1_2_0
    • Fix Version/s: None
    • Component/s: Query Plan Generator
    • Labels:
      None

      Description

      Per [1], here are some interesting breakdowns of the total time:

      - 23% Parsing SPARQL queries
      - 17% Executing SPARQL queries (optimization plus evaluation)
      - 14% Query optimization (it looks like we spend MUCH more time optimizing queries than we do evaluating them).
      

      It looks like there is a lot of fat in the query optimization phase and perhaps in the query parser (which is known to drive the heap heavily).

      A big driver of the HEAP is the iterator() methods. That is coming out of AbstractList.listIterator(), Collections$SynchronizedCollection.iterator(), AbstractSequentialList.iterator(), and ModifiableBOpBase$NotifyingList.iterator().

      "Name","Objects","Size","Level"
      "java.util.AbstractList.listIterator()","12230","390928", "1"
      "java.util.Collections$SynchronizedCollection.iterator()","8016","256512", "2"
      "java.util.AbstractSequentialList.iterator()","2771","88672", "2"
      "com.bigdata.bop.ModifiableBOpBase$NotifyingList.iterator()","1321","42272", "2"
      

      Some of the big drivers for those iterator methods are:

      - com.bigdata.bop.BOpUtility.preOrderIterator2(int, BOp)
      - com.bigdata.bop.BOpUtility.annotationOpIterator(BOp)
      

      Striterator.addFilter() shows up BIG with back traces through BOpUtility$Expand, which is part of the same iteration pattern.

      This suggests that we could win big if we could improve our iteration patterns over the AST.

      [1] https://sourceforge.net/apps/trac/bigdata/ticket/546 (Index cache for Journal)

        Issue Links

          Activity

          beebs Brad Bebee created issue -
          Hide
          bryanthompson bryanthompson added a comment -

          BOpBase: These method should only make a copy of the args or the anns (depending on which one is to be modified). There is no reason to make a copy of the other. This is all within the "effectively" immutable contract. Predicate special cases this for asBound(), but we could also handle this in BOpBase.

              - BOpBase setArg(final int index, final BOp newArg)
              - BOpBase setProperty(final String name, final Object value)
              - BOpBase setUnboundProperty(final String name, final Object value)
          

          Note that BOpBase is for the IMMUTABLE bops (pipeline bops, function bops, etc).

          ModifiableBOpBase is for the MUTABLE bops (AST nodes).

          Show
          bryanthompson bryanthompson added a comment - BOpBase: These method should only make a copy of the args or the anns (depending on which one is to be modified). There is no reason to make a copy of the other. This is all within the "effectively" immutable contract. Predicate special cases this for asBound(), but we could also handle this in BOpBase. - BOpBase setArg(final int index, final BOp newArg) - BOpBase setProperty(final String name, final Object value) - BOpBase setUnboundProperty(final String name, final Object value) Note that BOpBase is for the IMMUTABLE bops (pipeline bops, function bops, etc). ModifiableBOpBase is for the MUTABLE bops (AST nodes).
          Hide
          bryanthompson bryanthompson added a comment -

          Ok. I think I see how I have misinterpreted the results.

          17% Query optimization (solutions result)
                  com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateTupleQuery(AbstractTripleStore, ASTContainer, QueryBindingSet)
          
           3% Query optimization (graph result)
              com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateQuery(ASTContainer, AST2BOpContext, IBindingSet[], boolean, IVariable[])
          
           1% just submits the query, but does not wait for it to complete
              com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateGraphQuery(AbstractTripleStore, ASTContainer, QueryBindingSet)
          
          23% SPARQL query parser
              com.bigdata.rdf.sail.sparql.Bigdata2ASTSPARQLParser.parseQuery2(String, String)
          
          38% Total time waiting on pipeline operator evaluation
              com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run()
          
          So: 17+3 = 20% query optimization (ignore the 1%, which is already included).
              20+20 = 40% query parser + query optimization
              40+38 = 78% query parser, query optimizer, query evaluation
          
          Leaving 22% for things like the sparql end point protocol.
          

          This shows where the time goes during query optimization. Most of the time in the static join optimizer is actually index access time for the range counts.

          Name Time (ms) Own Time (ms) Level 
          com.bigdata.rdf.sparql.ast.eval.AST2BOpUtility.convert(AST2BOpContext, IBindingSet[]) 173140 30  "1" 
          com.bigdata.rdf.sparql.ast.optimizers.ASTOptimizerList.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 164440 20  "2" 
          com.bigdata.rdf.sparql.ast.optimizers.ASTStaticJoinOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 82510 0  "3" 
          com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 24310 0  "3" 
          com.bigdata.rdf.sparql.ast.optimizers.ASTBottomUpOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 17350 0  "3" 
          

          This shows where the time goes during query evaluation. The table is sorted by own time, which is the last column. I think that there must be some sampling artifacts here as I can not otherwise explain why we would see so much time in some of these methods. Most of the time is lost to FileChannelUtility. Maybe we could start managing our own buffers using the MemoryManager? The next culprit is value expression evaluation. Then B+Tree node search (which might be helped by a finger).

          sun.nio.ch.FileChannelImpl.read(ByteBuffer, long) 155550 155550 
          com.bigdata.rdf.internal.constraints.SPARQLConstraint.accept(IBindingSet) 64460 27670 
          java.lang.reflect.Array.newInstance(Class, int) 26300 26300 
          java.util.concurrent.locks.ReentrantLock.unlock() 25580 25580 
          com.bigdata.rdf.internal.constraints.IVValueExpression.getAndCheckLiteral(int, IBindingSet) 26280 25420 
          javax.xml.datatype.DatatypeFactory.newInstance() 18850 18850 
          com.bigdata.rdf.internal.constraints.TryBeforeMaterializationConstraint.accept(IBindingSet) 55870 10510 
          com.bigdata.btree.Node.indexOf(byte[]) 181440 10060 
          com.bigdata.cache.RingBuffer.add(Object) 10160 9440 
          it.unimi.dsi.fastutil.bytes.custom.CustomByteArrayFrontCodedList.binarySearch(byte[]) 9180 9180 
          cutthecrap.utils.striterators.SingleValueIterator.hasNext() 9100 9100 
          java.util.AbstractQueue.add(Object) 9000 9000 
          com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp.accept(IBindingSet) 19030 8240 
          com.bigdata.btree.Leaf.indexOf(byte[]) 16180 7370 
          com.bigdata.journal.Journal.getReadExecutor() 6810 6810 
          java.util.UUID.randomUUID() 6660 6660 
          java.util.concurrent.ConcurrentHashMap.put(Object, Object) 6290 6290 
          java.util.concurrent.ThreadPoolExecutor.execute(Runnable) 5550 5550 
          it.unimi.dsi.fastutil.bytes.custom.CustomByteArrayFrontCodedList.search(byte[]) 12800 3620 
          
          Show
          bryanthompson bryanthompson added a comment - Ok. I think I see how I have misinterpreted the results. 17% Query optimization (solutions result) com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateTupleQuery(AbstractTripleStore, ASTContainer, QueryBindingSet) 3% Query optimization (graph result) com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateQuery(ASTContainer, AST2BOpContext, IBindingSet[], boolean, IVariable[]) 1% just submits the query, but does not wait for it to complete com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.evaluateGraphQuery(AbstractTripleStore, ASTContainer, QueryBindingSet) 23% SPARQL query parser com.bigdata.rdf.sail.sparql.Bigdata2ASTSPARQLParser.parseQuery2(String, String) 38% Total time waiting on pipeline operator evaluation com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run() So: 17+3 = 20% query optimization (ignore the 1%, which is already included). 20+20 = 40% query parser + query optimization 40+38 = 78% query parser, query optimizer, query evaluation Leaving 22% for things like the sparql end point protocol. This shows where the time goes during query optimization. Most of the time in the static join optimizer is actually index access time for the range counts. Name Time (ms) Own Time (ms) Level com.bigdata.rdf.sparql.ast.eval.AST2BOpUtility.convert(AST2BOpContext, IBindingSet[]) 173140 30 "1" com.bigdata.rdf.sparql.ast.optimizers.ASTOptimizerList.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 164440 20 "2" com.bigdata.rdf.sparql.ast.optimizers.ASTStaticJoinOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 82510 0 "3" com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 24310 0 "3" com.bigdata.rdf.sparql.ast.optimizers.ASTBottomUpOptimizer.optimize(AST2BOpContext, IQueryNode, IBindingSet[]) 17350 0 "3" This shows where the time goes during query evaluation. The table is sorted by own time, which is the last column. I think that there must be some sampling artifacts here as I can not otherwise explain why we would see so much time in some of these methods. Most of the time is lost to FileChannelUtility. Maybe we could start managing our own buffers using the MemoryManager? The next culprit is value expression evaluation. Then B+Tree node search (which might be helped by a finger). sun.nio.ch.FileChannelImpl.read(ByteBuffer, long) 155550 155550 com.bigdata.rdf.internal.constraints.SPARQLConstraint.accept(IBindingSet) 64460 27670 java.lang.reflect.Array.newInstance(Class, int) 26300 26300 java.util.concurrent.locks.ReentrantLock.unlock() 25580 25580 com.bigdata.rdf.internal.constraints.IVValueExpression.getAndCheckLiteral(int, IBindingSet) 26280 25420 javax.xml.datatype.DatatypeFactory.newInstance() 18850 18850 com.bigdata.rdf.internal.constraints.TryBeforeMaterializationConstraint.accept(IBindingSet) 55870 10510 com.bigdata.btree.Node.indexOf(byte[]) 181440 10060 com.bigdata.cache.RingBuffer.add(Object) 10160 9440 it.unimi.dsi.fastutil.bytes.custom.CustomByteArrayFrontCodedList.binarySearch(byte[]) 9180 9180 cutthecrap.utils.striterators.SingleValueIterator.hasNext() 9100 9100 java.util.AbstractQueue.add(Object) 9000 9000 com.bigdata.rdf.internal.constraints.NeedsMaterializationBOp.accept(IBindingSet) 19030 8240 com.bigdata.btree.Leaf.indexOf(byte[]) 16180 7370 com.bigdata.journal.Journal.getReadExecutor() 6810 6810 java.util.UUID.randomUUID() 6660 6660 java.util.concurrent.ConcurrentHashMap.put(Object, Object) 6290 6290 java.util.concurrent.ThreadPoolExecutor.execute(Runnable) 5550 5550 it.unimi.dsi.fastutil.bytes.custom.CustomByteArrayFrontCodedList.search(byte[]) 12800 3620
          Hide
          bryanthompson bryanthompson added a comment -

          mrpersonick wrote: added an abstract optimizer class that does the work of iterating the AST and finding the join groups to optimize.

          Committed Revision r6283.

          Show
          bryanthompson bryanthompson added a comment - mrpersonick wrote: added an abstract optimizer class that does the work of iterating the AST and finding the join groups to optimize. Committed Revision r6283.
          bryanthompson bryanthompson made changes -
          Field Original Value New Value
          Assignee mikepersonick [ mikepersonick ] michaelschmidt [ michaelschmidt ]
          Hide
          bryanthompson bryanthompson added a comment -

          Assigned to Michael since he is currently looking at where the time goes.

          Show
          bryanthompson bryanthompson added a comment - Assigned to Michael since he is currently looking at where the time goes.
          beebs Brad Bebee made changes -
          Workflow Trac Import v2 [ 12489 ] Trac Import v3 [ 13208 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v3 [ 13208 ] Trac Import v4 [ 14537 ]
          bryanthompson bryanthompson made changes -
          Link This issue relates to BLZG-1351 [ BLZG-1351 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v4 [ 14537 ] Trac Import v5 [ 15888 ]
          Hide
          michaelschmidt michaelschmidt added a comment -
          Show
          michaelschmidt michaelschmidt added a comment - See the statistics in https://docs.google.com/spreadsheets/d/1m8EXUc5Fo7WpLCb9Rat6BYo-lztsqGvARwpLbxH4yho/edit#gid=0 for detailed information.
          michaelschmidt michaelschmidt made changes -
          Status Open [ 1 ] Accepted [ 10101 ]
          Hide
          michaelschmidt michaelschmidt added a comment -

          Here are some ideas for future work in this respect:

          • Globally log statistics for all queries and provide average values at any point -> could be useful to better understand benchmark results
          • Look at slow optimizers and try to accelerate (e.g., disable early if not applicable, such as the query hint optimizer)
          • Fix elapsed time for range count measurement -> this is a parallel call in ASTRangeCountOptimizer
          Show
          michaelschmidt michaelschmidt added a comment - Here are some ideas for future work in this respect: Globally log statistics for all queries and provide average values at any point -> could be useful to better understand benchmark results Look at slow optimizers and try to accelerate (e.g., disable early if not applicable, such as the query hint optimizer) Fix elapsed time for range count measurement -> this is a parallel call in ASTRangeCountOptimizer
          beebs Brad Bebee made changes -
          Workflow Trac Import v5 [ 15888 ] Trac Import v6 [ 18411 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v6 [ 18411 ] Trac Import v7 [ 19830 ]
          beebs Brad Bebee made changes -
          Workflow Trac Import v7 [ 19830 ] Trac Import v8 [ 21520 ]

            People

            • Assignee:
              michaelschmidt michaelschmidt
              Reporter:
              bryanthompson bryanthompson
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated: