Details

    • Type: New Feature
    • Status: Accepted
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Engine
    • Labels:
      None

      Description

      There is some excellent work by the MonetDB people on Runtime Optimization for XQuery (ROX) [1,2]. Query optimization is basically dynamic programming. ROX is basically incremental dynamic programming, which has been well studied in other areas. The advantage of ROX is that it is robust under a very wide variety of queries and does not requiring maintenance of additional information, such as chain histograms. Instead, good query plans are discovered by sampling permutations of the remaining joins based on the initial joins. The initial joins are chosen based on ground truth estimates of the cardinality of the access paths demanded by the various joins (that is, by index range count operations).

      Unlike MonetDB, bigdata using a pipelined join algorithm and is a parallel (aka distributed) database doing shard-wise processing. Therefore the implementation strategy would necessarily be somewhat different from ROX. However, since ROX appears to be relatively robust to front-bias in sampling, it should be simple enough to perform runtime query optimization once the initial intermediate solutions arrive at the first join with a dependency on upstream variable bindings. At that point, one or more node(s) with a partial result set for the query can sample various permutations of the remaining joins. This process proceeds incrementally, making a decision on the (n+1)th join once the samples clearly predict a minimum cost over the possible next join. Those estimates need to be based on the cost to a full solution (which could have some bias as indices warm up). Once the (n+1)th join is decided, it can be executed and the process repeats once initial solutions arrive at the downstream node(s). If more than one node does runtime optimization, then those nodes must coordinate in order to arrive at a decision on the minimum cost for the next join step. Because runtime optimization is integrated with the pipeline join and occurs while the previous join is still being executed, the latency should disappear entirely for all but the simplest of queries.

      [1] http://www.cwi.nl/htbin/ins1/publications?request=pdf&key=KaBoMaKe:SIGMOD:09
      [2] http://www-db.informatik.uni-tuebingen.de/files/research/pathfinder/publications/rox-demo.pdf

      Related tickets include:

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        I missed some tests where the SliceOp constructor was invoked in the last commit. This should bring the build back to normal.

        See BLZG-265 (RTO)
        See BLZG-879 (Solution order not always preserved)

        Committed revision r7788.

        Show
        bryanthompson bryanthompson added a comment - I missed some tests where the SliceOp constructor was invoked in the last commit. This should bring the build back to normal. See BLZG-265 (RTO) See BLZG-879 (Solution order not always preserved) Committed revision r7788.
        Hide
        bryanthompson bryanthompson added a comment -

        New issue
        - failure mode where we have cardinality estimate underflow. The code hits an assertion where npaths should be ONE (1) but it is GT ONE. This is tripped by govtrack query41.rq. We probably just need to take the first path with the smallest estimated cardinality or just a random path.

        Caused by: java.lang.AssertionError: Expected one path but have 2 paths.
        	at com.bigdata.bop.joinGraph.rto.JGraph.runtimeOptimizer(JGraph.java:574)
        	at com.bigdata.bop.joinGraph.rto.JoinGraph$JoinGraphTask.call(JoinGraph.java:471)
        	at com.bigdata.bop.joinGraph.rto.JoinGraph$JoinGraphTask.call(JoinGraph.java:432)
        	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        	at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1341)
        	... 9 more
        

        A related issue. The RTO should take a fast exit if any of the vertices are known to be empty. When this happens, the join graph will not produce ny solutions and could be replaced by an empty solution (maybe the DataSetJoin?). This would help us avoid some cases that can lead us into the cardinality underflow problem described immediately above.

        Show
        bryanthompson bryanthompson added a comment - New issue - failure mode where we have cardinality estimate underflow. The code hits an assertion where npaths should be ONE (1) but it is GT ONE. This is tripped by govtrack query41.rq. We probably just need to take the first path with the smallest estimated cardinality or just a random path. Caused by: java.lang.AssertionError: Expected one path but have 2 paths. at com.bigdata.bop.joinGraph.rto.JGraph.runtimeOptimizer(JGraph.java:574) at com.bigdata.bop.joinGraph.rto.JoinGraph$JoinGraphTask.call(JoinGraph.java:471) at com.bigdata.bop.joinGraph.rto.JoinGraph$JoinGraphTask.call(JoinGraph.java:432) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1341) ... 9 more A related issue. The RTO should take a fast exit if any of the vertices are known to be empty. When this happens, the join graph will not produce ny solutions and could be replaced by an empty solution (maybe the DataSetJoin?). This would help us avoid some cases that can lead us into the cardinality underflow problem described immediately above.
        Hide
        bryanthompson bryanthompson added a comment -

        Added logic to select the join path having the minimum sumEstCard if there is cardinality underflow for some of the join paths. This addresses a corner case where the RTO has multiple possible paths. If all paths have underflow, it accepts the first path.

        Note: This does not eliminate paths for the same sets of vertices where some (or all) paths have underflow during the expansion rounds. Therefore it might do too much work on such queries.

        Note: We should, perhaps, throw out a NoSolutionsException instead when all paths have underflow.

        Committed revision r7793.

        Show
        bryanthompson bryanthompson added a comment - Added logic to select the join path having the minimum sumEstCard if there is cardinality underflow for some of the join paths. This addresses a corner case where the RTO has multiple possible paths. If all paths have underflow, it accepts the first path. Note: This does not eliminate paths for the same sets of vertices where some (or all) paths have underflow during the expansion rounds. Therefore it might do too much work on such queries. Note: We should, perhaps, throw out a NoSolutionsException instead when all paths have underflow. Committed revision r7793.
        Hide
        bryanthompson bryanthompson added a comment -

        The commit above fixes govtrack query41.rq. Performance on that query appears to be very close to performance without the RTO.

        Show
        bryanthompson bryanthompson added a comment - The commit above fixes govtrack query41.rq. Performance on that query appears to be very close to performance without the RTO.
        Hide
        bryanthompson bryanthompson added a comment -

        Now running the govtrack benchmark with the RTO to get a sense of how much impact it has on query performance. Note that some of the govtrack queries disable the join optimizer and therefore will be unchanged when the default optimizer is set to Runtime rather than Static.

        We wind up with a net gain for many of the govtrack queries.

        Query is low on defaultGraphs12 and distinct01b.

        Query appears to hang on query0021e.rq.

        The root cause for defaultGraphs12 is that the predicates associated with the vertices for the join graph for default graph (and named graph) queries are not being turned correctly into access paths for sampling. I have taken some notes on the problem here. The basic issue is that AST2BOpJoins#join() is accepting a Predicate but may output a PipelineJoin with filters (e.g., for SCAN+FILTER) or a DataSetJoin + PipelineJoin (where the DataSetJoin pulls in the named or default graphs). For default graph APs, this code also imposes the DISTINCT SPO constraint. The code is also responsible for handling scale-out access paths.

        AST2BOPJoins#join() is really operating at two different levels for non-triples mode or non-local joins. In order to work with the RTO, we need to surface the actual APs (both the DataSetJoin and the SPO join) and FILTERs (DISTINCT SPO) into the join graph's vertices and constraints. We also need to get the local versus remote APs correct. And we need to be able to sample from the DataSetJoin (which is itself a materialized sample, so this is pretty easy.)

        Show
        bryanthompson bryanthompson added a comment - Now running the govtrack benchmark with the RTO to get a sense of how much impact it has on query performance. Note that some of the govtrack queries disable the join optimizer and therefore will be unchanged when the default optimizer is set to Runtime rather than Static. We wind up with a net gain for many of the govtrack queries. Query is low on defaultGraphs12 and distinct01b. Query appears to hang on query0021e.rq. The root cause for defaultGraphs12 is that the predicates associated with the vertices for the join graph for default graph (and named graph) queries are not being turned correctly into access paths for sampling. I have taken some notes on the problem here. The basic issue is that AST2BOpJoins#join() is accepting a Predicate but may output a PipelineJoin with filters (e.g., for SCAN+FILTER) or a DataSetJoin + PipelineJoin (where the DataSetJoin pulls in the named or default graphs). For default graph APs, this code also imposes the DISTINCT SPO constraint. The code is also responsible for handling scale-out access paths. AST2BOPJoins#join() is really operating at two different levels for non-triples mode or non-local joins. In order to work with the RTO, we need to surface the actual APs (both the DataSetJoin and the SPO join) and FILTERs (DISTINCT SPO) into the join graph's vertices and constraints. We also need to get the local versus remote APs correct. And we need to be able to sample from the DataSetJoin (which is itself a materialized sample, so this is pretty easy.)

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated: