Blazegraph (by SYSTAP) / BLZG-1297

Query optimizer slows down a query significantly

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Wikidata Query Service
    • Labels:
      None

      Description

      During the beta tests on Wikidata, we've discovered the following query performs poorly:

      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX wikibase: <http://wikiba.se/ontology#>
      PREFIX hint: <http://www.bigdata.com/queryHints#>
      
      SELECT DISTINCT ?result WHERE {
      	{ 
      		{ ?subject0 rdfs:label "United States"@en . } UNION { ?subject0 skos:altLabel "United States"@en . }
      	}
      	{
      		{ ?predicate1 rdfs:label "president"@en . } UNION { ?predicate1 skos:altLabel "president"@en . }
      	}
      	?predicate1 a wikibase:Property .
      	?predicate1 wikibase:directClaim ?directPredicate2 .
      	?subject0 ?directPredicate2 ?result .
      }
      

      However, without the optimizer the same query runs in 300 ms:

      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX wikibase: <http://wikiba.se/ontology#>
      PREFIX hint: <http://www.bigdata.com/queryHints#>
      
      SELECT DISTINCT ?result WHERE {
        	hint:Query hint:optimizer "None" .
      
      	{ 
      		{ ?subject0 rdfs:label "United States"@en . } UNION { ?subject0 skos:altLabel "United States"@en . }
      	}
      	{
      		{ ?predicate1 rdfs:label "president"@en . } UNION { ?predicate1 skos:altLabel "president"@en . }
      	}
      	?predicate1 a wikibase:Property .
      	?predicate1 wikibase:directClaim ?directPredicate2 .
      	?subject0 ?directPredicate2 ?result .
      }
      

      See also https://phabricator.wikimedia.org/T100235

      The query plan by default:

      com.bigdata.bop.solutions.ProjectionOp[34](HTreeDistinctBindingSetsOp[32])[ BOp.bopId=34, BOp.evaluationContext=CONTROLLER, PipelineOp.sharedState=true, JoinAnnotations.select=[result], BOp.timeout=30000, QueryEngine.queryId=cc2246a3-a37f-4397-a731-38b5cf6d0bf9]
        com.bigdata.bop.solutions.HTreeDistinctBindingSetsOp[32](CopyOp[21])[ BOp.bopId=32, HashJoinAnnotations.joinVars=[result], BOp.evaluationContext=CONTROLLER, namedSetRef=NamedSolutionSetRef{localName=--distinct-33,queryId=cc2246a3-a37f-4397-a731-38b5cf6d0bf9,joinVars=[result]}, PipelineOp.sharedState=true, PipelineOp.maxParallel=1]
          com.bigdata.bop.bset.CopyOp[21](PipelineJoin[31])[ BOp.bopId=21, BOp.evaluationContext=CONTROLLER]
            com.bigdata.bop.join.PipelineJoin[31](CopyOp[23])[ BOp.bopId=31, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[29](predicate1=null, Vocab(-69)[http://www.w3.org/2004/02/skos/core#altLabel], TermId(21493L)[president], --anon-30=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=29, AST2BOpBase.estimatedCardinality=4, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=21]
              com.bigdata.bop.bset.CopyOp[23](PipelineJoin[28])[ BOp.bopId=23]
                com.bigdata.bop.join.PipelineJoin[28](CopyOp[22])[ BOp.bopId=28, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[26](predicate1=null, Vocab(68)[http://www.w3.org/2000/01/rdf-schema#label], TermId(21493L)[president], --anon-27=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=26, AST2BOpBase.estimatedCardinality=2, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=21]
                  com.bigdata.bop.bset.CopyOp[22](Tee[24])[ BOp.bopId=22]
                    com.bigdata.bop.bset.Tee[24](CopyOp[10])[ BOp.bopId=24, PipelineOp.sinkRef=22, PipelineOp.altSinkRef=23]
                      com.bigdata.bop.bset.CopyOp[10](PipelineJoin[20])[ BOp.bopId=10, BOp.evaluationContext=CONTROLLER]
                        com.bigdata.bop.join.PipelineJoin[20](CopyOp[12])[ BOp.bopId=20, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[18](subject0=null, Vocab(-69)[http://www.w3.org/2004/02/skos/core#altLabel], TermId(17806558L)[United States], --anon-19=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=18, AST2BOpBase.estimatedCardinality=1, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=10]
                          com.bigdata.bop.bset.CopyOp[12](PipelineJoin[17])[ BOp.bopId=12]
                            com.bigdata.bop.join.PipelineJoin[17](CopyOp[11])[ BOp.bopId=17, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[15](subject0=null, Vocab(68)[http://www.w3.org/2000/01/rdf-schema#label], TermId(17806558L)[United States], --anon-16=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=15, AST2BOpBase.estimatedCardinality=3, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=10]
                              com.bigdata.bop.bset.CopyOp[11](Tee[13])[ BOp.bopId=11]
                                com.bigdata.bop.bset.Tee[13](PipelineJoin[9])[ BOp.bopId=13, PipelineOp.sinkRef=11, PipelineOp.altSinkRef=12]
                                  com.bigdata.bop.join.PipelineJoin[9](PipelineJoin[6])[ BOp.bopId=9, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[7](subject0=null, directPredicate2=null, result=null, --anon-8=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=7, AST2BOpBase.estimatedCardinality=600539593, AST2BOpBase.originalIndex=SPO, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
                                    com.bigdata.bop.join.PipelineJoin[6](PipelineJoin[3])[ BOp.bopId=6, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[4](predicate1=null, TermId(529U)[http://wikiba.se/ontology#directClaim], directPredicate2=null, --anon-5=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=4, AST2BOpBase.estimatedCardinality=1543, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
                                      com.bigdata.bop.join.PipelineJoin[3]()[ BOp.bopId=3, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[1](predicate1=null, Vocab(57)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], TermId(532U)[http://wikiba.se/ontology#Property], --anon-2=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575799791, BOp.bopId=1, AST2BOpBase.estimatedCardinality=1543, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
      

      The query plan with the optimizer hint set to "None":

        com.bigdata.bop.solutions.HTreeDistinctBindingSetsOp[32](PipelineJoin[31])[ BOp.bopId=32, HashJoinAnnotations.joinVars=[result], BOp.evaluationContext=CONTROLLER, namedSetRef=NamedSolutionSetRef{localName=--distinct-33,queryId=d0d15a14-c4a4-433f-9282-0be160314640,joinVars=[result]}, PipelineOp.sharedState=true, PipelineOp.maxParallel=1]
          com.bigdata.bop.join.PipelineJoin[31](PipelineJoin[28])[ BOp.bopId=31, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[29](subject0=null, directPredicate2=null, result=null, --anon-30=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=29, AST2BOpBase.estimatedCardinality=600539593, AST2BOpBase.originalIndex=SPO, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
            com.bigdata.bop.join.PipelineJoin[28](PipelineJoin[25])[ BOp.bopId=28, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[26](predicate1=null, TermId(529U)[http://wikiba.se/ontology#directClaim], directPredicate2=null, --anon-27=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=26, AST2BOpBase.estimatedCardinality=1543, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
              com.bigdata.bop.join.PipelineJoin[25](CopyOp[12])[ BOp.bopId=25, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[23](predicate1=null, Vocab(57)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type], TermId(532U)[http://wikiba.se/ontology#Property], --anon-24=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=23, AST2BOpBase.estimatedCardinality=1543, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]]]
                com.bigdata.bop.bset.CopyOp[12](PipelineJoin[22])[ BOp.bopId=12, BOp.evaluationContext=CONTROLLER]
                  com.bigdata.bop.join.PipelineJoin[22](CopyOp[14])[ BOp.bopId=22, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[20](predicate1=null, Vocab(-69)[http://www.w3.org/2004/02/skos/core#altLabel], TermId(21493L)[president], --anon-21=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=20, AST2BOpBase.estimatedCardinality=4, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=12]
                    com.bigdata.bop.bset.CopyOp[14](PipelineJoin[19])[ BOp.bopId=14]
                      com.bigdata.bop.join.PipelineJoin[19](CopyOp[13])[ BOp.bopId=19, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[17](predicate1=null, Vocab(68)[http://www.w3.org/2000/01/rdf-schema#label], TermId(21493L)[president], --anon-18=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=17, AST2BOpBase.estimatedCardinality=2, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=12]
                        com.bigdata.bop.bset.CopyOp[13](Tee[15])[ BOp.bopId=13]
                          com.bigdata.bop.bset.Tee[15](CopyOp[1])[ BOp.bopId=15, PipelineOp.sinkRef=13, PipelineOp.altSinkRef=14]
                            com.bigdata.bop.bset.CopyOp[1](PipelineJoin[11])[ BOp.bopId=1, BOp.evaluationContext=CONTROLLER]
                              com.bigdata.bop.join.PipelineJoin[11](CopyOp[3])[ BOp.bopId=11, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[9](subject0=null, Vocab(-69)[http://www.w3.org/2004/02/skos/core#altLabel], TermId(17806558L)[United States], --anon-10=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=9, AST2BOpBase.estimatedCardinality=1, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=1]
                                com.bigdata.bop.bset.CopyOp[3](PipelineJoin[8])[ BOp.bopId=3]
                                  com.bigdata.bop.join.PipelineJoin[8](CopyOp[2])[ BOp.bopId=8, JoinAnnotations.constraints=null, AST2BOpBase.simpleJoin=true, BOp.evaluationContext=ANY, AccessPathJoinAnnotations.predicate=com.bigdata.rdf.spo.SPOPredicate[6](subject0=null, Vocab(68)[http://www.w3.org/2000/01/rdf-schema#label], TermId(17806558L)[United States], --anon-7=null)[ IPredicate.relationName=[wdq.spo], IPredicate.timestamp=1432575927085, BOp.bopId=6, AST2BOpBase.estimatedCardinality=3, AST2BOpBase.originalIndex=POS, IPredicate.flags=[KEYS,VALS,READONLY,PARALLEL]], PipelineOp.sinkRef=1]
                                    com.bigdata.bop.bset.CopyOp[2](Tee[4])[ BOp.bopId=2]
                                      com.bigdata.bop.bset.Tee[4]()[ BOp.bopId=4, PipelineOp.sinkRef=2, PipelineOp.altSinkRef=3]
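
      Reading the default plan from the innermost operator outward, the optimizer appears to evaluate the two wikibase patterns first, then the fully unbound ?subject0 ?directPredicate2 ?result pattern (estimated cardinality 600539593), and only afterwards the two selective label UNIONs. The sketch below writes that order out by hand, assuming that hint:optimizer "None" preserves the written order, as the second plan suggests. It has not been run against the dataset and is meant only to make the difference between the two plans easier to see:

      ```sparql
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX wikibase: <http://wikiba.se/ontology#>
      PREFIX hint: <http://www.bigdata.com/queryHints#>

      SELECT DISTINCT ?result WHERE {
      	# Keep the written order (assumption: mirrors the default plan above).
      	hint:Query hint:optimizer "None" .

      	# Joins 3 and 6 in the default plan: every property and its direct claim.
      	?predicate1 a wikibase:Property .
      	?predicate1 wikibase:directClaim ?directPredicate2 .
      	# Join 9: only the predicate is bound here, so this is not selective.
      	?subject0 ?directPredicate2 ?result .
      	# The selective label lookups come last and merely filter the results.
      	{
      		{ ?subject0 rdfs:label "United States"@en . } UNION { ?subject0 skos:altLabel "United States"@en . }
      	}
      	{
      		{ ?predicate1 rdfs:label "president"@en . } UNION { ?predicate1 skos:altLabel "president"@en . }
      	}
      }
      ```

      In the plan produced with the optimizer disabled, the label UNIONs (estimated cardinalities 1 to 4) run first, so ?subject0 and ?directPredicate2 are already bound by the time the large pattern is joined.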
      

        Activity

        bryanthompson added a comment:

        Please document the data set (e.g., wikidata current up to timestamp) and provide a link to a procedure to re-create the data set locally (e.g., service install procedure). We might not need it for this query, but we will need it in general to support the wikidata query service.

        Thanks,
        Bryan

        stasmalyshev added a comment (edited):

        The install procedure is here: https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md

        Note that the dataset is large: duplicating it may take a couple of days, but it should be possible. We also have a public beta endpoint now (http://wdqs-beta.wmflabs.org/), so you could use that too (and we can arrange access to that machine via labs, as before). The data is current up to 2015-05-24T03:23:35. The dump on which the problem occurs was imported with labels in all languages (not English only).

        I've checked on another database with only English labels, and the result is the same: by default the query takes a long time, while with the optimizer set to None it returns instantly with the correct result.

        bryanthompson added a comment (edited):

        Looking at the query plans, the problem appears to be in the ASTJoinOrderOptimizer: for the query above, the selective patterns are encapsulated in the UNION blocks, and the join order optimizer doesn't consider those patterns when it chooses a join order. Generally speaking, this could be tackled as part of the join order optimization refactoring planned for the next sprint, but I'm not sure how trivial it is to do that in a proper (generalized) way. See also BLZG-1315.

        It seems that the recent optimizations related to static reordering in simple UNIONs do not apply here. Mikepersonick wrote:

        Have they tried running it without those extra join groups? The join order optimizer should be able to deal with simple unions, but I'm pretty sure it can't reorder join groups themselves (even if they are only composed of a single statement pattern or union). The quick and dirty solution might be to fix the ASTFlattenJoinGroupsOptimizer to get rid of those unnecessary join groups.
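
        A minimal sketch of what "without those extra join groups" could look like for the query in the report: the outer braces around each UNION are dropped, so the two UNIONs sit directly in the top-level group alongside the statement patterns. The rewrite follows from the SPARQL grammar and should return the same solutions. Whether the optimizer then picks a better join order is an assumption that has not been tested here:

        ```sparql
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX wikibase: <http://wikiba.se/ontology#>

        SELECT DISTINCT ?result WHERE {
        	# Each UNION is now a direct child of the top-level group,
        	# with no wrapping { ... } join group around it.
        	{ ?subject0 rdfs:label "United States"@en . } UNION { ?subject0 skos:altLabel "United States"@en . }
        	{ ?predicate1 rdfs:label "president"@en . } UNION { ?predicate1 skos:altLabel "president"@en . }
        	?predicate1 a wikibase:Property .
        	?predicate1 wikibase:directClaim ?directPredicate2 .
        	?subject0 ?directPredicate2 ?result .
        }
        ```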

        stasmalyshev added a comment (edited):

        Interestingly enough, if I rewrite it as:

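        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        PREFIX wikibase: <http://wikiba.se/ontology#>
        PREFIX hint: <http://www.bigdata.com/queryHints#>

        SELECT DISTINCT ?result WHERE {
        	hint:Query hint:optimizer "None" .

        	# Property paths (label or altLabel) replace the explicit UNION blocks.
        	?subject0 rdfs:label|skos:altLabel "United States"@en .
        	?predicate1 rdfs:label|skos:altLabel "president"@en .

        	?predicate1 a wikibase:Property .
        	?predicate1 wikibase:directClaim ?directPredicate2 .
        	?subject0 ?directPredicate2 ?result .
        }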
        

        it is slow both with and without hint:optimizer.

        Note from michaelschmidt:
        The query above is actually quite different from an optimization point of view; I opened a dedicated ticket for it, see BLZG-1319.

        Jheald (James Heald) added a comment:

        The query in the original report now appears to work, on the current Wikidata set-up: http://tinyurl.com/ob527ow

        However, the optimizer still appears to have difficulty choosing between different UNION blocks if one of them contains BIND statements. BLZG-1541 was created for this case.

        michaelschmidt added a comment:

        OK, I'll have a look at BLZG-1541 soon. Can this one be closed then?

        stasmalyshev added a comment:

        This one seems to be gone in 2.0.1.


          People

          • Assignee: michaelschmidt
          • Reporter: stasmalyshev