Blazegraph (by SYSTAP) / BLZG-776

Optimization with skos altLabel

    Details

      Description

      (See note about preferred approach to fixing this bug at end)

      My data contains multiple concept schemes, and I wish to do prefix matching on labels against one scheme at a time. A typical query is:

      prefix skos: <http://www.w3.org/2004/02/skos/core#>
      prefix bds: <http://www.bigdata.com/rdf/search#>

      select distinct ?o
      where {
          ?o bds:search "viscu*" .
          ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
          ?s skos:prefLabel|skos:altLabel ?o .
      }
      

      This query performs badly (elapsed=21478ms), whereas the equivalent query below performs much better (elapsed=30ms):

      prefix skos: <http://www.w3.org/2004/02/skos/core#>
      prefix bds: <http://www.bigdata.com/rdf/search#>

      select distinct ?o
      where {
          {
              ?s skos:prefLabel ?o .
              ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
          }
          UNION {
              ?s skos:altLabel ?o .
              ?s skos:inScheme <http://syapse.com/vocabularies/fma/anatomical_entity#> .
          }
          ?o bds:search "viscu*"
      }
      
      

      This is hence an optimizer issue.
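
      Until the optimizer handles this case, the hand-written rewrite above can be generated mechanically. The following is a minimal Python sketch, not part of Blazegraph: the function name alternation_to_union and its parameters are hypothetical, and it only builds the UNION text, copying the shared restriction (here the skos:inScheme pattern) into every branch.

```python
def alternation_to_union(subject, predicates, obj, shared_patterns, indent="    "):
    """Expand `subject p1|p2|... obj` into a UNION of join groups.

    Hypothetical helper (not a Blazegraph API). Each branch receives its own
    copy of `shared_patterns`, mirroring the hand-written fast query.
    """
    if not predicates:
        raise ValueError("at least one predicate is required")
    branches = []
    for pred in predicates:
        lines = [f"{indent}{subject} {pred} {obj} ."]
        lines += [f"{indent}{pattern} ." for pattern in shared_patterns]
        branches.append("{\n" + "\n".join(lines) + "\n}")
    return "\nUNION ".join(branches)


scheme = "<http://syapse.com/vocabularies/fma/anatomical_entity#>"
union_block = alternation_to_union(
    "?s",
    ["skos:prefLabel", "skos:altLabel"],
    "?o",
    [f"?s skos:inScheme {scheme}"],
)
print(union_block)
```

      The generated block can be placed in the where clause next to the bds:search pattern, as in the fast query above.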

      I am exploring how to fix this. I would prefer to take some time to find and submit a fix myself, but thought I should do so within the project rather than privately.

      So far, I note that the performance difference is explained by the optimized AST. The good one:

      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX bds: <http://www.bigdata.com/rdf/search#>
      WITH {
        QueryType: SELECT
        SELECT VarNode(o)
          JoinGroupNode {
            SERVICE <ConstantNode(TermId(0U)[http://www.bigdata.com/rdf/search#search])> {
              JoinGroupNode {
                StatementPatternNode(VarNode(o), ConstantNode(TermId(0U)[http://www.bigdata.com/rdf/search#search]), ConstantNode(TermId(0L)[viscu*])) [scope=DEFAULT_CONTEXTS]
              }
            }
          }
      } AS %-anon-service-call-0 JOIN ON () DEPENDS ON ()
      QueryType: SELECT
      includeInferred=true
      SELECT DISTINCT ( VarNode(o) AS VarNode(o) )
        JoinGroupNode {
          INCLUDE %-anon-service-call-0 JOIN ON ()
          UnionNode [joinVars=[Lcom.bigdata.bop.IVariable;@535f7993] [projectInVars=[Lcom.bigdata.bop.IVariable;@2126cca8] {
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@7a463a98] [projectInVars=[Lcom.bigdata.bop.IVariable;@18b10fdf] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-90)[http://www.w3.org/2004/02/skos/core#prefLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=960191
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-99)[http://www.w3.org/2004/02/skos/core#inScheme]), ConstantNode(TermId(2457U)[http://syapse.com/vocabularies/fma/anatomical_entity#])) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=81053
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (o)
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@6fd1826a] [projectInVars=[Lcom.bigdata.bop.IVariable;@23c93680] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-112)[http://www.w3.org/2004/02/skos/core#altLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=615502
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-99)[http://www.w3.org/2004/02/skos/core#inScheme]), ConstantNode(TermId(2457U)[http://syapse.com/vocabularies/fma/anatomical_entity#])) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=81053
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (o)
          } JOIN ON (o)
        }
      
      

      The good plan joins on ?o before finding the ?s values in the concept scheme.

      The bad plan instead computes a cross product between the literals returned by the Lucene index and the ?s values in the concept scheme:

      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      PREFIX bds: <http://www.bigdata.com/rdf/search#>
      WITH {
        QueryType: SELECT
        SELECT VarNode(o)
          JoinGroupNode {
            SERVICE <ConstantNode(TermId(0U)[http://www.bigdata.com/rdf/search#search])> {
              JoinGroupNode {
                StatementPatternNode(VarNode(o), ConstantNode(TermId(0U)[http://www.bigdata.com/rdf/search#search]), ConstantNode(TermId(0L)[viscu*])) [scope=DEFAULT_CONTEXTS]
              }
            }
          }
      } AS %-anon-service-call-0 JOIN ON () DEPENDS ON ()
      QueryType: SELECT
      includeInferred=true
      SELECT DISTINCT ( VarNode(o) AS VarNode(o) )
        JoinGroupNode {
          INCLUDE %-anon-service-call-0 JOIN ON ()
          StatementPatternNode(VarNode(s), ConstantNode(Vocab(-99)[http://www.w3.org/2004/02/skos/core#inScheme]), ConstantNode(TermId(2457U)[http://syapse.com/vocabularies/fma/anatomical_entity#])) [scope=DEFAULT_CONTEXTS]
            com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=81053
            com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
          PropertyPathUnionNode [joinVars=[Lcom.bigdata.bop.IVariable;@790c1019] [projectInVars=[Lcom.bigdata.bop.IVariable;@6f51765] {
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@44d2eb74] [projectInVars=[Lcom.bigdata.bop.IVariable;@73602ff8] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-90)[http://www.w3.org/2004/02/skos/core#prefLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=960191
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (s,o)
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@2fed7df5] [projectInVars=[Lcom.bigdata.bop.IVariable;@10e87868] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-112)[http://www.w3.org/2004/02/skos/core#altLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=615502
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (s,o)
          } JOIN ON (s,o)
        }
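
      The cost difference between the two plans can be illustrated with a toy model. This is a minimal Python sketch on synthetic data, not Blazegraph code: the function names and the data are made up for illustration, and the list scan in the second plan stands in for what would be an index lookup on ?o in a real engine. It counts the intermediate solutions each evaluation order produces.

```python
from itertools import product


def plan_cross_product(hits, scheme_members, labels):
    """Bad order: search hits x scheme members, then join with labels on (s, o)."""
    # ?o (from search) and ?s (from inScheme) share no variable: cross product.
    intermediate = list(product(hits, scheme_members))
    label_set = set(labels)
    results = {o for (o, s) in intermediate if (s, o) in label_set}
    return results, len(intermediate)


def plan_join_on_o(hits, scheme_members, labels):
    """Good order: join search hits with labels on ?o, then check the scheme."""
    hit_set = set(hits)
    # Stand-in for an index lookup keyed on ?o.
    intermediate = [(s, o) for (s, o) in labels if o in hit_set]
    results = {o for (s, o) in intermediate if s in scheme_members}
    return results, len(intermediate)


# Synthetic data (assumption): 1000 labelled subjects, half of them in the scheme.
labels = [(f"s{i}", f"label{i}") for i in range(1000)]
scheme_members = {f"s{i}" for i in range(0, 1000, 2)}
hits = ["label4", "label7", "label10"]

bad_results, bad_size = plan_cross_product(hits, scheme_members, labels)
good_results, good_size = plan_join_on_o(hits, scheme_members, labels)
print(bad_size, good_size)
```

      Both orders return the same answers, but the cross product grows with the number of search hits times the number of scheme members, while the join on ?o grows only with the number of hits.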
      
      

      My current plan of attack is to identify which of the optimizers decides to evaluate:

      StatementPatternNode(VarNode(s), ConstantNode(Vocab(-90)[http://www.w3.org/2004/02/skos/core#prefLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=960191
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
      

      before

             StatementPatternNode(VarNode(s), ConstantNode(Vocab(-99)[http://www.w3.org/2004/02/skos/core#inScheme]), ConstantNode(TermId(2457U)[http://syapse.com/vocabularies/fma/anatomical_entity#])) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=81053
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
      

      in the first case, and to see what it would take to modify it so that, in the second case, it also evaluates the latter expression after:

         PropertyPathUnionNode [joinVars=[Lcom.bigdata.bop.IVariable;@790c1019] [projectInVars=[Lcom.bigdata.bop.IVariable;@6f51765] {
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@44d2eb74] [projectInVars=[Lcom.bigdata.bop.IVariable;@73602ff8] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-90)[http://www.w3.org/2004/02/skos/core#prefLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=960191
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (s,o)
            JoinGroupNode [joinVars=[Lcom.bigdata.bop.IVariable;@2fed7df5] [projectInVars=[Lcom.bigdata.bop.IVariable;@10e87868] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-112)[http://www.w3.org/2004/02/skos/core#altLabel]), VarNode(o)) [scope=DEFAULT_CONTEXTS]
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=615502
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POCS
            } JOIN ON (s,o)
          } JOIN ON (s,o)
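
      One possible ordering rule that would produce the desired order is sketched below. This is a hypothetical greedy heuristic in Python, not the logic of any existing Blazegraph optimizer: the function order_joins and the node tuples are invented for illustration, and treating the union's cardinality as the sum of its two branches is an assumption. Nodes that share an already-bound variable are preferred, with estimated cardinality as the tie-breaker.

```python
def order_joins(nodes, bound=frozenset()):
    """Greedy ordering: prefer nodes sharing a bound variable, then lower cardinality.

    Hypothetical heuristic for illustration only. Each node is a tuple
    (name, variables, estimated_cardinality).
    """
    remaining = list(nodes)
    bound = set(bound)
    ordered = []
    while remaining:
        def rank(node):
            _name, variables, cardinality = node
            shares = bool(variables & bound)
            # False sorts before True, so nodes that share a variable come first.
            return (not shares, cardinality)

        best = min(remaining, key=rank)
        remaining.remove(best)
        ordered.append(best[0])
        bound |= best[1]
    return ordered


nodes = [
    ("inScheme", {"s"}, 81053),
    # Assumption: union cardinality approximated as the sum of both branches.
    ("labelUnion", {"s", "o"}, 960191 + 615502),
]

# ?o is already bound by the bds:search include.
order = order_joins(nodes, bound={"o"})
print(order)
```

      With ?o bound, the union is scheduled before the skos:inScheme pattern despite its larger estimate; with nothing bound, the rule falls back to cardinality alone and picks skos:inScheme first.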
      
      

      Pointers or comments would be welcome.

            People

            Assignee:
            jeremycarroll
            Reporter:
            jeremycarroll