Details

      Description

      The SERVICE call below fails; the cause appears to be the data massaging done by the BIND just above it.
      The SPARQL-END-POINT in the query is a D2R server.

      base <http://localhost:8000/>
      prefix owl: <http://www.w3.org/2002/07/owl#>
      prefix based: <http://localhost:8000/bdm/api/appindividual/based:>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>
      prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      prefix rep: <http://localhost:8000/bdm/api/kbobject/rep:>
      prefix dc: <http://purl.org/dc/elements/1.1/>
      prefix sys: <http://localhost:8000/bdm/api/kbobject/sys:>
      prefix base: <http://localhost:8000/bdm/api/kbobject/base:>
      prefix s: <http://localhost:8000/bdm/api/>
      prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      prefix bds: <http://www.bigdata.com/rdf/search#>
      prefix sysd: <http://localhost:8000/bdm/api/appindividual/sysd:>
      prefix repd: <http://localhost:8000/bdm/api/appindividual/repd:>
      prefix skos: <http://www.w3.org/2004/02/skos/core#>
      prefix syapse: <http://localhost:8000/graph/syapse#>
      PREFIX vocab: <http://test-ted.syapse.com:2020/resource/vocab/>
      SELECT *
      FROM <http://localhost:8000/graph/vocabulary>
      FROM <http://localhost:8000/graph/syapse>
      FROM <http://localhost:8000/graph/django/diagnosticsInc>
      FROM <http://localhost:8000/graph/ontology/base>
      FROM <http://localhost:8000/graph/ontology/rep>
      FROM <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM <http://localhost:8000/graph/ontology/sys>
      FROM NAMED <http://localhost:8000/graph/vocabulary>
      FROM NAMED <http://localhost:8000/graph/syapse>
      FROM NAMED <http://localhost:8000/graph/django/diagnosticsInc>
      FROM NAMED <http://localhost:8000/graph/ontology/base>
      FROM NAMED <http://localhost:8000/graph/ontology/rep>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM NAMED <http://localhost:8000/graph/ontology/sys>
      
      WITH {
      SELECT DISTINCT $j__1 ?PatientOmicsRecord_A $dbsnpId $dbsnpIdInt $varstring
      FROM <http://localhost:8000/graph/vocabulary>
      FROM <http://localhost:8000/graph/syapse>
      FROM <http://localhost:8000/graph/django/diagnosticsInc>
      FROM <http://localhost:8000/graph/ontology/base>
      FROM <http://localhost:8000/graph/ontology/rep>
      FROM <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM <http://localhost:8000/graph/ontology/sys>
      FROM NAMED <http://localhost:8000/graph/vocabulary>
      FROM NAMED <http://localhost:8000/graph/syapse>
      FROM NAMED <http://localhost:8000/graph/django/diagnosticsInc>
      FROM NAMED <http://localhost:8000/graph/ontology/base>
      FROM NAMED <http://localhost:8000/graph/ontology/rep>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM NAMED <http://localhost:8000/graph/ontology/sys>
      
      WHERE {
      
       OPTIONAL {
         ?PatientOmicsRecord_A sys:name $j__1
       }
       INCLUDE %__MainQuery
      }} AS %__FullQuery
      WITH {
      SELECT *
      FROM <http://localhost:8000/graph/vocabulary>
      FROM <http://localhost:8000/graph/syapse>
      FROM <http://localhost:8000/graph/django/diagnosticsInc>
      FROM <http://localhost:8000/graph/ontology/base>
      FROM <http://localhost:8000/graph/ontology/rep>
      FROM <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM <http://localhost:8000/graph/ontology/sys>
      FROM NAMED <http://localhost:8000/graph/vocabulary>
      FROM NAMED <http://localhost:8000/graph/syapse>
      FROM NAMED <http://localhost:8000/graph/django/diagnosticsInc>
      FROM NAMED <http://localhost:8000/graph/ontology/base>
      FROM NAMED <http://localhost:8000/graph/ontology/rep>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/abox>
      FROM NAMED <http://localhost:8000/graph/diagnosticsInc/vocabulary>
      FROM NAMED <http://localhost:8000/graph/ontology/sys>
      
      WHERE {
      BIND (<http://localhost:8000/bdm/api/syuser/12> as ?user)
      
      
       { SELECT DISTINCT ?PatientOmicsRecord_A ?user
         {
           { ?PatientOmicsRecord_A sys:owner ?user }
           UNION
           { ?PatientOmicsRecord_A sys:assignedProject ?project .
             ?project syapse:isPrivate false .
           }
           UNION
           { ?PatientOmicsRecord_A sys:assignedProject ?project .
             ?project syapse:member ?user .
           }
         }
       }
      
       ?PatientOmicsRecord_A rdf:type / rdfs:subClassOf * rep:PatientVariantRecord .
       ?PatientOmicsRecord_A rep:hasVariantRecordAlteration/rep:dbSnpId/skos:prefLabel $dbsnpId .
             BIND (  SUBSTR($dbsnpId, 3)  as $dbsnpIdInt )
      	SERVICE <http://SPARQL-END-POINT/sparql> {
       		?s vocab:snp_snp_id $dbsnpIdInt;
      		   vocab:snp_univar_id ?u .
      		?xx vocab:univariation_univar_id ?u;
      		    vocab:univariation_var_str $varstring
      	}
      }} AS %__MainQuery
      
      WHERE {
      
      { SELECT (COUNT(*) AS $S__COUNT)
       WHERE {
         INCLUDE %__FullQuery
       }
      }
      INCLUDE %__FullQuery
      
      }
      ORDER BY DESC(?j__1) DESC(?j__1) ?PatientOmicsRecord_A
      LIMIT 20
      

      The D2R server reports the following.

      Note that the BINDINGS clause looks wrong: every row is empty.

      INFO  SPARQL               :: Query:  
        prefix owl: <http://www.w3.org/2002/07/owl#>  
        prefix based: <http://localhost:8000/bdm/api/appindividual/based:>  
        prefix xsd: <http://www.w3.org/2001/XMLSchema#>  
        prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
        prefix rep: <http://localhost:8000/bdm/api/kbobject/rep:>  
        prefix dc: <http://purl.org/dc/elements/1.1/>  
        prefix sys: <http://localhost:8000/bdm/api/kbobject/sys:>  
        prefix base: <http://localhost:8000/bdm/api/kbobject/base:>  
        prefix s: <http://localhost:8000/bdm/api/>  
        prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
        prefix bds: <http://www.bigdata.com/rdf/search#>  
        prefix sysd: <http://localhost:8000/bdm/api/appindividual/sysd:>  
        prefix repd: <http://localhost:8000/bdm/api/appindividual/repd:> 
        prefix skos: <http://www.w3.org/2004/02/skos/core#> 
        prefix syapse: <http://localhost:8000/graph/syapse#> 
        prefix vocab: <http://test-ted.syapse.com:2020/resource/vocab/> 
        SELECT  ?s ?dbsnpIdInt ?u ?xx ?varstring WHERE {     		
           ?s vocab:snp_snp_id $dbsnpIdInt;  		  
               vocab:snp_univar_id ?u .  		
           ?xx vocab:univariation_univar_id ?u;  		    
              vocab:univariation_var_str $varstring  	 } 
      BINDINGS { ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) }
      
      
      00:58:40 DEBUG QueryEngineD2RQ      :: Before translation:
      (project (?s ?dbsnpIdInt ?u ?xx ?varstring)
       (join
         (bgp
           (triple ?s <http://test-ted.syapse.com:2020/resource/vocab/snp_snp_id> ?dbsnpIdInt)
           (triple ?s <http://test-ted.syapse.com:2020/resource/vocab/snp_univar_id> ?u)
           (triple ?xx <http://test-ted.syapse.com:2020/resource/vocab/univariation_univar_id> ?u)
           (triple ?xx <http://test-ted.syapse.com:2020/resource/vocab/univariation_var_str> ?varstring)
         )
         (table
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
           (row)
         )))
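One plausible reading of the empty rows is that the solutions reaching the SERVICE do not yet carry ?dbsnpIdInt, so projecting them onto the variables used inside the SERVICE leaves nothing. The sketch below illustrates that reading only; it is not Bigdata code. The record name and the "rs123" label are hypothetical values, and solutions are modelled as plain dictionaries.

```python
# Variables that occur inside the SERVICE block of the query above.
SERVICE_VARS = ["s", "dbsnpIdInt", "u", "xx", "varstring"]


def bindings_rows(solutions, service_vars):
    """Project each incoming solution onto the SERVICE's variables."""
    return [
        tuple(sol[var] for var in service_vars if var in sol)
        for sol in solutions
    ]


# Hypothetical solution as it looks BEFORE the BIND has been evaluated.
before_bind = [{"PatientOmicsRecord_A": "rec1", "dbsnpId": "rs123"}]

# The same solution AFTER the BIND; SPARQL's SUBSTR(?x, 3) is 1-based,
# so it corresponds to the Python slice [2:].
after_bind = [dict(sol, dbsnpIdInt=sol["dbsnpId"][2:]) for sol in before_bind]

rows_before = bindings_rows(before_bind, SERVICE_VARS)  # one empty row
rows_after = bindings_rows(after_bind, SERVICE_VARS)    # one row carrying "123"
```

If the SERVICE is evaluated before the BIND, each of the incoming solutions contributes one empty row, which matches the shape of the BINDINGS clause in the log.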
      
      
      

      This is blocking a colleague, so I am marking it as critical and will work on it with some urgency.

      ----

      See BLZG-745 (Slow BIND) which is probably the root cause for this ticket.

        Activity

        jeremycarroll jeremycarroll added a comment -

        This dump of a much simpler query seems to indicate an optimizer defect. I believe the FunctionNode (the BIND) and the ServiceNode should be in the opposite order in the Optimized AST, so that ?oo is bound before the SERVICE that uses it.

        SELECT * { 
          ?s ?p ?o .
         BIND( substr(?o, 3) as ?oo )
          SERVICE <http://example.com:666/sparql> {
             ?ss ?pp ?oo
          }
        
        } LIMIT 10
        Parse Tree
        
        QueryContainer
         SelectQuery
          Select ( * )
          WhereClause
           GraphPatternGroup
            BasicGraphPattern
             TriplesSameSubjectPath
              Var (s)
              PropertyListPath
               Var (p)
               ObjectList
                Var (o)
             Bind
              Substr
               Var (o)
               NumericLiteral (value=3, datatype=http://www.w3.org/2001/XMLSchema#integer)
              Var (oo)
            ServiceGraphPattern
             IRI (http://example.com:666/sparql)
             GraphPatternGroup
              BasicGraphPattern
               TriplesSameSubjectPath
                Var (ss)
                PropertyListPath
                 Var (pp)
                 ObjectList
                  Var (oo)
          Limit (10)
        Original AST
        
        QueryType: SELECT
        includeInferred=true
        SELECT * 
          JoinGroupNode {
            StatementPatternNode(VarNode(s), VarNode(p), VarNode(o)) [scope=DEFAULT_CONTEXTS]
            ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(o),ConstantNode(XSDInteger(3)))[ com.bigdata.rdf.sparql.ast.FunctionNode.scalarVals=null, com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#substring, valueExpr=com.bigdata.rdf.internal.constraints.SubstrBOp(o,XSDInteger(3),<null>)[ com.bigdata.rdf.internal.constraints.IVValueExpression.namespace=.lex, com.bigdata.rdf.internal.constraints.IVValueExpression.timestamp=-1]] AS VarNode(oo) )
            SERVICE <ConstantNode(TermId(0U)[http://example.com:666/sparql])> {
              JoinGroupNode {
                StatementPatternNode(VarNode(ss), VarNode(pp), VarNode(oo)) [scope=DEFAULT_CONTEXTS]
              }
            }
          }
        slice(limit=10)
        Optimized AST
        
        QueryType: SELECT
        includeInferred=true
        SELECT VarNode(s) VarNode(p) VarNode(o) VarNode(oo)
          JoinGroupNode {
            StatementPatternNode(VarNode(s), VarNode(p), VarNode(o)) [scope=DEFAULT_CONTEXTS]
              com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=70603
              com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=SPOC
            SERVICE <ConstantNode(TermId(0U)[http://example.com:666/sparql])> {
              JoinGroupNode {
                StatementPatternNode(VarNode(ss), VarNode(pp), VarNode(oo)) [scope=DEFAULT_CONTEXTS]
              }
            }
            ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(o),ConstantNode(XSDInteger(3)))[ com.bigdata.rdf.sparql.ast.FunctionNode.scalarVals=null, com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#substring, valueExpr=com.bigdata.rdf.internal.constraints.SubstrBOp(o,XSDInteger(3),<null>)[ com.bigdata.rdf.internal.constraints.IVValueExpression.namespace=.lex, com.bigdata.rdf.internal.constraints.IVValueExpression.timestamp=-1]] AS VarNode(oo) )
          }
        slice(limit=10)
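The ordering problem in the dump above can be checked mechanically. The following is a minimal sketch, not the Bigdata optimizer: each group node is modelled as a hypothetical (name, required variables, produced variables) tuple, and treating ?oo as required by the SERVICE is an assumption made here to reflect the intent of the query.

```python
def first_invalid_node(nodes):
    """Return the name of the first node whose required variables are not
    bound by the nodes before it, or None if the order is valid."""
    bound = set()
    for name, requires, produces in nodes:
        if not requires <= bound:
            return name
        bound |= produces
    return None


# Hypothetical model of the three nodes in the join group above.
STATEMENT = ("StatementPatternNode", set(), {"s", "p", "o"})
BIND = ("BIND", {"o"}, {"oo"})
SERVICE = ("SERVICE", {"oo"}, {"ss", "pp"})

original_order = [STATEMENT, BIND, SERVICE]   # as in the Original AST
optimized_order = [STATEMENT, SERVICE, BIND]  # as in the Optimized AST

original_problem = first_invalid_node(original_order)    # None
optimized_problem = first_invalid_node(optimized_order)  # "SERVICE"
```

Under these assumptions the Original AST order is valid, while the Optimized AST runs the SERVICE before ?oo exists.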
        
        jeremycarroll jeremycarroll added a comment -

        There are further issues with SERVICE.
        For example, consider:

        SELECT * {
         SERVICE <http://localhost:2333/sparql> {
            FILTER (true)
            { 
            SELECT * {
               ?s ?p ?o 
              } LIMIT 1
            } 
          }
        } LIMIT 1
        

        where the service call comes back to the current bigdata instance, and there is at least one triple in the default graph.

        Somehow this fails to return the variable bindings.
        Also, commenting out the superfluous FILTER(true) results in a syntax error during the SERVICE call.

        jeremycarroll jeremycarroll added a comment -

        The following two queries also malfunction, which shows that the problem is the treatment of the BIND.

        Assumptions: the data set includes a longish literal beginning with "n"
        and does not include a literal beginning with "banana n".

        SELECT * {
        ?s ?p ?o
        FILTER (strlen(?o)> 10)
        FILTER (strstarts(?o,"n"))
        BIND ( concat("banana ", ?o) as ?banana )
        FILTER EXISTS { ?s ?p ?banana }

        } LIMIT 1

        should return nothing, but actually returns the value

        SELECT * {
        ?s ?p ?o
        FILTER (strlen(?o)> 10)
        FILTER (strstarts(?o,"n"))
        BIND ( concat("banana ", ?o) as ?banana )
        FILTER NOT EXISTS { ?s ?p ?banana }

        } LIMIT 1

        should return the values, but actually returns nothing
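The expected results of the two queries can be reproduced with a small in-memory model. This is a sketch of the intended semantics only; the two triples are hypothetical data chosen to satisfy the stated assumptions (one longish literal beginning with "n", none beginning with "banana n").

```python
# Hypothetical data set satisfying the assumptions above.
TRIPLES = {
    ("ex:a", "ex:label", "nectarine jam"),
    ("ex:b", "ex:label", "apple"),
}


def candidate_solutions(triples):
    """?s ?p ?o with the two FILTERs applied, then the BIND for ?banana."""
    rows = []
    for s, p, o in sorted(triples):
        if len(o) > 10 and o.startswith("n"):
            rows.append({"s": s, "p": p, "o": o, "banana": "banana " + o})
    return rows


def filter_exists(rows, triples, negate=False):
    """FILTER [NOT] EXISTS { ?s ?p ?banana } with ?banana already bound."""
    return [
        row for row in rows
        if ((row["s"], row["p"], row["banana"]) in triples) != negate
    ]


rows = candidate_solutions(TRIPLES)
exists_result = filter_exists(rows, TRIPLES)                    # no rows
not_exists_result = filter_exists(rows, TRIPLES, negate=True)   # the one row
```

With ?banana bound before the filter is evaluated, EXISTS yields no solution and NOT EXISTS yields the single candidate, which is the opposite of what was observed.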

        jeremycarroll jeremycarroll added a comment -

        This one is pretty compellingly incorrect too:

        SELECT * { 
            ?s ?p ?o 
           FILTER (strlen(?o)> 10)
           FILTER (strstarts(?o,"n"))
           BIND ( concat("banana ", ?o) as ?banana )
           OPTIONAL {
             ?s ?p ?banana
           }
        } LIMIT 1
        

        The OPTIONAL causes the query to fail!
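For reference, OPTIONAL behaves as a left join: a solution from the left side is kept even when nothing on the right is compatible with it. The sketch below shows that expected behaviour on hypothetical data (the same assumptions as in the previous comment); it is not how Bigdata evaluates the query.

```python
def optional_join(left_rows, right_rows):
    """Left join: extend each left row with every compatible right row,
    and keep the left row unchanged when there is none."""
    result = []
    for left in left_rows:
        extended = [
            {**left, **right}
            for right in right_rows
            if all(left[k] == right[k] for k in left.keys() & right.keys())
        ]
        result.extend(extended if extended else [dict(left)])
    return result


# Hypothetical data: one triple whose literal starts with "n".
triples = [("ex:a", "ex:label", "nectarine jam")]

# Left side: ?s ?p ?o after the FILTERs and the BIND.
left = [{"s": s, "p": p, "o": o, "banana": "banana " + o} for s, p, o in triples]

# Right side: all matches of the pattern { ?s ?p ?banana } in the data.
right = [{"s": s, "p": p, "banana": o} for s, p, o in triples]

joined = optional_join(left, right)
```

No right-hand row is compatible here, so the single left-hand solution should survive unchanged and the query should return it.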

        jeremycarroll jeremycarroll added a comment -

        A workaround is to use a subselect instead of a BIND.
        This is illustrated in
        com.bigdata.rdf.sail.webapp.TestService794.testMassageServiceNested3Call()

        SELECT * { 
         { select ?s ?p ?o ( substr(?o, 3) as ?oo )
           {  ?s ?p ?o .
           }
          }
          SERVICE <http://example.com:666/sparql> {
             ?ss ?pp ?oo
          }
        
        } LIMIT 10
        
        jeremycarroll jeremycarroll added a comment -

        See also Trac 653.

        michaelschmidt michaelschmidt added a comment -

        The issue has been fixed in master as part of the join refactoring. I've written test cases for all the queries in the comments, but was not able to set up a test case for the main comment. I'm not 100% sure whether it is related to the queries in the comments; please verify, and open a new ticket if this has not been resolved and is still an issue.

        Please find below some generic comments on the join order refactoring which includes fixes for the positioning of BIND nodes within join groups as required by this fix.

        1. Key changes:
          1. Implemented new optimizer (ASTJoinGroupOrderOptimizer), which puts the constructs into the right order
            1. Addressing mainly two correctness problems
              1. Reordering of non-reorderable constructs (OPTIONAL problems)
              2. Proper handling of nodes that require variables to be bound (FILTER NOT EXISTS, BIND, certain SERVICEs)
          2. Approaches to the two problems
            1. Join order optimization now takes “partitions” (in the form of OPTIONALs) into account, reordering across partitions only where valid
            2. New interface IVariableBindingRequirements with central method getRequiredBound(), which determines the variables that must be bound prior to executing a node
          3. Optimizations:
            1. Proper treatment of FILTER NOT EXISTS queries, now correct + more precise placement (as for other FILTERs)
            2. Also more precise placement of FILTER expressions that cannot be attached to a single join group node, namely
              1. … as early as possible
              2. … correctly placed in case triple patterns are reordered by the static optimizer
            3. Proper treatment of complex SERVICEs requiring incoming bound variables -> placed at first possible position
            4. Proper treatment of BIND and VALUES clauses -> placed at first
          4. Reusable components for FILTER placement, extraction of interesting variable sets from join groups, etc. -> clear the way for accelerated implementation of other rewriting heuristics
          5. Two “modes” for optimizer:
            1. Assert valid order & optimize
            2. Assert valid order only
          6. Test cases for optimizer itself at query and AST level
        2. Refactoring changes
          1. Interface IVariableBindingRequirements, with the central method getRequiredBound() and associated methods at ServiceFactory (with slightly different signature for extracting this information from service nodes)
            1. Implementation of the interface in various classes based on StaticAnalysis utility methods
            2. For services, the interface implementation is (for the time being) the same everywhere (requiredBound = ∅, imposing no constraint); see FulltextSearchServiceFactory.getRequiredBound() for what an implementation could look like (and we may want to adjust other services to offer incoming bound variables in the future)
            3. The implementation of this interface accounts for most of the changes; many of them merely involve setting up a new empty hash set.
          2. The optimizer itself: ASTJoinGroupOrderOptimizer
            1. Closely related are ASTJoinGroupPartition(s), offering data structure and the key utility methods for placement of nodes within partitions based on heuristics and variable binding constraints
            2. IASTJoinGroupPartitionReorderer: interface for an inter-partition optimizer; currently, there’s only a simple implementation (TypeBasedASTJoinGroupPartitionReorderer, in large parts reflecting the behaviour of the old optimizer), but this is where I would envision hooking in the StaticAnalysis (the interface might need to be changed/extended; it’s a first step).
          3. Reusable utility classes (used by the optimizer)
            1. GroupNodeVarBindingInfo & GroupNodeVarBindingInfoMap -> summary container for looking up different aspects related to variables in the group node (independent from its position in a given join group), and an associated class easing construction of GroupNodeVarBindingInfo objects for a set of nodes
            2. ASTFilterPlacer: utility class that supports the precise and correct placement of FILTER expressions within a list of IGroupMemberNodes
            3. ASTJoinGroupFilterExistsInfo: class implementing functionality to identify and access FILTER [NOT] EXISTS nodes within a join group (which are translated as a hybrid of an ASK subquery and a FilterNode, thus requiring special handling)
            4. ASTTypeBasedNodeClassifier (& ASTTypeBasedNodeClassifierConstraint): supports, given a set of types as input, the partitioning of a node list into lists of nodes with the given type + rest, including additional constraints for membership to a certain partition
          4. Minor things
            1. ASTBottomUpOptimizer -> minor bugfix (as documented inline)
            2. ASTSparql11SubqueryOptimizer -> minor bugfix (inheritance of bindings clauses to subqueries)
            3. ASTStaticBindingsOptimizer -> minor bugfix (see in code documentation for details)
            4. DefaultOptimizerList -> hooked in new optimizer
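The idea of placing a node at the first position where its required variables are bound can be sketched as a small greedy pass. This is an illustration under stated assumptions, not the ASTJoinGroupOrderOptimizer implementation: nodes are hypothetical (name, required variables, produced variables) tuples, partitions and cost heuristics are ignored, and the SERVICE is assumed to require ?oo.

```python
def place_nodes(nodes):
    """Greedily emit the first remaining node (in input order) whose
    required variables are already bound; fail if none qualifies."""
    remaining = list(nodes)
    ordered, bound = [], set()
    while remaining:
        for node in remaining:
            name, requires, produces = node
            if requires <= bound:
                ordered.append(name)
                bound |= produces
                remaining.remove(node)
                break
        else:
            # No remaining node can run with the variables bound so far.
            raise ValueError("no valid order for: %s" % [n[0] for n in remaining])
    return ordered


# Hypothetical model of the simple query from the earlier comment,
# given in the invalid order produced by the old optimizer.
bad_order = [
    ("StatementPatternNode", set(), {"s", "p", "o"}),
    ("SERVICE", {"oo"}, {"ss", "pp"}),
    ("BIND", {"o"}, {"oo"}),
]

fixed_order = place_nodes(bad_order)
```

On this input the pass moves the BIND ahead of the SERVICE and otherwise keeps the given order, which roughly corresponds to the "assert valid order only" mode described above.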
        beebs Brad Bebee added a comment -

        @jjc Please verify on 1.5.2.


          People

          • Assignee:
            michaelschmidt michaelschmidt
            Reporter:
            jeremycarroll jeremycarroll