  1. Blazegraph (by SYSTAP)
  2. BLZG-535

Optimize hash joins when there are no source solutions (or only the exogenous bindings)

    Details

      Description

      Join variables are currently set based on the incoming bound variables to the subgroup and the definitely bound variables in the subgroup. When a subgroup runs first in the query, it has no incoming bound variables. This causes it to build a hash index with no join variables. This leads to hash joins without join variables, which are VERY expensive.

      The reason why we cannot specify join variables based on the definitely bound variables in the subgroup is that the hash code of the exogenous/empty solution will typically be undefined, since it will not have bindings for the join variables.
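
      The hash code problem can be illustrated with a minimal Python sketch. This is not Blazegraph code: solutions are modeled as plain dicts, and `hash_key` is a hypothetical helper introduced only for this illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

Solution = Dict[str, str]  # variable name -> bound value (illustrative model only)


def hash_key(solution: Solution, join_vars: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Return the hash key for a solution, or None when a join variable is unbound."""
    if any(v not in solution for v in join_vars):
        return None  # key is undefined: the solution cannot be probed into the index
    return tuple(solution[v] for v in join_vars)


indexed = {"feature": "f1", "countF": "10"}
empty_source: Solution = {}  # the single empty (or exogenous-only) source solution

# With join variables taken from the subgroup, the source solution has no key.
print(hash_key(indexed, ["feature"]))       # ('f1',)
print(hash_key(empty_source, ["feature"]))  # None

# With no join variables, every solution maps to the same empty key,
# so the index degenerates into a single bucket.
print(hash_key(indexed, []))                # ()
```

      Under this model, reducing the join variables to the empty set keeps the key defined but puts every solution in one bucket, which is the expensive case described above.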

      We need to recognize this case and handle it differently. Rather than reducing the join variables to an empty set, the join variables should be all definitely bound variables and we should INCLUDE the solution set into the parent using a different operator.

      Basically, all the operator needs to do is drain the solutions from the hash index, pushing them into the pipeline (its sink). It cannot do a hash join against the source solution (either empty or containing just the exogenous bindings) because there is no guarantee that the source solution will share the join variables (in fact, it nearly always will NOT share the join variables). However, the source solution set has very low cardinality (ONE).

      Therefore, for each solution in the hash index, it attempts a join with each source solution in turn. While this is conceptually a cross product unconstrained by the presence of the join variables, in fact there is only one source solution, so this amounts to a single scan of the hash index in which we possibly pick up and/or filter based on the exogenous bindings.

      This operation is currently executed by a (JVM|HTree)SolutionSetHashJoin. Perhaps the easiest thing would be to add an annotation to that hash join indicating that it should IGNORE the join variables and do a full (1 x M) cross product.
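
      The (1 x M) scan described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the (JVM|HTree)SolutionSetHashJoin implementation: solutions are plain dicts, and `merge_if_compatible` and `scan_join` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, Optional

Solution = Dict[str, str]  # variable name -> bound value (illustrative model only)


def merge_if_compatible(left: Solution, right: Solution) -> Optional[Solution]:
    """Join two solutions; return None if a shared variable has conflicting values."""
    for var, value in left.items():
        if var in right and right[var] != value:
            return None
    merged = dict(left)
    merged.update(right)
    return merged


def scan_join(indexed: Iterable[Solution], sources: Iterable[Solution]) -> Iterator[Solution]:
    """Ignore the join variables and test every (indexed, source) pair."""
    source_list = list(sources)  # expected to hold a single solution
    for solution in indexed:     # one pass over the M indexed solutions
        for source in source_list:
            joined = merge_if_compatible(solution, source)
            if joined is not None:
                yield joined


subgroup = [
    {"feature": "f1", "countF": "10"},
    {"feature": "f2", "countF": "7"},
]
exogenous = [{"feature": "f2"}]  # the one source solution, with an exogenous binding

print(list(scan_join(subgroup, [{}])))       # empty source: every solution passes through
print(list(scan_join(subgroup, exogenous)))  # exogenous binding filters down to f2
```

      Because the inner loop runs once per indexed solution, the cost stays linear in the size of the index as long as there is only one source solution.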

        Activity

        beebs Brad Bebee created issue -
        bryanthompson bryanthompson added a comment -

        Note: This only ever makes a difference for a merge join. Right now the merge join will build a hash index on the join variables if it does not already have one on the right join variables. This issue would eliminate the cost of that hash index build.

        bryanthompson bryanthompson added a comment -

        I spoke too soon. BSBM BI Q4 encounters this problem.

        Turning

        prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
        prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
        prefix xsd: <http://www.w3.org/2001/XMLSchema#>
        
        Select ?feature ((?sumF*(?countTotal-?countF))/(?countF*(?sumTotal-?sumF)) As ?priceRatio)
          {
            { Select (count(?price) As ?countTotal) (sum(xsd:float(xsd:string(?price))) As ?sumTotal)
              {
                ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType3> .
                ?offer bsbm:product ?product ;
                       bsbm:price ?price .
              }
            }
            { Select ?feature (count(?price2) As ?countF) (sum(xsd:float(xsd:string(?price2))) As ?sumF)
              {
                ?product2 a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType3> ;
                         bsbm:productFeature ?feature .
                ?offer2 bsbm:product ?product2 ;
                       bsbm:price ?price2 .
              }
              Group By ?feature
            }
          }
         Order By desc(?priceRatio) ?feature
         Limit 100
        

        Into

        hint: [com.bigdata.rdf.sparql.ast.QueryHints.queryId]=[2ed21f84-f160-4c81-b039-c0611caaf3f2]
        PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
        PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        WITH {
          QueryType: SELECT
          SELECT ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(price))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#count, valueExpr=com.bigdata.bop.rdf.aggregate.COUNT(price)] AS VarNode(countTotal) ) ( com.bigdata.rdf.sparql.ast.FunctionNode(FunctionNode(com.bigdata.rdf.internal.constraints.FuncBOp(com.bigdata.rdf.internal.constraints.FuncBOp(price)[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#string])[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#float]))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#sum, valueExpr=com.bigdata.bop.rdf.aggregate.SUM(com.bigdata.rdf.internal.constraints.FuncBOp(com.bigdata.rdf.internal.constraints.FuncBOp(price)[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#string])[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#float])[ com.bigdata.bop.aggregate.AggregateBase.distinct=false]] AS VarNode(sumTotal) )
            JoinGroupNode {
              StatementPatternNode(VarNode(product), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), ConstantNode(TermId(69079U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType3]), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=45477
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
              StatementPatternNode(VarNode(offer), ConstantNode(TermId(1937853U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product]), VarNode(product), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=5696520
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
              StatementPatternNode(VarNode(offer), ConstantNode(TermId(1937852U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/price]), VarNode(price), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=5696520
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
            }
        } AS -subSelect-1 JOIN ON () DEPENDS ON ()
        WITH {
          QueryType: SELECT
          SELECT ( VarNode(feature) AS VarNode(feature) ) ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(price2))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#count, valueExpr=com.bigdata.bop.rdf.aggregate.COUNT(price2)] AS VarNode(countF) ) ( com.bigdata.rdf.sparql.ast.FunctionNode(FunctionNode(com.bigdata.rdf.internal.constraints.FuncBOp(com.bigdata.rdf.internal.constraints.FuncBOp(price2)[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#string])[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#float]))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#sum, valueExpr=com.bigdata.bop.rdf.aggregate.SUM(com.bigdata.rdf.internal.constraints.FuncBOp(com.bigdata.rdf.internal.constraints.FuncBOp(price2)[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#string])[ com.bigdata.rdf.internal.constraints.FuncBOp.namespace=BSBM_284826.lex, com.bigdata.rdf.internal.constraints.FuncBOp.function=http://www.w3.org/2001/XMLSchema#float])[ com.bigdata.bop.aggregate.AggregateBase.distinct=false]] AS VarNode(sumF) )
            JoinGroupNode {
              StatementPatternNode(VarNode(product2), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), ConstantNode(TermId(69079U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType3]), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=45477
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
              StatementPatternNode(VarNode(product2), ConstantNode(TermId(69861U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productFeature]), VarNode(feature), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=5533832
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
              StatementPatternNode(VarNode(offer2), ConstantNode(TermId(1937853U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/product]), VarNode(product2), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=5696520
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
              StatementPatternNode(VarNode(offer2), ConstantNode(TermId(1937852U)[http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/price]), VarNode(price2), DEFAULT_CONTEXTS)
                queryHints={com.bigdata.rdf.sparql.ast.QueryHints.queryId=2ed21f84-f160-4c81-b039-c0611caaf3f2}
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.estimatedCardinality=5696520
                com.bigdata.rdf.sparql.ast.eval.AST2BOpBase.originalIndex=POS
            }
          group by ( VarNode(feature) AS VarNode(feature) )
        } AS -subSelect-2 JOIN ON () DEPENDS ON ()
        QueryType: SELECT
        includeInferred=true
        SELECT ( VarNode(feature) AS VarNode(feature) ) ( com.bigdata.rdf.sparql.ast.FunctionNode(FunctionNode(MULTIPLY(sumF, MINUS(countTotal, countF))),FunctionNode(MULTIPLY(countF, MINUS(sumTotal, sumF))))[ com.bigdata.rdf.sparql.ast.FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#numeric-divide, valueExpr=DIVIDE(MULTIPLY(sumF, MINUS(countTotal, countF)), MULTIPLY(countF, MINUS(sumTotal, sumF)))] AS VarNode(priceRatio) )
          JoinGroupNode {
            INCLUDE -subSelect-1 JOIN ON ()
            INCLUDE -subSelect-2 JOIN ON ()
          }
        order by com.bigdata.rdf.sparql.ast.OrderByExpr(VarNode(priceRatio))[ ascending=false] com.bigdata.rdf.sparql.ast.OrderByExpr(VarNode(feature))[ ascending=true]
        slice(limit=100)
        
        bryanthompson bryanthompson added a comment -

        I am unconvinced about this issue. I am going to close it out for the moment. We can come back to the question later.

        bryanthompson bryanthompson added a comment -

        We are special-casing the situation where the INCLUDE appears as the
        first operator in a WHERE clause (of either the QueryRoot or a
        NamedSubqueryRoot) and the cardinality of the exogenous solutions is low.
        In that case we are performing a scan over the named solutions,
        joining against each exogenous solution in an inner loop. This is
        efficient. It eliminates a problem where we are not assigning good
        join variables, and it preserves the ORDER of the solutions in the
        named solution set.

        Otherwise, we build a hash index from the named solution set on the
        shared definitely bound variables and then perform a hash join against
        the source solutions.
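
        The two paths described in this comment can be sketched in Python as follows. This is only an illustration: `choose_strategy`, `hash_join`, and the `SCAN_THRESHOLD` cutoff are hypothetical (the comment says only that the exogenous cardinality must be "low"), and solutions are modeled as plain dicts.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Solution = Dict[str, str]  # variable name -> bound value (illustrative model only)
SCAN_THRESHOLD = 1  # hypothetical cutoff for "low" exogenous cardinality


def choose_strategy(include_is_first: bool, exogenous_count: int) -> str:
    """Pick 'scan' for a leading INCLUDE with few exogenous solutions, else 'hash'."""
    if include_is_first and exogenous_count <= SCAN_THRESHOLD:
        return "scan"
    return "hash"


def hash_join(named: Sequence[Solution], sources: Sequence[Solution],
              join_vars: Sequence[str]) -> Iterator[Solution]:
    """Build a hash index over the named solutions on join_vars, then probe it."""
    index: Dict[Tuple[str, ...], List[Solution]] = defaultdict(list)
    for solution in named:
        index[tuple(solution[v] for v in join_vars)].append(solution)
    for source in sources:
        key = tuple(source[v] for v in join_vars)
        for match in index.get(key, []):
            # Variables outside join_vars must still agree when both sides bind them.
            if all(source.get(k, val) == val for k, val in match.items()):
                yield {**match, **source}


named = [
    {"feature": "f1", "countF": "10"},
    {"feature": "f2", "countF": "7"},
]
sources = [{"feature": "f2", "label": "x"}]

print(choose_strategy(include_is_first=True, exogenous_count=1))   # scan
print(choose_strategy(include_is_first=False, exogenous_count=1))  # hash
print(list(hash_join(named, sources, ["feature"])))
```

        The hash path assumes that every source solution binds all of the join variables, which is why it is reserved for the case where the INCLUDE is not the first operator.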

        Committed revision r6294.

        bryanthompson bryanthompson added a comment -

        A nested loop join has been implemented [2]. This was originally called NamedSolutionSetScanOp, but it is just a nested loop join, so I have renamed it "NestedLoopJoinOp" and generalized the operator so that it can accept named solution sets, streams, etc. as the access path to be scanned for each source solution.

        In the case where the INCLUDE (of a pre-existing solution set) is not the first join, an explicit hash index build operation is now performed before the solution set hash join that implements the "INCLUDE". See [2].

        Mike has also fixed bugs in StaticAnalysis related to getDefinitelyBound() and getKnownBound() [1].

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/412 (StaticAnalysis#getDefinitelyBound() ignores exogenous variables.)

        [2] https://sourceforge.net/apps/trac/bigdata/ticket/531 (SPARQL UPDATE for SOLUTION SETS)

        Committed revision r6421.

        beebs Brad Bebee made changes -
        Field Original Value New Value
        Workflow Trac Import v2 [ 12371 ] Trac Import v3 [ 13951 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v3 [ 13951 ] Trac Import v4 [ 15280 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v4 [ 15280 ] Trac Import v5 [ 16666 ]
        beebs Brad Bebee made changes -
        Labels Issue_patch_20150625
        beebs Brad Bebee made changes -
        Status Closed - Won't Fix [ 6 ] Open [ 1 ]
        beebs Brad Bebee made changes -
        Status Open [ 1 ] Accepted [ 10101 ]
        beebs Brad Bebee made changes -
        Status Accepted [ 10101 ] In Progress [ 3 ]
        beebs Brad Bebee made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        beebs Brad Bebee made changes -
        Status Resolved [ 5 ] In Review [ 10100 ]
        beebs Brad Bebee made changes -
        Resolution Fixed [ 1 ] Done [ 10000 ]
        Status In Review [ 10100 ] Done [ 10000 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v5 [ 16666 ] Trac Import v6 [ 17901 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v6 [ 17901 ] Trac Import v7 [ 19298 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v7 [ 19298 ] Trac Import v8 [ 20919 ]

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0
            Watchers:
            2
