Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_RELEASE_1_5_3
    • Component/s: None
    • Labels: None

      Description

      Noticed under Blazegraph 1.5.2

      This query (simplified; the original had several different IRIs in the FILTER)

      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX x: <http://example.com/>
      
      SELECT (count(*) AS ?c) WHERE {
        ?s a ?t .
        ?s foaf:name|dc:title ?name .
        FILTER (?s = x:x || ?s = x:x)
      }
      

      returned c=968730, which is the same number as is returned without the filter.

      whereas the equivalent query

      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      PREFIX x: <http://example.com/>
      
      SELECT (count(*) AS ?c) WHERE {
        ?s a ?t .
        ?s foaf:name|dc:title ?name .
        FILTER (?s = x:x)
      }
      

      returned correct c=0 (there is no resource from the x: namespace in the graph).

      Optimized AST for the incorrect result:

      SELECT ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(*))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#count, valueExpr=com.bigdata.bop.rdf.aggregate.COUNT(*)] AS VarNode(c) )
        JoinGroupNode {
          PropertyPathUnionNode [joinVars=[]] [projectInVars=[]] {
            JoinGroupNode [joinVars=[]] [projectInVars=[]] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(117)[http://xmlns.com/foaf/0.1/name]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                AST2BOpBase.estimatedCardinality=544254
                AST2BOpBase.originalIndex=POCS
            } AST2BOpBase.estimatedCardinality=544254
            JoinGroupNode [joinVars=[]] [projectInVars=[]] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-73)[http://purl.org/dc/elements/1.1/title]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                AST2BOpBase.estimatedCardinality=424476
                AST2BOpBase.originalIndex=POCS
            } AST2BOpBase.estimatedCardinality=424476
          }
            FILTER( FunctionNode(FunctionNode(VarNode(s),ConstantNode(TermId(0U)[http://example.com/x]))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#equal-to, valueExpr=com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ]],FunctionNode(VarNode(s),ConstantNode(TermId(0U)[http://example.com/x]))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#equal-to, valueExpr=com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ]])[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#logical-or, valueExpr=com.bigdata.rdf.internal.constraints.OrBOp(com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ],com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ])] ) AST2BOpBase.estimatedCardinality=968730
          StatementPatternNode(VarNode(s), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), VarNode(t)) [scope=DEFAULT_CONTEXTS]
            AST2BOpBase.estimatedCardinality=1349948
            AST2BOpBase.originalIndex=POCS
        }
      

      Optimized AST for the correct result:

      SELECT ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(*))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#count, valueExpr=com.bigdata.bop.rdf.aggregate.COUNT(*)] AS VarNode(c) )
        JoinGroupNode {
          StatementPatternNode(ConstantNode(TermId(0U)[http://example.com/x][var=s]), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), VarNode(t)) [scope=DEFAULT_CONTEXTS] [#filters=1]
            FILTER( FunctionNode(ConstantNode(TermId(0U)[http://example.com/x][var=s]),ConstantNode(TermId(0U)[http://example.com/x]))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#equal-to, valueExpr=com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ]] )
            AST2BOpBase.estimatedCardinality=0
            AST2BOpBase.originalIndex=SPOC
          PropertyPathUnionNode [joinVars=[s]] [projectInVars=[s]] {
            JoinGroupNode [joinVars=[s]] [projectInVars=[s]] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(117)[http://xmlns.com/foaf/0.1/name]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                AST2BOpBase.estimatedCardinality=544254
                AST2BOpBase.originalIndex=POCS
            } JOIN ON (s) AST2BOpBase.estimatedCardinality=544254
            JoinGroupNode [joinVars=[s]] [projectInVars=[s]] {
              StatementPatternNode(VarNode(s), ConstantNode(Vocab(-73)[http://purl.org/dc/elements/1.1/title]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                AST2BOpBase.estimatedCardinality=424476
                AST2BOpBase.originalIndex=POCS
            } JOIN ON (s) AST2BOpBase.estimatedCardinality=424476
          } JOIN ON (s) AST2BOpBase.estimatedCardinality=968730
        }
      

        Activity

        michaelschmidt michaelschmidt added a comment -

        Can't reproduce the behavior with 1.5.2 and dummy data. The query

        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX x: <http://example.com/>
        
        SELECT (count(*) AS ?c) WHERE {
          ?s a ?t .
          ?s foaf:name|dc:title ?name .
          FILTER (?s = x:x || ?s = x:x)
        }
        

        with FILTER yields 0 results for me on the following dummy data

        <http://s> a <http://xmlns.com/foaf/0.1/Person> .
        <http://s> <http://xmlns.com/foaf/0.1/name> "Michael" .
        <http://s> <http://purl.org/dc/elements/1.1/title> "Some Title" .
        

        whereas I get two results (as expected) when removing the filter.

        Also the query plan for this dummy data looks fine for me:

        QueryType: SELECT
        includeInferred=true
        SELECT ( com.bigdata.rdf.sparql.ast.FunctionNode(VarNode(*))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#count, valueExpr=com.bigdata.bop.rdf.aggregate.COUNT(*)] AS VarNode(c) )
          JoinGroupNode {
            StatementPatternNode(VarNode(s), ConstantNode(Vocab(14)[http://www.w3.org/1999/02/22-rdf-syntax-ns#type]), VarNode(t)) [scope=DEFAULT_CONTEXTS] [#filters=1]
              FILTER( FunctionNode(FunctionNode(VarNode(s),ConstantNode(TermId(0U)[http://example.com/x]))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#equal-to, valueExpr=com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ]],FunctionNode(VarNode(s),ConstantNode(TermId(0U)[http://example.com/x]))[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2005/xpath-functions#equal-to, valueExpr=com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ]])[ FunctionNode.scalarVals=null, FunctionNode.functionURI=http://www.w3.org/2006/sparql-functions#logical-or, valueExpr=com.bigdata.rdf.internal.constraints.OrBOp(com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ],com.bigdata.rdf.internal.constraints.CompareBOp(s,TermId(0U)[http://example.com/x])[ CompareBOp.op=EQ])] )
              AST2BOpBase.estimatedCardinality=1
              AST2BOpBase.originalIndex=POS
            PropertyPathUnionNode [joinVars=[s]] [projectInVars=[s]] {
              JoinGroupNode [joinVars=[s]] [projectInVars=[s]] {
                StatementPatternNode(VarNode(s), ConstantNode(Vocab(117)[http://xmlns.com/foaf/0.1/name]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=1
                  AST2BOpBase.originalIndex=POS
              } JOIN ON (s) AST2BOpBase.estimatedCardinality=1
              JoinGroupNode [joinVars=[s]] [projectInVars=[s]] {
                StatementPatternNode(VarNode(s), ConstantNode(Vocab(-73)[http://purl.org/dc/elements/1.1/title]), VarNode(name)) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=1
                  AST2BOpBase.originalIndex=POS
              } JOIN ON (s) AST2BOpBase.estimatedCardinality=1
            } JOIN ON (s) AST2BOpBase.estimatedCardinality=2
          }
        
        

        Note that, as opposed to the query plan reported, in my setting the FILTER is applied early, on top of the triple pattern.

        @Michal Politowski: could you please test my dummy setting? If it succeeds for you as well, this would be a data-dependent problem (if so, would it be possible to share your data?). If not, we need to figure out the differences in configuration. Here are a couple of questions:

        • Are you using the original 1.5.2 release or some maintenance release from SF?
        • Could you provide us with details about your configuration setting (i.e., the properties file including information such as quads vs. triples mode, inferencing turned on/off, etc.)?
        mpol Michal Politowski added a comment -

        0. the three-triple dummy example works correctly for me as well, so apparently it is a data (data volume?) dependent problem. I'll try to provide you with some data.
        1. Nonetheless, the setup: it happened with different settings; in particular, it is reproducible with the jar and war currently available on SourceForge and the default kb created when the server is started, so:

        com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor	1024
        com.bigdata.relation.container	kb
        com.bigdata.journal.AbstractJournal.bufferMode	DiskRW
        com.bigdata.journal.AbstractJournal.file	bigdata.jnl
        com.bigdata.journal.AbstractJournal.initialExtent	209715200
        com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass	com.bigdata.rdf.vocab.DefaultBigdataVocabulary
        com.bigdata.rdf.store.AbstractTripleStore.textIndex	false
        com.bigdata.btree.BTree.branchingFactor	128
        com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor	400
        com.bigdata.rdf.store.AbstractTripleStore.axiomsClass	com.bigdata.rdf.axioms.NoAxioms
        com.bigdata.service.AbstractTransactionService.minReleaseAge	1
        com.bigdata.rdf.sail.truthMaintenance	false
        com.bigdata.journal.AbstractJournal.maximumExtent	209715200
        com.bigdata.rdf.sail.namespace	kb
        com.bigdata.relation.class	com.bigdata.rdf.store.LocalTripleStore
        com.bigdata.rdf.store.AbstractTripleStore.quads	true
        com.bigdata.relation.namespace	kb
        com.bigdata.btree.writeRetentionQueue.capacity	4000
        com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers	false
        
        mpol Michal Politowski added a comment -

        A small modification of your dummy data does reproduce the problem for me:

        <http://s> a <http://xmlns.com/foaf/0.1/Person> .
        <http://s> <http://purl.org/dc/elements/1.1/title> "Some Title" .
        <http://z> a <http://xmlns.com/foaf/0.1/Person> .
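
        As a cross-check on what the query should return for these three triples, here is a minimal plain-Java sketch that evaluates the same pattern by brute force. It does not use Blazegraph at all; the class name ExpectedCountSketch and its helper are made up for illustration, and the nested loop is only a stand-in for SPARQL join semantics on this tiny data set.

```java
import java.util.List;

public class ExpectedCountSketch {

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String FOAF_NAME = "http://xmlns.com/foaf/0.1/name";
    static final String DC_TITLE = "http://purl.org/dc/elements/1.1/title";
    static final String X = "http://example.com/x";

    // The three triples from the modified dummy data, as {subject, predicate, object}.
    static final List<String[]> DATA = List.of(
        new String[] {"http://s", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"},
        new String[] {"http://s", DC_TITLE, "Some Title"},
        new String[] {"http://z", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"});

    /**
     * Counts solutions of { ?s a ?t . ?s foaf:name|dc:title ?name },
     * optionally applying FILTER (?s = x:x || ?s = x:x).
     */
    static long count(boolean withFilter) {
        long c = 0;
        for (String[] type : DATA) {
            if (!type[1].equals(RDF_TYPE)) {
                continue;
            }
            for (String[] name : DATA) {
                boolean pathMatch = name[1].equals(FOAF_NAME) || name[1].equals(DC_TITLE);
                if (!pathMatch || !name[0].equals(type[0])) {
                    continue; // join on ?s
                }
                boolean pass = type[0].equals(X) || type[0].equals(X);
                if (!withFilter || pass) {
                    c++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println("without FILTER: c=" + count(false));
        System.out.println("with FILTER:    c=" + count(true));
    }
}
```

        On this data the sketch gives c=1 without the FILTER and c=0 with it, so any non-zero count for the filtered query indicates the problem.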
        
        michaelschmidt michaelschmidt added a comment -

        Thanks for the comments, the problem is reproducible now. The issue is related to the ASTAttachJoinFiltersOptimizer, which attaches FILTERs to UNION nodes (the property path union node being a special case of a union node here), but the UNION node translation simply disregards these filters. So this is a more general problem, which did not pop up before because filters are typically pushed inside UNIONs (which is not possible for the property path, though).

        The fix is quite straightforward: in com.bigdata.rdf.sparql.ast.optimizers.ASTAttachJoinFiltersOptimizer.attachJoinFilters2(IEvaluationContext, StaticAnalysis, JoinGroupNode), just add the following code snippet:

                    if (aJoinNode instanceof UnionNode) {

                        /*
                         * Note: the translation for union nodes currently does not
                         * support inlined filters. This is an edge case anyway, since
                         * FILTERs are typically pushed inside UNION nodes (wherever
                         * possible).
                         *
                         * See https://jira.blazegraph.com/browse/BLZG-1494.
                         */
                        continue;
                    }
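
        To illustrate why skipping union nodes restores the correct result, here is a toy model of the mechanism described above. It is only a sketch under simplifying assumptions: Node, attach and count are hypothetical names, not Blazegraph classes or methods, bindings are reduced to plain values of ?s, and a join is modelled as a simple intersection on ?s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class UnionFilterSketch {

    /** Hypothetical join-group child that produces bindings for ?s. */
    static final class Node {
        final boolean isUnion;
        final List<String> subjects;
        final List<Predicate<String>> attached = new ArrayList<>();

        Node(boolean isUnion, List<String> subjects) {
            this.isUnion = isUnion;
            this.subjects = subjects;
        }

        /** Union nodes ignore attached filters, mimicking the reported behaviour. */
        List<String> evaluate() {
            List<String> out = new ArrayList<>();
            for (String s : subjects) {
                boolean pass = isUnion || attached.stream().allMatch(f -> f.test(s));
                if (pass) {
                    out.add(s);
                }
            }
            return out;
        }
    }

    /** Attaches the filter to the first eligible child; returns false if no child took it. */
    static boolean attach(List<Node> children, Predicate<String> filter, boolean skipUnions) {
        for (Node child : children) {
            if (skipUnions && child.isUnion) {
                continue; // same idea as the guard in the fix
            }
            child.attached.add(filter);
            return true;
        }
        return false;
    }

    /** Evaluates a two-child group shaped like the failing plan: union first, then the type pattern. */
    static int count(boolean skipUnions) {
        Node union = new Node(true, List.of("http://s"));
        Node typePattern = new Node(false, List.of("http://s", "http://z"));
        List<Node> children = List.of(union, typePattern);
        Predicate<String> filter = s -> s.equals("http://example.com/x");
        boolean attached = attach(children, filter, skipUnions);

        List<String> result = new ArrayList<>(children.get(0).evaluate());
        for (Node child : children.subList(1, children.size())) {
            result.retainAll(child.evaluate()); // join on ?s
        }
        if (!attached) {
            result.removeIf(filter.negate()); // filter stays at group level
        }
        return result.size();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + count(false));
        System.out.println("with guard:    " + count(true));
    }
}
```

        In this model the filter is lost when it lands on the union node (count 1), while with the guard it is attached to the statement pattern instead and the count drops to 0.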
        
        michaelschmidt michaelschmidt added a comment -

        See pull request at https://github.com/SYSTAP/bigdata/pull/160

        beebs Brad Bebee added a comment -

        https://github.com/SYSTAP/bigdata/commit/5d54bf8e277fb40876925c6a8fd4a8efc85f657c

        beebs Brad Bebee added a comment -

        Maven master merge is https://github.com/SYSTAP/bigdata/pull/161 .

          People

          • Assignee:
            michaelschmidt michaelschmidt
            Reporter:
            mpol Michal Politowski