Blazegraph (by SYSTAP) / BLZG-2043

xsd:integer IV not properly resolved when inlining disabled

    Details

    • Type: Bug
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: BLAZEGRAPH_2_1_2
    • Fix Version/s: BLAZEGRAPH_2_2_0
    • Component/s: None
    • Labels:

      Description

      Consider a journal with all inlining disabled and the following query (I didn't minimize the query; the problem might be reproducible with simpler queries as well):

      PREFIX :    <http://example/>
      
      SELECT ?a ?y ?d ?z
      {
          ?a :p ?c OPTIONAL { ?a :r ?d }.  
          ?a ?p 1 { ?p a ?y } UNION { ?a ?z ?p } 
      }
      

      The problem is that the integer 1 is not properly resolved: after the ASTDeferredIVResolutionInitializer has been executed in com.bigdata.rdf.sparql.ast.eval.ASTEvalHelper.optimizeQuery(), the IV is represented as an XSDIntegerIV(1) without a resolved term ID. Tracing it down, I ended up at line 1208 of ASTDeferredIVResolution:

      iv = ASTDeferredIVResolutionInitializer.decode(label, dte.name());
      

      This seems to be where the unresolved IV comes from. We need a test case that reproduces the scenario, as well as a fix. We also need to make sure that other built-in datatypes are handled properly.

      Here are the namespace properties that could be used to reproduce the behavior:

      com.bigdata.namespace.asdasdasd.spo.com.bigdata.btree.BTree.branchingFactor=1024
      com.bigdata.rdf.store.AbstractTripleStore.inlineBNodes=false
      com.bigdata.rdf.store.AbstractTripleStore.inlineDateTimes=false
      com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.NoVocabulary
      com.bigdata.rdf.store.AbstractTripleStore.textIndex=false
      com.bigdata.namespace.asdasdasd.lex.com.bigdata.btree.BTree.branchingFactor=400
      com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
      com.bigdata.rdf.sail.isolatableIndices=false
      com.bigdata.rdf.store.AbstractTripleStore.justify=false
      com.bigdata.rdf.sail.truthMaintenance=false
      com.bigdata.rdf.store.AbstractTripleStore.blobsThreshold=2147483647
      com.bigdata.rdf.sail.namespace=asdasdasd
      com.bigdata.rdf.store.AbstractTripleStore.quads=false
      com.bigdata.rdf.store.AbstractTripleStore.inlineXSDDatatypeLiterals=false
      com.bigdata.rdf.store.AbstractTripleStore.geoSpatial=false
      com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
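      For a reproduction test, the properties above can be parsed and sanity-checked with plain java.util.Properties before the namespace is created. This is a minimal sketch: the class and method names are hypothetical, no Blazegraph API is used, and treating the three inline* flags from the ticket as "all inlining" is an assumption (the store may expose further inlining options).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper: parses the namespace properties from the ticket and
// checks the inlining flags. Uses only java.util.Properties.
final class NamespaceConfigSketch {

    static final String STORE = "com.bigdata.rdf.store.AbstractTripleStore.";

    // The 16 namespace properties copied verbatim from the ticket.
    static final String TICKET_PROPERTIES = String.join("\n",
        "com.bigdata.namespace.asdasdasd.spo.com.bigdata.btree.BTree.branchingFactor=1024",
        "com.bigdata.rdf.store.AbstractTripleStore.inlineBNodes=false",
        "com.bigdata.rdf.store.AbstractTripleStore.inlineDateTimes=false",
        "com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=com.bigdata.rdf.vocab.NoVocabulary",
        "com.bigdata.rdf.store.AbstractTripleStore.textIndex=false",
        "com.bigdata.namespace.asdasdasd.lex.com.bigdata.btree.BTree.branchingFactor=400",
        "com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms",
        "com.bigdata.rdf.sail.isolatableIndices=false",
        "com.bigdata.rdf.store.AbstractTripleStore.justify=false",
        "com.bigdata.rdf.sail.truthMaintenance=false",
        "com.bigdata.rdf.store.AbstractTripleStore.blobsThreshold=2147483647",
        "com.bigdata.rdf.sail.namespace=asdasdasd",
        "com.bigdata.rdf.store.AbstractTripleStore.quads=false",
        "com.bigdata.rdf.store.AbstractTripleStore.inlineXSDDatatypeLiterals=false",
        "com.bigdata.rdf.store.AbstractTripleStore.geoSpatial=false",
        "com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false");

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory StringReader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Assumption: "all inlining disabled" is approximated by the three inline*
    // flags that appear in the ticket; an absent flag does not count as disabled.
    static boolean inliningDisabled(Properties props) {
        String[] flags = {"inlineBNodes", "inlineDateTimes", "inlineXSDDatatypeLiterals"};
        for (String flag : flags) {
            if (!"false".equalsIgnoreCase(props.getProperty(STORE + flag))) {
                return false;
            }
        }
        return true;
    }
}
```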
      

      Note 1: if we substitute a string for the integer (i.e., "1" instead of 1), it works as expected.
      Note 2: in my scenario, the term (i.e., the integer 1) is not present in the dictionary, so it should be resolved to a mock IV, but the same problem seems to show up if the term is in the dictionary. Both cases should be covered by the test.


          Activity

          igorkim added a comment (edited)

          [~], I'm trying to debug this test case. There is no test data attached to the task, so I'm using a small generic dataset. For the N3 file:

          @prefix :    <http://example/> .
          :a :p 1 .
          :p a :Y .
          

          The SPARQL query in the description successfully returns the following binding set:
          a == <http://example/a>
          y == <http://example/Y>
          d, z: not bound (optional).
          Without the triple <<:a :p 1 .>> in the triplestore, the query does not return any results, as the statement pattern <<?a ?p 1>> does not match.
          Could you please provide more details on the expected behavior, and on how parsing the literal 1 as XSDIntegerIV(1) affects the actual results on your data?

          igorkim added a comment

          I've added a test case for this ticket:
          https://github.com/SYSTAP/bigdata/compare/BLZG-2043?expand=1

          With inlining disabled, query evaluation successfully resolves 1 as ConstantNode(TermId(4U)) for an existing term.

          For a term that does not exist in a namespace configured not to inline literals, the conversion to a mock TermId(0U) has not occurred since version 1.5.2
          (see https://jira.blazegraph.com/browse/BLZG-1176, SPARQL Parsers should not be db mode aware).

          Note that the actual resolution of a temporary IV to a TermId occurs while evaluating the query (in ASTDeferredIVResolution), not during prepare, because since BLZG-1176 (version 1.5.2) ASTDeferredIVResolutionInitializer has no access to the triplestore or to the namespace configuration, including the inlining options.

          So a potential fix, restoring the earlier behavior of converting 1 to a mock TermId(0U) instead of an XSDIntegerIV during query parsing/evaluation, would be for LexiconRelation, while resolving the provided IVs against the triplestore, to check whether the IV in question is a temporary XSDIntegerIV that is missing from the triplestore, and to recreate it as a mock TermId(0U) if the namespace is configured not to inline such IVs.
          michaelschmidt, thompsonbry, is this the correct approach to solve this issue?
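          The proposed check can be sketched in Java as follows. SketchIV and SketchLexiconResolver are hypothetical stand-ins, not Blazegraph's IV, TermId, or LexiconRelation classes; the dictionary is modelled as a plain map from a literal key to a term id, and 0 models the mock TermId(0U). Following the description's request to cover other built-in datatypes, the check applies to every datatype the namespace does not inline, not only xsd:integer.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for an IV; this is NOT Blazegraph's IV hierarchy.
final class SketchIV {
    enum Kind { INLINE, TERM_ID }

    static final long MOCK_TERM_ID = 0L; // models the mock TermId(0U)

    final Kind kind;
    final String label;    // lexical form, e.g. "1"
    final String datatype; // e.g. "xsd:integer"
    final long termId;     // only meaningful when kind == TERM_ID

    private SketchIV(Kind kind, String label, String datatype, long termId) {
        this.kind = kind;
        this.label = label;
        this.datatype = datatype;
        this.termId = termId;
    }

    // A temporary inline IV, like the XSDIntegerIV(1) produced at parse time.
    static SketchIV inline(String label, String datatype) {
        return new SketchIV(Kind.INLINE, label, datatype, -1L);
    }

    static SketchIV termId(String label, String datatype, long id) {
        return new SketchIV(Kind.TERM_ID, label, datatype, id);
    }

    boolean isMock() {
        return kind == Kind.TERM_ID && termId == MOCK_TERM_ID;
    }

    // Dictionary key for the literal, e.g. "1"^^xsd:integer
    String key() {
        return "\"" + label + "\"^^" + datatype;
    }
}

// Hypothetical resolver modelling the proposed check during lexicon resolution.
final class SketchLexiconResolver {
    private final Map<String, Long> dictionary; // literal key -> term id
    private final Set<String> inlinedDatatypes; // datatypes this namespace inlines

    SketchLexiconResolver(Map<String, Long> dictionary, Set<String> inlinedDatatypes) {
        this.dictionary = dictionary;
        this.inlinedDatatypes = inlinedDatatypes;
    }

    SketchIV resolve(SketchIV iv) {
        // TermIds are already resolved; an inline IV is kept if the namespace inlines its datatype.
        if (iv.kind != SketchIV.Kind.INLINE || inlinedDatatypes.contains(iv.datatype)) {
            return iv;
        }
        Long id = dictionary.get(iv.key());
        if (id != null) {
            // Existing term: per the ticket, this case already resolves correctly.
            return SketchIV.termId(iv.label, iv.datatype, id);
        }
        // Missing term in a non-inlining namespace: recreate it as a mock TermId(0U).
        return SketchIV.termId(iv.label, iv.datatype, SketchIV.MOCK_TERM_ID);
    }
}
```

A test for this ticket would exercise both a literal that exists in the dictionary and one that does not, for xsd:integer and at least one other datatype.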

          michaelschmidt added a comment (edited)

          Igor, in response to your questions:

          1.) The problem shows up in a different context: for mapgraph, we have IV inlining disabled and rely on IVs being properly resolved (either to their actual TermId or a mocked one, if not present in the dictionary).

          2.) Yes, I can confirm that I was wrong in my claim that the TermId is not properly resolved when the term is present in the data (that's what you are suggesting, right?). So it looks like we're only talking about the case where literals are not present in the data.

          3.) Your proposal (using a temporary integer IV and recreating it as a TermId(0U)) sounds reasonable to me. We should make sure this behavior is in place for all datatypes, not only xsd:integer. thompsonbry, please confirm.

          As a side note to bryanthompson: if, as it currently looks, only terms that are not present in the dictionary fail to be resolved, we could also map to 0 (mock ID) whenever no TermId is given, rather than throwing an exception. Or would you prefer to keep the exception, to make sure things are resolved properly?
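          The choice raised in this side note can be sketched as an explicit policy. MissingTermPolicy and TermIdLookupSketch are hypothetical names, not Blazegraph classes; the dictionary is modelled as a plain Map from a literal key to a term id, and 0 stands for the mock id.

```java
import java.util.Map;

// Hypothetical policy for a literal that has no TermId in the dictionary.
enum MissingTermPolicy {
    MOCK, // map to the mock id 0, as proposed in the side note
    FAIL  // keep throwing, so that unresolved terms are surfaced
}

final class TermIdLookupSketch {
    static final long MOCK_TERM_ID = 0L;

    static long termIdFor(String key, Map<String, Long> dictionary, MissingTermPolicy policy) {
        Long id = dictionary.get(key);
        if (id != null) {
            return id; // term present: return its real id under either policy
        }
        if (policy == MissingTermPolicy.MOCK) {
            return MOCK_TERM_ID;
        }
        throw new IllegalStateException("No TermId for " + key);
    }
}
```

MOCK keeps query evaluation working for literals that are absent from the data (the mapgraph case), at the cost of silently hiding genuine resolution bugs; FAIL does the opposite.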

          igorkim added a comment

          Also, the solution might be as simple as using a MockIV at line 1208 of ASTDeferredIVResolution, which Michael mentioned. I'm running the full test suite to check whether this has any side effects.


            People

            • Assignee: igorkim
            • Reporter: michaelschmidt
            • Votes: 0
            • Watchers: 4
