  1. Blazegraph (by SYSTAP)
  2. BLZG-1232 RDR metaticket
  3. BLZG-1233

StatementBuffer problems when loading in SIDs mode



    • Type: Sub-task
    • Status: Accepted
    • Resolution: Unresolved
    • Affects Version/s: BLAZEGRAPH_RELEASE_1_5_1
    • Fix Version/s: None
    • Component/s: Bigdata RDF Database
    • Labels:


      There are actually two problems:

      1.) The field values is initialized as

      > values = new BigdataValue[capacity * arity + 5];

      However, in SIDs mode the number of values per statement can be higher than the arity, so in SIDs mode the multiplier should be (arity + 1) to be on the safe side.
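A minimal sketch of the sizing rule suggested above, for discussion only. The class ValueBufferSizing and the method valuesLength are hypothetical names and not part of StatementBuffer. It assumes the extra "+ 5" slots are kept as they are and that only the per-statement multiplier changes in SIDs mode.

```java
// Hypothetical helper, not part of the Blazegraph code base.
final class ValueBufferSizing {

    // Extra slots kept from the original sizing expression (the "+ 5").
    private static final int EXTRA_SLOTS = 5;

    // Returns the length to allocate for the values array.
    // Assumption: in SIDs mode one additional value per statement is enough.
    static int valuesLength(int capacity, int arity, boolean sidsMode) {
        int perStatement = sidsMode ? arity + 1 : arity;
        // The exact-arithmetic calls throw ArithmeticException on int overflow
        // instead of silently wrapping to a too-small array length.
        return Math.addExact(Math.multiplyExact(capacity, perStatement), EXTRA_SLOTS);
    }
}
```

With a capacity of 100 and arity 3 this gives 305 slots in triples mode and 405 in SIDs mode.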

      2.) When loading RDF Reification data, a blank node is constructed for the "reified" subject position. The handleStatement() method looks for rdf:subject, rdf:predicate, and rdf:object triples and stores them in the blank node as they show up. Now look at the following example:

      :SAP :bought :sybase .
      _:s1 rdf:object :sybase .
      _:s3 dc:created "2013-04-05T12:00:00Z"^^xsd:dateTime .
      :s :just :noise1 .
      _:s2 rdf:object :Youtube .
      _:s1 rdf:type rdf:Statement .
      _:s3 rdf:predicate :bought .
      _:s1 dc:source news:us-sybase .
      :s :just :noise2 .
      _:s2 rdf:predicate :bought .
      :Apple :bought :Siri .
      _:s3 rdf:subject :Apple .
      _:s2 dc:source news:us-sybase .
      :s :just :noise3 .
      :Google :bought :Youtube .
      _:s2 rdf:subject :Google .
      _:s1 rdf:subject :SAP .
      _:s3 dc:source news:us-sybase .
      _:s1 rdf:predicate :bought .
      _:s3 rdf:type rdf:Statement .
      :s :just :noise4 .
      _:s1 dc:created "2011-04-05T12:00:00Z"^^xsd:dateTime .
      _:s2 rdf:type rdf:Statement .
      _:s3 rdf:object :Siri .
      _:s2 dc:created "2013-04-05T12:00:00Z"^^xsd:dateTime .

      Assume we flush the statement buffer at the position marked <SNIP>. At this point, the blank nodes _:s1 and _:s3 are not yet complete, in the sense that the reified statements behind them have not yet been fully read. Thus, when the statements are written to the database, the sid field of these bnodes is not yet set, and the data written for them is incorrect.

      Not sure how best to fix this. It's tricky, because we never know when we will encounter the rdf:subject/predicate/object triples. What we'd need to do is inspect the buffer and flush only those triples that are complete. However, this might result in a buffer overflow when too many incomplete triples have accumulated. Any ideas?
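One possible shape of the "flush only what is complete" idea, as a rough sketch and not a proposed patch. All names here (DeferredFlushSketch, Triple, Reified, maxDeferred) are hypothetical and do not exist in Blazegraph. The sketch works on plain strings, assumes that every blank node in subject position stands for a reified statement, and ignores blank nodes in object position. An inlined <<s p o>> subject stands in for setting the sid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these types exist in Blazegraph.
final class DeferredFlushSketch {

    // Minimal triple of strings; blank nodes are recognized by the "_:" prefix.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Parts of the reified statement collected so far for one blank node.
    static final class Reified {
        String s, p, o;
        boolean complete() { return s != null && p != null && o != null; }
    }

    private final Map<String, Reified> reified = new HashMap<>();
    private final List<Triple> buffer = new ArrayList<>();
    private final int maxDeferred; // hypothetical cap on triples held back

    DeferredFlushSketch(int maxDeferred) { this.maxDeferred = maxDeferred; }

    private Reified node(String bnode) {
        return reified.computeIfAbsent(bnode, k -> new Reified());
    }

    // Reification triples update the blank node; everything else is buffered.
    void add(Triple t) {
        if (t.p.equals("rdf:subject")) node(t.s).s = t.o;
        else if (t.p.equals("rdf:predicate")) node(t.s).p = t.o;
        else if (t.p.equals("rdf:object")) node(t.s).o = t.o;
        else if (t.p.equals("rdf:type") && t.o.equals("rdf:Statement")) node(t.s);
        else buffer.add(t);
    }

    // Assumption: every blank-node subject stands for a reified statement.
    private boolean pending(String term) {
        if (!term.startsWith("_:")) return false;
        Reified r = reified.get(term);
        return r == null || !r.complete();
    }

    // Returns the triples that are safe to write; incomplete ones stay buffered.
    List<Triple> flush() {
        List<Triple> ready = new ArrayList<>();
        List<Triple> deferred = new ArrayList<>();
        for (Triple t : buffer) {
            if (pending(t.s)) {
                deferred.add(t);
            } else if (t.s.startsWith("_:")) {
                Reified r = reified.get(t.s);
                // Stand-in for setting the sid: inline the reified statement.
                ready.add(new Triple("<<" + r.s + " " + r.p + " " + r.o + ">>", t.p, t.o));
            } else {
                ready.add(t);
            }
        }
        if (deferred.size() > maxDeferred) {
            // The overflow case from the description: too many incomplete triples.
            throw new IllegalStateException(
                "too many incomplete reified statements: " + deferred.size());
        }
        buffer.clear();
        buffer.addAll(deferred);
        return ready;
    }
}
```

The sketch does not answer the overflow question; it only makes the failure explicit by throwing once more than maxDeferred triples are held back. It also leaves open what to do at end of input with blank nodes that never become complete.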




            michaelschmidt michaelschmidt