  1. Blazegraph (by SYSTAP)
  2. BLZG-1232 RDR metaticket
  3. BLZG-1233

StatementBuffer problems when loading in SIDs mode

    Details

    • Type: Sub-task
    • Status: Accepted
    • Resolution: Unresolved
    • Affects Version/s: BLAZEGRAPH_RELEASE_1_5_1
    • Fix Version/s: None
    • Component/s: Bigdata RDF Database
    • Labels:
      None

      Description

      There are actually two problems:

      1.) The field values is initialized as

      > values = new BigdataValue[capacity * arity + 5];

      However, in SIDs mode the number of values per statement can exceed the arity, so the array can be too small. To be on the safe side, the size should be computed with (arity + 1) in place of arity in SIDs mode.
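
      The sizing change can be illustrated with a small sketch. It assumes the allocation has the form capacity * arity + 5; the class ValuesCapacitySketch and the method valuesLength are hypothetical names used only for illustration and are not part of Blazegraph.

```java
// Hypothetical helper, not part of Blazegraph: illustrates the sizing arithmetic only.
final class ValuesCapacitySketch {

    /** Slack corresponding to the "+ 5" in the allocation expression. */
    static final int SLACK = 5;

    /**
     * Number of value slots to reserve for a buffer that holds up to
     * {@code capacity} statements.
     */
    static int valuesLength(int capacity, int arity, boolean sidsMode) {
        // In SIDs mode, reserve one extra slot per statement, as proposed in the ticket.
        int slotsPerStatement = sidsMode ? arity + 1 : arity;
        return capacity * slotsPerStatement + SLACK;
    }

    public static void main(String[] args) {
        System.out.println(valuesLength(1000, 3, false)); // prints 3005
        System.out.println(valuesLength(1000, 3, true));  // prints 4005
    }
}
```

      With these assumptions, a triples buffer of capacity 1000 reserves 3005 slots outside SIDs mode and 4005 slots in SIDs mode.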

      2.) When loading RDF Reification data, a blank node is constructed for the "reified" subject position. The handleStatement() method looks for rdf:subject, rdf:predicate, and rdf:object triples and stores them in the blank node as they show up. Now consider the following example:

      :SAP :bought :sybase .
      _:s1 rdf:object :sybase .
      _:s3 dc:created "2013-04-05T12:00:00Z"^^xsd:dateTime .
      :s :just :noise1 .
      _:s2 rdf:object :Youtube .
      _:s1 rdf:type rdf:Statement .
      _:s3 rdf:predicate :bought .
      _:s1 dc:source news:us-sybase .
      :s :just :noise2 .
      _:s2 rdf:predicate :bought .
      :Apple :bought :Siri .
      _:s3 rdf:subject :Apple .
      _:s2 dc:source news:us-sybase .
      :s :just :noise3 .
      :Google :bought :Youtube .
      _:s2 rdf:subject :Google .
      _:s1 rdf:subject :SAP .
      _:s3 dc:source news:us-sybase .
      <SNIP>
      _:s1 rdf:predicate :bought .
      _:s3 rdf:type rdf:Statement .
      :s :just :noise4 .
      _:s1 dc:created "2011-04-05T12:00:00Z"^^xsd:dateTime .
      _:s2 rdf:type rdf:Statement .
      _:s3 rdf:object :Siri .
      _:s2 dc:created "2013-04-05T12:00:00Z"^^xsd:dateTime .

      Assume that we flush the statement buffer at the position marked <SNIP>. At this point, the blank nodes for _:s1 and _:s3 are not yet complete, in the sense that the reified statement behind them has not yet been fully read: _:s1 still lacks its rdf:predicate triple and _:s3 still lacks its rdf:object triple, whereas _:s2 is complete. Thus, when the statements are written into the database, the sid field of these bnodes is not yet set, and the loaded data is wrong.

      Not sure how to best fix this. It's tricky, because we cannot know in advance when the rdf:subject/predicate/object triples will show up. What we'd need to do is inspect the buffer and flush only those triples that are complete. However, this might result in a buffer overflow when too many incomplete triples have accumulated. Any ideas?
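
      One possible shape of the "flush only complete triples" idea is sketched below. This is not Blazegraph code and not a proposed patch: PartialFlushSketch, Triple, flushComplete() and the other names are hypothetical, triples are modelled as plain strings, and blank nodes are recognized by their "_:" prefix. As a simplification, every triple with a blank-node subject is held back until that blank node has received all three of rdf:subject, rdf:predicate and rdf:object, so ordinary (non-reification) blank nodes would stay pending until a final flush.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Blazegraph code.
final class PartialFlushSketch {

    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final List<Triple> buffer = new ArrayList<>();
    // Blank node label -> reification predicates seen so far for that node.
    private final Map<String, Set<String>> partsSeen = new HashMap<>();

    static boolean isReificationPart(String p) {
        return p.equals("rdf:subject") || p.equals("rdf:predicate") || p.equals("rdf:object");
    }

    void add(String s, String p, String o) {
        buffer.add(new Triple(s, p, o));
        if (s.startsWith("_:") && isReificationPart(p)) {
            partsSeen.computeIfAbsent(s, k -> new HashSet<>()).add(p);
        }
    }

    /** True once rdf:subject, rdf:predicate and rdf:object have all been seen. */
    boolean isComplete(String bnode) {
        Set<String> parts = partsSeen.get(bnode);
        return parts != null && parts.size() == 3;
    }

    /** Returns the triples that are safe to write; incomplete ones stay buffered. */
    List<Triple> flushComplete() {
        List<Triple> ready = new ArrayList<>();
        List<Triple> pending = new ArrayList<>();
        for (Triple t : buffer) {
            boolean blankSubject = t.s.startsWith("_:");
            if (!blankSubject || isComplete(t.s)) ready.add(t); else pending.add(t);
        }
        buffer.clear();
        buffer.addAll(pending);
        return ready;
    }

    /** Number of held-back triples; this is what can grow without bound. */
    int pendingSize() { return buffer.size(); }

    /** The ticket's example data up to the <SNIP> marker. */
    static PartialFlushSketch exampleUpToSnip() {
        String[][] data = {
            {":SAP", ":bought", ":sybase"},
            {"_:s1", "rdf:object", ":sybase"},
            {"_:s3", "dc:created", "\"2013-04-05T12:00:00Z\"^^xsd:dateTime"},
            {":s", ":just", ":noise1"},
            {"_:s2", "rdf:object", ":Youtube"},
            {"_:s1", "rdf:type", "rdf:Statement"},
            {"_:s3", "rdf:predicate", ":bought"},
            {"_:s1", "dc:source", "news:us-sybase"},
            {":s", ":just", ":noise2"},
            {"_:s2", "rdf:predicate", ":bought"},
            {":Apple", ":bought", ":Siri"},
            {"_:s3", "rdf:subject", ":Apple"},
            {"_:s2", "dc:source", "news:us-sybase"},
            {":s", ":just", ":noise3"},
            {":Google", ":bought", ":Youtube"},
            {"_:s2", "rdf:subject", ":Google"},
            {"_:s1", "rdf:subject", ":SAP"},
            {"_:s3", "dc:source", "news:us-sybase"},
        };
        PartialFlushSketch b = new PartialFlushSketch();
        for (String[] t : data) b.add(t[0], t[1], t[2]);
        return b;
    }

    public static void main(String[] args) {
        PartialFlushSketch b = exampleUpToSnip();
        System.out.println("flushed: " + b.flushComplete().size()); // flushed: 10
        System.out.println("pending: " + b.pendingSize());          // pending: 8
    }
}
```

      On the example data, a flush at <SNIP> writes the six ground triples and the four _:s2 triples and holds back the eight triples about _:s1 and _:s3. The sketch does not address the overflow concern: pendingSize() has no upper bound, so a real fix would still need a policy for the case where too many incomplete triples accumulate.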

            People

            Assignee: michaelschmidt
            Reporter: michaelschmidt