
BLZG-663: SPARQL INSERT not working in same request after INSERT DATA

    Details

      Description

      INSERT is not working when attempted immediately after INSERT DATA in the same request. However, it works if it is done in a separate request. Here are the details.

      When I do Update 1 (below) and then do Update 2, I get no results from
      my query. But if I then do Update 2 again, I do get the expected
      results. However, Update 2 should have been idempotent.

      It is as if the "INSERT DATA" statement is not committed until after
      Update 2 has finished, so the "INSERT" statement that follows it does
      not see the data that "INSERT DATA" put in.

      Here's Update 1:

      # Update 1:
      DROP SILENT GRAPH <http://example/in> ;
      DROP SILENT GRAPH <http://example/out> ;

      Here's Update 2:

      # Update 2:
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      INSERT DATA {
        GRAPH <http://example/in> {
          <http://example/president25> foaf:givenName "William" .
        }
      } ;

      INSERT {
        GRAPH <http://example/out> {
          ?s ?p ?v .
        }
      }
      WHERE {
        GRAPH <http://example/in> {
          ?s ?p ?v .
        }
      } ;

      Here's the query:

      # Query:
      SELECT * {
        GRAPH <http://example/out> {
          ?s ?p ?v .
        }
      }
      LIMIT 10

      I've attached them also, along with my RWStore.properties.

        Activity

        dbooth-boston added a comment -

        I don't know how to determine what version of bigdata I am running. I downloaded it on 2012-05-14.

        bryanthompson added a comment -

        I suspect that the WHERE clause inside of the 2nd update operation in the request may be reading from the ground state of the transaction (before the writes) and thus not observing the write set from the first insert.

        One workaround while we look into this problem is to issue the update operations separately rather than combining them into a sequence within a single update request.
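
        For illustration, a minimal sketch of that workaround: submit each update operation as its own HTTP request against the SPARQL endpoint rather than as one semicolon-separated sequence. The class name, the endpoint URL/port, and the use of the standard SPARQL 1.1 Protocol "update" form parameter are assumptions here, not taken from this ticket; adjust them to your deployment.

        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.net.URLEncoder;

        public class SeparateUpdateRequests {

            // Hypothetical endpoint; substitute your NanoSparqlServer's SPARQL endpoint.
            static final String ENDPOINT = "http://localhost:9999/bigdata/sparql";

            // POST a single SPARQL UPDATE string as one request (SPARQL 1.1 Protocol).
            static void executeUpdate(final String update) throws Exception {
                final HttpURLConnection con =
                        (HttpURLConnection) new URL(ENDPOINT).openConnection();
                con.setRequestMethod("POST");
                con.setDoOutput(true);
                con.setRequestProperty("Content-Type",
                        "application/x-www-form-urlencoded; charset=UTF-8");
                final String body = "update=" + URLEncoder.encode(update, "UTF-8");
                try (OutputStream out = con.getOutputStream()) {
                    out.write(body.getBytes("UTF-8"));
                }
                if (con.getResponseCode() != 200) {
                    throw new RuntimeException("update failed: " + con.getResponseCode());
                }
            }

            public static void main(final String[] args) throws Exception {
                // Request 1: just the INSERT DATA.
                executeUpdate(
                    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n" +
                    "INSERT DATA { GRAPH <http://example/in> {\n" +
                    "  <http://example/president25> foaf:givenName \"William\" . } }");
                // Request 2: the INSERT ... WHERE, which now sees the committed data.
                executeUpdate(
                    "INSERT { GRAPH <http://example/out> { ?s ?p ?v } }\n" +
                    "WHERE  { GRAPH <http://example/in>  { ?s ?p ?v } }");
            }
        }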

        bryanthompson added a comment -

        I put together a unit test along the lines described above and was able to replicate the issue. There are two different underlying problems:

        1. If the assertion/retraction buffers are dirty, then we need to flush them before we run the next operation in the sequence. This is normally handled by the BigdataSail. However, the UPDATE code path includes some logic which goes directly to ASTEvalHelper to evaluate the WHERE clause of "DeleteInsert" operations and was therefore bypassing the flush() of the assertion and retraction buffers. This can be fixed by explicitly flushing those buffers after each update operation. It could also be fixed by going in through the BigdataSailConnection's method to evaluate a query.

        2. If an RDF Value was not defined when the UPDATE operation sequence was parsed, then no attempt was made to see if it had become defined by subsequent update operations in the same request. This was resulting in a termId of 0L for the "in" graph in the WHERE clause of the 2nd update operation. Since that marks an unknown term, the range count was zero and the access path was empty.

        Due to the second problem, simply issuing a commit after each update operation in a sequence is not sufficient (or desirable). What we need to do is re-resolve any term identifiers which are still 0L before we run any operation after the first in a sequence of updates. The potential for this problem had been noted as a TODO in the Bigdata2ASTSPARQLParser.

        The flush() is easy enough to do. However, I need to write a bit of code to do the batch resolution of the unknown terms in updates.
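
        As a toy illustration of the second problem (not Bigdata code; the class and names below are made up for this sketch), the idea is that a value which was absent from the lexicon at parse time keeps a NULL (0L) term identifier, so it must be looked up again before the later operations in the sequence are evaluated:

        import java.util.HashMap;
        import java.util.Map;

        public class TermResolutionSketch {

            static final long NULL_TERM_ID = 0L; // marks an unknown term

            public static void main(String[] args) {
                // Toy "lexicon": RDF value -> term identifier.
                final Map<String, Long> lexicon = new HashMap<String, Long>();

                // Parse time: <http://example/in> is not in the lexicon yet, so the
                // WHERE clause of the 2nd operation caches a NULL (0L) identifier
                // and its access path is empty.
                Long parsed = lexicon.get("http://example/in");
                long inGraphId = (parsed == null) ? NULL_TERM_ID : parsed;
                System.out.println("parse-time id  = " + inGraphId); // 0

                // Operation 1 (INSERT DATA) defines the term.
                lexicon.put("http://example/in", 42L);

                // Fix: before running the next operation, re-resolve anything still 0L.
                if (inGraphId == NULL_TERM_ID) {
                    final Long resolved = lexicon.get("http://example/in");
                    if (resolved != null) inGraphId = resolved;
                }
                System.out.println("re-resolved id = " + inGraphId); // 42
            }
        }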

        bryanthompson added a comment -

        Partial resolution. This fixes the problem when the updates are performed using the unisolated connection. There is still a problem when they are performed in a full read/write tx. I will look into that next.

        This commit includes a unit test which replicates the problem (against both the unisolated connection and a read/write tx), a new AST optimizer to handle resolution of unknown terms before evaluating an update, a test suite for that optimizer, and a patch to AST2BOpUpdate to correctly (a) flush the sail assertion and retraction buffers; and (b) perform the additional term resolution step.

        Committed Revision r6323.

        bryanthompson added a comment -

        Fix for the transaction isolation case. There were a few code paths in the class that handles SPARQL UPDATE which were using the unisolated view of the database to run the queries, rather than the view isolated by the transaction.
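
        A toy illustration of that mistake (again, not the actual Bigdata code; the data structures below are made up): a query evaluated against the unisolated/ground view misses writes that are only visible in the view isolated by the transaction.

        import java.util.HashSet;
        import java.util.Set;

        public class IsolationSketch {
            public static void main(String[] args) {
                // Ground (unisolated) state of the database.
                final Set<String> unisolatedView = new HashSet<String>();
                // View isolated by the read/write transaction.
                final Set<String> txView = new HashSet<String>(unisolatedView);

                // The transaction's INSERT DATA is visible only in the tx view.
                txView.add("<http://example/president25> foaf:givenName \"William\"");

                // Bug: the WHERE clause was run against the unisolated view -> empty.
                System.out.println("unisolated view: " + unisolatedView);
                // Fix: run it against the view isolated by the transaction.
                System.out.println("tx view:         " + txView);
            }
        }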

        At this point, this issue appears to be resolved.

        Committed revision r6324.

        bryanthompson added a comment -

        CI is clean.


          People

          • Assignee: bryanthompson
          • Reporter: dbooth-boston