Details

      Description

      Support a history mechanism using RDR.

      Capture change events and write them back into the graph using RDR:

      # set property s.p = "foo"
      # add: <:s> <:p> "foo" .
      # commit
      
      <:s> <:p> "foo" .
      << <:s> <:p> "foo" >> <:added> "t1"^^xsd:dateTime .
      
      # set property s.p = "bar"
      # remove: <:s> <:p> "foo" .
      # add: <:s> <:p> "bar" .
      # commit
      
      <:s> <:p> "foo" . (History)
      <:s> <:p> "bar" .
      << <:s> <:p> "foo" >> <:added> "t1"^^xsd:dateTime .
      << <:s> <:p> "foo" >> <:removed> "t2"^^xsd:dateTime .
      << <:s> <:p> "bar" >> <:added> "t2"^^xsd:dateTime .
      
      # read history of resource <:s>
      select ?s ?p ?o ?action ?time
      where {
        bind(<< ?s ?p ?o >> as ?sid) .
        hint:Prior hint:history true .
        ?sid ?action ?time .
        values ?s {
          <:s>
        }
      }
      

      1. Instead of removing statements from the indices, downgrade them from Explicit to a new statement type: StatementEnum.History.

      2. By default, exclude History statements from normal reads on the statement indices.

      3. Add a query hint that allows History statements through.

      4. Add a change log listener that writes change events back into the statement indices using RDR (sketched below).
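
      The listener in step 4 amounts to a mapping from a change event to an RDR statement. The sketch below illustrates that mapping only; the ChangeEvent record, Action enum, and toRdr helper are hypothetical stand-ins, not the actual Blazegraph change-log listener API.

      // Illustrative sketch only: ChangeEvent, Action, and toRdr are hypothetical
      // stand-ins, not the Blazegraph change-log listener API.
      import java.time.Instant;

      public class RdrHistorySketch {

          enum Action { ADDED, REMOVED }

          // A captured change event: the affected triple, what happened to it, and when.
          record ChangeEvent(String s, String p, String o, Action action, Instant time) {}

          // Render one change event as an RDR statement (Turtle-star style), ready to
          // be written back into the statement indices at commit time.
          static String toRdr(ChangeEvent e) {
              String predicate = (e.action() == Action.ADDED) ? "<:added>" : "<:removed>";
              return "<< <" + e.s() + "> <" + e.p() + "> " + e.o() + " >> "
                      + predicate + " \"" + e.time() + "\"^^xsd:dateTime .";
          }

          public static void main(String[] args) {
              ChangeEvent add = new ChangeEvent(":s", ":p", "\"foo\"", Action.ADDED, Instant.now());
              // Prints something like: << <:s> <:p> "foo" >> <:added> "..."^^xsd:dateTime .
              System.out.println(toRdr(add));
          }
      }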


          Activity

          beebs Brad Bebee added a comment -

          We have two different history features right now (old and new). Each was developed independently but both work in roughly the same way - Blaze provides the ability to listen to a connection and receive events for edits and commits. We use this listener mechanism to capture change events and log them back into the database. The old implementation used a custom BTree index to write the change events, but this index was never exposed back up through the SPARQL layer.

          The new implementation uses RDF* (RDR) to capture the change events directly back into the three statement indices:

          changeEvent(Statement s, Time t, Action a) -> ( << <s> >> <history:a> "t"^^xsd:dateTime . )

          Doing it this way allows you to access the history via a normal SPARQL query using RDR constructs:

          1. give me the entire history for a specified subject
            select * { << <subject> ?p ?o >> ?action ?time . filter(?action in (<history:added>, <history:removed>)) . }
          2. give me all the change events within a time or date range
            select * { << ?s ?p ?o >> ?action ?time . filter(?action in (<history:added>, <history:removed>)) . filter(?time >= "2015-04-01"^^xsd:dateTime && ?time < "2015-04-02"^^xsd:dateTime) . }

          You can see how this feature could be used to log provenance as well. If a connection belongs to a certain user, all change events resulting from that connection could be correlated to the user.
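
          For completeness, the history queries above can also be issued through the Sesame (openrdf) repository API that Blazegraph exposes. A minimal sketch, assuming an already-open RepositoryConnection against a KB in RDR mode and the <history:added>/<history:removed> vocabulary used above:

          // Sketch: running the per-subject history query through the openrdf API.
          import org.openrdf.query.BindingSet;
          import org.openrdf.query.QueryLanguage;
          import org.openrdf.query.TupleQuery;
          import org.openrdf.query.TupleQueryResult;
          import org.openrdf.repository.RepositoryConnection;

          public class HistoryQueryExample {
              public static void printHistory(RepositoryConnection con, String subject) throws Exception {
                  // History of one subject: every add/remove event with its timestamp.
                  String query =
                      "select ?p ?o ?action ?time { "
                      + "  << <" + subject + "> ?p ?o >> ?action ?time . "
                      + "  filter(?action in (<history:added>, <history:removed>)) "
                      + "}";
                  TupleQuery tq = con.prepareTupleQuery(QueryLanguage.SPARQL, query);
                  TupleQueryResult result = tq.evaluate();
                  try {
                      while (result.hasNext()) {
                          BindingSet bs = result.next();
                          System.out.println(bs.getValue("action") + " " + bs.getValue("time")
                                  + " : " + bs.getValue("p") + " " + bs.getValue("o"));
                      }
                  } finally {
                      result.close();
                  }
              }
          }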

          beebs Brad Bebee added a comment -

          To enable, the KB must be created in RDR mode and the history class must be set.

          For Java:

           props.setProperty(AbstractTripleStore.Options.RDR_HISTORY_CLASS, com.bigdata.rdf.sail.RDRHistory.class.getName());
          

          For a properties file:

          com.bigdata.rdf.store.AbstractTripleStore.rdrHistoryClass=com.bigdata.rdf.sail.RDRHistory
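
          Putting the two required settings together programmatically might look like the sketch below. The rdrHistoryClass property matches the lines above; the statementIdentifiers property is an assumption about how RDR mode is switched on and should be checked against your Blazegraph version.

          // Sketch: Properties for a new KB with RDR mode plus the RDR history class.
          import java.util.Properties;

          public class RdrHistoryConfig {
              public static Properties historyEnabledProperties() {
                  Properties props = new Properties();
                  // Assumed switch for RDR (statement identifier) mode -- required for history.
                  props.setProperty(
                      "com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers", "true");
                  // Register the history class so change events are written back as RDR.
                  props.setProperty(
                      "com.bigdata.rdf.store.AbstractTripleStore.rdrHistoryClass",
                      "com.bigdata.rdf.sail.RDRHistory");
                  return props;
              }
          }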
          
          bryanthompson bryanthompson added a comment -

          OK. This branch has not yet been merged into master due to failures in the TestALPPinTrac773 class. See the pull request [1].

          [1] https://github.com/SYSTAP/bigdata/pull/62

          beebs Brad Bebee added a comment -

          CI on the old server is https://ci.bigdata.com/job/BLZG-1257/.

          Review com.bigdata.rdf.sparql.ast.optimizers.TestALPPinTrac773 failures to determine if there is an issue.

          bryanthompson bryanthompson added a comment -

          The failing test suite was originally written by @jeremycarroll in support of BLZG-858.

          bryanthompson bryanthompson added a comment -

          Optimizations were made in support of BLZG-1061 (incremental property path eviction). The history branch on GitHub has other optimizations made in support of BLZG-1257. When master is merged into the history branch, we get test failures in TestALPPinTrac773.


            People

            • Assignee: michaelschmidt
            • Reporter: mikepersonick
            • Votes: 0
            • Watchers: 4
