I have implemented a history service. It can be enabled through AbstractTripleStore.Options.HISTORY_SERVICE. There is also an option to prune the head of the history index maintained by that service. The history index records the revision time, the statement (s,p,o[,c]), the associated ChangeAction (INSERTED, REMOVED, or UPDATED), and the metadata bits that are associated with an ISPO in the indices (statement type and some flags).
I have not yet integrated this with SPARQL through the SERVICE keyword. I am not yet sure whether this facility would be used from SPARQL or from code. We should talk about this. You can use code similar to that found in the HistoryServiceFactory class to access the data in the history index from code. Integrating this into SPARQL would make it a supported feature rather than a demonstration concept / prototype.
The revision time for the entries in the history index is currently lastCommitTime+1 (for an unisolated connection). The issue with revision time is that we have to record things in the history index incrementally, so it cannot be a commit time because we do not have that yet. lastCommitTime+1 will always be strictly greater than the previous commit point. When you scan the history index, fromKey is the first commit time to visit (or null to start at the head of the index) and toKey is the first commit time to exclude (or null to run through the tail of the index). Those semantics are pretty much what people would expect for the scan. However, the reported revision times will not correspond to the commit points. When reporting this data, we could resolve (and cache) the first commit time greater than the preceding commit point (that is, the first commit time at or after the revision time), but only if we still have access to that commit point (the resolution would be against the commit record index, and commit points are pruned from that index when they are recycled).
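The revision time and scan semantics described above can be illustrated with a small in-memory model. This is only a sketch: HistoryScanSketch and its methods are hypothetical names, it uses a plain java.util.TreeMap rather than the real history index or any bigdata API, and it assumes the half-open [fromKey, toKey) reading of the scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the history index (not the bigdata API). */
final class HistoryScanSketch {

    /** Revision time -> changes recorded under that revision time. */
    private final TreeMap<Long, List<String>> index = new TreeMap<>();

    /** Commit times that are still resolvable (stand-in for the commit record index). */
    private final TreeSet<Long> commitTimes = new TreeSet<>();

    /** Most recent commit time; 0 means nothing has been committed yet. */
    private long lastCommitTime = 0L;

    /** Records a change incrementally, before the commit time is known. */
    long record(String change) {
        final long revisionTime = lastCommitTime + 1;
        index.computeIfAbsent(revisionTime, k -> new ArrayList<>()).add(change);
        return revisionTime;
    }

    /** Simulates a commit; commit times must be strictly increasing. */
    void commit(long commitTime) {
        if (commitTime <= lastCommitTime) {
            throw new IllegalArgumentException("commit time must increase");
        }
        lastCommitTime = commitTime;
        commitTimes.add(commitTime);
    }

    /** Scans [fromKey, toKey); a null bound means the head or tail of the index. */
    List<String> scan(Long fromKey, Long toKey) {
        if (fromKey != null && toKey != null && fromKey > toKey) {
            throw new IllegalArgumentException("fromKey > toKey");
        }
        NavigableMap<Long, List<String>> view = index;
        if (fromKey != null) view = view.tailMap(fromKey, true); // inclusive lower bound
        if (toKey != null) view = view.headMap(toKey, false);    // exclusive upper bound
        final List<String> out = new ArrayList<>();
        for (List<String> changes : view.values()) {
            out.addAll(changes);
        }
        return out;
    }

    /** First known commit time at or after the revision time, or null if none. */
    Long resolveCommitTime(long revisionTime) {
        return commitTimes.ceiling(revisionTime);
    }
}
```

In this model, changes recorded after a commit at time 100 get revision time 101, so scan(100L, 200L) returns them while excluding anything recorded after a later commit at time 200. resolveCommitTime returns null both for changes that are not yet committed and for commit points that have been removed from the set, which mirrors the pruning caveat above.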
The revision time for a full read/write tx is less well defined. We can use the same value (lastCommitTime+1). However, it is a little more ambiguous in this case because you can have concurrent transactions so there could be multiple transactions that wind up with the same revisionTime in the history index.
We could also use the actual timestamp when we touched the index for a given ISPO, but there are issues with that as well. For example, on a long-running data load we would have a bunch of different revision times for the same commit point, and the actual order of those revision times within the index is not material since all changes are associated with the same commit point.
- Refactored IChangeRecord.ChangeAction into its own file.
- Added transactionBegin() and transactionPrepare() methods to IChangeLog.
- Integrated a history index into SPORelation.
- Added a HistoryServiceFactory that will index the data reported by
- Added option to enable the history service to AbstractTripleStore.
- Added option to specify the min release time for the history index
(defaults to infinite).
- Added SPOKeyOrder.appendKey(...) to append the key component for an
ISPO without invoking reset() on the IKeyBuilder or extracting and
returning the byte key. This makes it possible to reuse the
SPOKeyOrder for the history index.
- Added hashCode() to ChangeRecord (based on the ISPO hashCode).
- Wrote unit tests of the history index.
- TODO This feature is just a prototype right now. We have to work
through the use cases (including the SPARQL SERVICE use cases) and
read/write tx support before we can support this. Once supported,
document this on wiki (HISTORY_SERVICE,
HISTORY_SERVICE_MIN_RELEASE_TIME, how to access from code, how to
access from query).
- TODO Unit tests of the history index at the SPARQL layer. This
requires a SERVICE translation. The <<>> syntax might be a nice way
to express access to the statements in the history index.
See http://sourceforge.net/apps/trac/bigdata/ticket/607 (History Service)
Committed revision r6640.