The code is designed to detect a situation in which timestamps do not move forward during the commit protocol. You can see the logic in Journal.BarrierState and Journal.InnerJournalTransactionService.
The leader takes a timestamp from its timestamp factory at the start of the consensus protocol. This protocol coordinates an agreement among the services about the earliest commit point that will remain visible due to (a) the minReleaseAge; and (b) the earliestActiveTx on each service. The event sequence looks like this:
- Leader: takes a timestamp, then issues a "gather" request to the followers.
- Follower: notes its earliest visible commit point based on the earliest active transaction and the minReleaseAge. Takes a timestamp. Calls back to the leader with those data.
- Leader: once all responses are received, takes a second timestamp.
The leader then asserts that its first timestamp is before the timestamp on each follower. It also asserts that the timestamp on each follower is before the leader's second timestamp. If either assertion fails (by more than 5s as presently configured), then com.bigdata.util.ClocksNotSynchronizedException is thrown.
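To make the ordering concrete, here is a minimal sketch of the intended check over the three timestamps. It is not the actual bigdata code: the class name, method names, and the use of IllegalStateException as a stand-in for the real exception are all assumptions made for illustration, and the 5s tolerance is simply taken from the description above.

```java
final class GatherClockCheckSketch {

    // Hypothetical tolerance; 5s is the presently configured value mentioned above.
    static final long MAX_SKEW_MILLIS = 5_000L;

    // True if t1 is before t2, tolerating t1 running ahead of t2 by up to maxSkew.
    static boolean isBefore(final long t1, final long t2, final long maxSkew) {
        return t1 - t2 <= maxSkew;
    }

    // Checks leaderStart <= follower <= leaderEnd, each within the tolerance.
    // IllegalStateException is a stand-in for the real exception type.
    static void checkFollower(final long leaderStart, final long follower,
            final long leaderEnd) {
        if (!isBefore(leaderStart, follower, MAX_SKEW_MILLIS)) {
            throw new IllegalStateException(
                    "follower timestamp is too far behind the leader's first timestamp");
        }
        if (!isBefore(follower, leaderEnd, MAX_SKEW_MILLIS)) {
            throw new IllegalStateException(
                    "follower timestamp is too far ahead of the leader's second timestamp");
        }
    }

    public static void main(final String[] args) {
        final long leaderStart = System.currentTimeMillis();
        // Simulated follower responses and the leader's second timestamp.
        final long[] followers = { leaderStart + 20, leaderStart + 35 };
        final long leaderEnd = leaderStart + 50;
        for (final long f : followers) {
            checkFollower(leaderStart, f, leaderEnd);
        }
        System.out.println("all follower timestamps are in order");
    }
}
```

Note that this sketch only complains when a timestamp is out of order by more than the tolerance. A follower that responds late does not trip it.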
I think that the problem is in this method in AbstractJournal. As you can see, it is using the absolute value of the delta between the two timestamps. Thus, it will fail not only where there is clock skew (specifically, not only when t2 is before t1), but also where there is significant latency (e.g., due to a major GC pause on one service).
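The difference between the two behaviors can be shown with a small comparison. This is only an illustrative sketch: absoluteCheck mimics a test on the absolute value of the delta as described above, while directionalCheck is one possible alternative that only objects when t1 is ahead of t2. Both method names are hypothetical and neither is copied from AbstractJournal.

```java
final class SkewCheckComparison {

    // Passes only if the two timestamps are within maxSkew of each other,
    // in either direction. Latency between t1 and t2 counts against it.
    static boolean absoluteCheck(final long t1, final long t2, final long maxSkew) {
        return Math.abs(t2 - t1) <= maxSkew;
    }

    // Passes unless t1 is ahead of t2 by more than maxSkew.
    // A large positive gap (t2 much later than t1) is accepted.
    static boolean directionalCheck(final long t1, final long t2, final long maxSkew) {
        return t1 - t2 <= maxSkew;
    }

    public static void main(final String[] args) {
        final long maxSkew = 5_000L;

        // Latency case: clocks agree, but t2 was taken 8s after t1 (e.g. a GC pause).
        System.out.println("latency, absolute:    " + absoluteCheck(0L, 8_000L, maxSkew));
        System.out.println("latency, directional: " + directionalCheck(0L, 8_000L, maxSkew));

        // Skew case: t2 is 8s before t1, so the clocks really disagree.
        System.out.println("skew, absolute:       " + absoluteCheck(8_000L, 0L, maxSkew));
        System.out.println("skew, directional:    " + directionalCheck(8_000L, 0L, maxSkew));
    }
}
```

In the latency case only the absolute check rejects the timestamps. In the skew case both checks reject them.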
This method on AbstractJournal provides the maximum allowed clock skew. We have not yet exposed this as a configuration parameter, but we will do so. For the moment, you could simply override the return value.