The 2nd follower in an HA3 cluster has been observed not participating in commits: it is in SeekConsensus and stuck at commit point 84 while the rest of the cluster has moved on, yet it still considers itself to be HAReady and a Follower.
The root cause is probably related to how we manage the transition from RunMet to a service leave in preparation for SeekConsensus. We need to develop test coverage around these abnormal transitions. That coverage can be provided by overriding the HAGlue RMI interface under test suite control, and the TestHAJournalServerOverrides test suite already provides a means to write such tests.
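As a rough illustration of the override idea, the sketch below uses a JDK dynamic proxy to make selected methods of a service interface fail under test control. It is not the mechanism used by TestHAJournalServerOverrides. `ServiceGlue`, `SimpleService`, and `failing(...)` are hypothetical names invented for this example, and `ServiceGlue` is only a stand-in, not the real HAGlue API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.Set;

final class GlueOverrideSketch {

    /** Hypothetical stand-in for the service's remote interface (NOT the real HAGlue API). */
    public interface ServiceGlue {
        long getCommitCounter();
        void commit(long commitCounter);
        String getRunState();
    }

    /** Trivial in-memory service used only for this illustration. */
    public static final class SimpleService implements ServiceGlue {
        private long commitCounter;

        public SimpleService(final long initialCommitCounter) {
            this.commitCounter = initialCommitCounter;
        }

        @Override public long getCommitCounter() { return commitCounter; }
        @Override public void commit(final long newCounter) { commitCounter = newCounter; }
        @Override public String getRunState() { return "RunMet"; }
    }

    /**
     * Wraps the delegate so that any method whose name is in failingMethods
     * throws instead of reaching the delegate. All other calls pass through.
     */
    public static ServiceGlue failing(final ServiceGlue delegate,
                                      final Set<String> failingMethods) {
        final InvocationHandler handler = (proxy, method, args) -> {
            if (failingMethods.contains(method.getName())) {
                throw new IllegalStateException("Injected failure: " + method.getName());
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // unwrap so callers see the original exception
            }
        };
        return (ServiceGlue) Proxy.newProxyInstance(
                ServiceGlue.class.getClassLoader(),
                new Class<?>[] { ServiceGlue.class },
                handler);
    }

    public static void main(final String[] args) {
        final SimpleService service = new SimpleService(84L);
        final ServiceGlue glue = failing(service, Collections.singleton("commit"));
        try {
            glue.commit(85L);
        } catch (IllegalStateException e) {
            System.out.println("commit rejected: " + e.getMessage());
        }
        // The delegate never saw the commit, so its counter has not advanced.
        System.out.println("commit counter: " + glue.getCommitCounter());
    }
}
```

A test written this way can force a failure at a chosen point in the commit protocol, then assert on the state the service reports afterwards, which is the kind of abnormal transition that needs coverage here.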
An attempt to restart the 2nd follower (bigdata15) caused updates to fail on the leader:
It also caused the status page to not paint correctly on the leader.
And the following trace appears on the 2nd follower after 2 restarts:
This problem repeats and does not resolve.