HAJournalServer cannot restart (after a power loss) due to multiple empty log files on bigdata15. bigdata15 has all log files, but has not applied them all. Evidently they were received and written, but the closing root block was not always applied, so bigdata15 fell behind.
Worse, bigdata17 continued to make commits even after bigdata16 dropped out. Presumably this is because bigdata15 did not do a service leave when it was unable to apply a live replicated write. This problem needs to be isolated and resolved through additional unit tests.
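To make the expected behavior concrete for a unit test, here is a minimal sketch of "do a service leave when a live replicated write cannot be applied". The `Store` and `Quorum` interfaces and the `applyOrLeave` method are hypothetical stand-ins, not the actual HAJournalServer API; the sketch only illustrates the invariant the test should check.

```java
import java.io.IOException;

class ReplicatedWriteSketch {

    /** Hypothetical stand-in for the local backing store. */
    interface Store {
        void apply(byte[] payload) throws IOException;
    }

    /** Hypothetical stand-in for the quorum membership handle. */
    interface Quorum {
        void serviceLeave();
    }

    /**
     * Apply one live replicated write. If the write cannot be applied,
     * the service leaves the quorum so that it is no longer counted as
     * a joined service for subsequent commits.
     *
     * @return true if the write was applied, false if the service left.
     */
    static boolean applyOrLeave(Store store, Quorum quorum, byte[] payload) {
        try {
            store.apply(payload);
            return true;
        } catch (IOException e) {
            // Assumption: a failed write must never leave the service "joined".
            quorum.serviceLeave();
            return false;
        }
    }
}
```

A unit test along these lines would inject a store that fails on write and assert that the service is no longer joined afterwards.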
The failure to have closing root blocks on some log files may be related to the logRootBlock() code. If bigdata15 somehow remained as a "joined" service and appeared to be participating in commits, but nevertheless was not writing onto the backing store and occasionally failed to log the root block for an HALog file, then that would explain the observed problem.
I was able to recover by copying the missing logs from bigdata17. On restart, bigdata15 then applied all local logs in sequence and advanced its commit point until it could join a met quorum with bigdata17.
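The restart behavior described above can be sketched as follows. This is an illustration only, under stated assumptions: `LogInfo` is a hypothetical summary of one HALog file (its commit counter and whether it has a closing root block), and `replay` is not the real recovery code. It applies logs in commit order and stops at the first gap or the first log without a closing root block, which is roughly the condition that blocked bigdata15 before the logs were copied over.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HALogReplaySketch {

    /** Hypothetical summary of one HALog file (not the real bigdata class). */
    static final class LogInfo {
        final long commitCounter;          // commit point reached once this log is applied
        final boolean hasClosingRootBlock; // false if the log was never properly closed

        LogInfo(long commitCounter, boolean hasClosingRootBlock) {
            this.commitCounter = commitCounter;
            this.hasClosingRootBlock = hasClosingRootBlock;
        }
    }

    /**
     * Apply local logs in sequence, starting after the current commit point.
     *
     * @return the commit counter reached after replay.
     */
    static long replay(long currentCommitCounter, List<LogInfo> logs) {
        List<LogInfo> sorted = new ArrayList<>(logs);
        sorted.sort(Comparator.comparingLong((LogInfo l) -> l.commitCounter));

        long commitCounter = currentCommitCounter;
        for (LogInfo log : sorted) {
            if (log.commitCounter <= commitCounter) {
                continue; // already applied
            }
            if (log.commitCounter != commitCounter + 1) {
                break; // missing log: cannot skip a commit point
            }
            if (!log.hasClosingRootBlock) {
                break; // empty or unclosed log: cannot advance past it
            }
            commitCounter = log.commitCounter;
        }
        return commitCounter;
    }
}
```

With every log present and closed, the commit counter advances to the last log; with an unclosed log in the middle, replay stalls just before it.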