Blazegraph (by SYSTAP) / BLZG-771

HAJournalServer can not restart due to logically empty log files

    Details

      Description

      HAJournalServer cannot restart (after a power loss) due to multiple logically empty HALog files on bigdata15. bigdata15 has all of the log files but has not applied them all. Evidently the writes were received and written, but the closing root block was not always applied, so bigdata15 remained behind.

      Worse, bigdata17 continued to make commits even after bigdata16 dropped out. Presumably this is because bigdata15 did not do a service leave when it was unable to apply a live replicated write. This problem needs to be isolated and resolved through additional unit tests.

      The missing closing root blocks on some log files may be related to the logRootBlock() code. If bigdata15 somehow remained a "joined" service and appeared to be participating in commits, but nevertheless was not writing onto the backing store and occasionally failed to log the closing root block for an HALog file, then that would explain the observed problem.

          Caused by: java.io.IOException: Logically empty HALog: benchmark/HAJournal-1/HAJournalServer/HALog/000/000/000/000/000/298/000000000000000298836.ha-log
      	at com.bigdata.journal.jini.ha.HALogNexus.addHALog(HALogNexus.java:410)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:292)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.populateIndexRecursive(HALogNexus.java:286)
      	at com.bigdata.journal.jini.ha.HALogNexus.<init>(HALogNexus.java:238)
      	at com.bigdata.journal.jini.ha.HAJournal.<init>(HAJournal.java:365)
      	at com.bigdata.journal.jini.ha.HAJournal.<init>(HAJournal.java:293)
      	... 10 more
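
The condition behind this exception can be sketched as follows. This is a minimal model, not the Blazegraph code: it assumes an HALog counts as "logically empty" when its closing root block still carries the same commit counter as its opening root block, and the class and method names (HALogCheck, isLogicallyEmpty, commitCounterOf) are hypothetical. The file-name parsing relies only on the naming visible in the trace above (a zero-padded commit counter followed by .ha-log).

```java
class HALogCheck {

    /**
     * Assumption: if no closing root block was logged, the closing slot still
     * reports the opening commit counter, so the two counters are equal.
     */
    static boolean isLogicallyEmpty(long openingCommitCounter, long closingCommitCounter) {
        return openingCommitCounter == closingCommitCounter;
    }

    /** Parses the commit counter from a name such as 000000000000000298836.ha-log. */
    static long commitCounterOf(String fileName) {
        final String ext = ".ha-log";
        if (!fileName.endsWith(ext)) {
            throw new IllegalArgumentException("Not an HALog file: " + fileName);
        }
        // Long.parseLong accepts the leading zeros used for padding.
        return Long.parseLong(fileName.substring(0, fileName.length() - ext.length()));
    }
}
```

A startup scan that applied such a check to every file would reject the first log whose counters match, which is consistent with the failure in HALogNexus.addHALog() shown in the trace.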
      

      I was able to recover by copying the missing logs from bigdata17. On restart, bigdata15 then applied all local logs in sequence and advanced its commit point until it could join a met quorum with bigdata17.
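
The replay behavior described above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the HAJournalServer implementation: the class HALogReplaySketch and its replay() method are hypothetical, and the log named N is assumed to advance the journal from commit counter N-1 to N.

```java
import java.util.Set;

class HALogReplaySketch {

    /**
     * Returns the commit counter reached by applying usable HALogs in sequence.
     * Assumption: the log named N takes the journal from commit counter N-1 to N,
     * so replay stops at the first missing or logically empty log.
     */
    static long replay(long currentCommitCounter, Set<Long> usableLogs) {
        long counter = currentCommitCounter;
        while (usableLogs.contains(counter + 1)) {
            counter++; // stand-in for applying the log's writes and its closing root block
        }
        return counter;
    }
}
```

In this model a single unusable log blocks every later log from being applied, which is why replacing the bad files with copies from a peer allowed the commit point to advance.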

            People

            Assignee: bryanthompson
            Reporter: bryanthompson