Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-595

PhysicalAddressResolutionException after reopen using RWStore and recycler

    Details

      Description

      An exception has been reported against 1.0.5 after the re-open of a Journal with using minReleaseAge=1 (the recycler versus session protection).

      The exception is reported after a journal shutdown by the first attempt to run queries against the store after a read-open. The report indicates that the queries were being run against a read-only connection.

      Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -148801
              at com.bigdata.rdf.lexicon.LexiconRelation._addTerms(LexiconRelation.java:1264)
              at com.bigdata.rdf.lexicon.LexiconRelation.addTerms(LexiconRelation.java:1218)
              at com.bigdata.rdf.sail.BigdataValueReplacer.replaceValues(BigdataValueReplacer.java:211)
              at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.replaceValues(BigdataSail.java:3576)
              at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.evaluate2(BigdataSail.java:3500)
              at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.evaluate(BigdataSail.java:3263)
              at com.bigdata.rdf.sail.BigdataSailTupleQuery.evaluate(BigdataSailTupleQuery.java:68)
      
      ????
      
      Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -148801
              at com.bigdata.btree.BTree.load(BTree.java:1818)
              at com.bigdata.journal.AbstractJournal.getIndex(AbstractJournal.java:3577)
              at com.bigdata.journal.AbstractJournal.getIndex(AbstractJournal.java:3506)
              at com.bigdata.journal.Journal.getIndexSources(Journal.java:681)
              at com.bigdata.journal.Journal.getIndex(Journal.java:895)
              at com.bigdata.journal.Journal.getIndex(Journal.java:99)
              at com.bigdata.relation.AbstractRelation.getIndex(AbstractRelation.java:205)
              at com.bigdata.relation.AbstractRelation.getIndex(AbstractRelation.java:165)
              at com.bigdata.relation.AbstractRelation.getIndex(AbstractRelation.java:133)
              at com.bigdata.rdf.lexicon.LexiconRelation.getTerm2IdIndex(LexiconRelation.java:753)
              at com.bigdata.rdf.lexicon.LexiconRelation.getIndex(LexiconRelation.java:714)
              at com.bigdata.rdf.lexicon.Term2IdWriteTask.call(Term2IdWriteTask.java:98)
              at com.bigdata.rdf.lexicon.LexiconRelation._addTerms(LexiconRelation.java:1261)
              ... 10 more
      Caused by: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -148801
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:1487)
              at com.bigdata.journal.RWStrategy.readFromLocalStore(RWStrategy.java:624)
              at com.bigdata.journal.RWStrategy.read(RWStrategy.java:135)
              at com.bigdata.journal.AbstractJournal.read(AbstractJournal.java:2729)
              at com.bigdata.btree.AbstractBTree.readNodeOrLeaf(AbstractBTree.java:3824)
              at com.bigdata.btree.BTree._reopen(BTree.java:504)
              at com.bigdata.btree.AbstractBTree.reopen(AbstractBTree.java:1251)
              at com.bigdata.btree.BTree.load(BTree.java:1812)
              ... 22 more
      Caused by: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -148801
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:1392)
              ... 29 more
      

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        The exception is suggestive of a recycler or session protection error. For example, the PhysicalAddressResolutionException was observed in [1]. We have not seen any exceptions related to this in CI since we closed BLZG-566 [1] and BLZG-589 [2].

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/440
        [2] https://sourceforge.net/apps/trac/bigdata/ticket/467

        Show
        bryanthompson bryanthompson added a comment - The exception is suggestive of a recycler or session protection error. For example, the PhysicalAddressResolutionException was observed in [1] . We have not seen any exceptions related to this in CI since we closed BLZG-566 [1] and BLZG-589 [2] . [1] https://sourceforge.net/apps/trac/bigdata/ticket/440 [2] https://sourceforge.net/apps/trac/bigdata/ticket/467
        Hide
        bryanthompson bryanthompson added a comment -

        We have created a variant of the CI stress test for this condition in which a re-open is performed. We have been unable to replicate the reported problem with that test. The customer indicates that the problem is reproducible on their end. We have requested a txLog trace for the data load, reopen, and query up to the error.

        Show
        bryanthompson bryanthompson added a comment - We have created a variant of the CI stress test for this condition in which a re-open is performed. We have been unable to replicate the reported problem with that test. The customer indicates that the problem is reproducible on their end. We have requested a txLog trace for the data load, reopen, and query up to the error.
        Hide
        bryanthompson bryanthompson added a comment -

        Customer states:

        - The problem seems not related to shutdown. If we just start the service creating a
        fresh jnl and let it run for a while, the jnl seems eventually getting into that state.
        
        - Session protection did not seem help, it may take longer time, but the same issue
        did show up.
        

        It is interesting that the problem shows up under session protection as well. That pretty much rules out the transaction service, which was at fault in BLZG-589.

        Show
        bryanthompson bryanthompson added a comment - Customer states: - The problem seems not related to shutdown. If we just start the service creating a fresh jnl and let it run for a while, the jnl seems eventually getting into that state. - Session protection did not seem help, it may take longer time, but the same issue did show up. It is interesting that the problem shows up under session protection as well. That pretty much rules out the transaction service, which was at fault in BLZG-589 .
        Hide
        bryanthompson bryanthompson added a comment -

        Based on a log file provided by the customer, the error is NOT on the first read after the re-open. In fact, there have been a number of commits and reads before the error appears.

        Also, the customer reports being able to hit this exception without a re-open of the journal.

        Show
        bryanthompson bryanthompson added a comment - Based on a log file provided by the customer, the error is NOT on the first read after the re-open. In fact, there have been a number of commits and reads before the error appears. Also, the customer reports being able to hit this exception without a re-open of the journal.
        Hide
        bryanthompson bryanthompson added a comment -

        Customer wrote: After disabling the second component, I am getting a different exception on restarting the service, but this time there seems only one thread (Thread-0): The exception happened when committing ontologies into the existing jnl (freeing bit not set) on service start up. (txLog was provided.)

        Error loading ontology: Could not commit index: name=...kb.lex.TERM2ID, commitList=[DirtyListener{name=...kb.lex.TERM2ID,needsCheckpoint}, DirtyListener{name=....kb.spo.OSP,needsCheckpoint}, DirtyListener{name=...kb.spo.POS,needsCheckpoint}, DirtyListener{name=...kb.spo.SPO,needsCheckpoint}]: lastRootBlock=rootBlock{ rootBlock=1, challisField=13, version=2, nextOffset=618475298842, localTime=1328560578367 [Monday, February 6, 2012 3:36:18 PM EST], firstCommitTime=1328560332607 [Monday, February 6, 2012 3:32:12 PM EST], lastCommitTime=1328560578351 [Monday, February 6, 2012 3:36:18 PM EST], commitCounter=13, commitRecordAddr={off=NATIVE:-40993,len=422}, commitRecordIndexAddr={off=NATIVE:-8202,len=220}, quorumToken=0, metaBitsAddr=56316919858, metaStartAddr=160, storeType=RW, uuid=510ec5df-3ea9-43c5-ac3c-0d41dcaeb025, offsetBits=42, checksum=-1105322746, createTime=1328560331913 [Monday, February 6, 2012 3:32:11 PM EST], closeTime=0}
                    at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2278) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:74) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.commit2(BigdataSail.java:2745) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit2(BigdataSailRepositoryConnection.java:260) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit(BigdataSailRepositoryConnection.java:279) ~[bigdata-1.0.5.jar:na]
                    ... 25 common frames omitted
        Caused by: java.lang.RuntimeException: Could not commit index: name=...kb.lex.TERM2ID, commitList=[DirtyListener{name=...kb.lex.TERM2ID,needsCheckpoint}, DirtyListener{name=...kb.spo.OSP,needsCheckpoint}, DirtyListener{name=....kb.spo.POS,needsCheckpoint}, DirtyListener{name=...kb.spo.SPO,needsCheckpoint}]
                    at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:571) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.AbstractJournal.notifyCommitters(AbstractJournal.java:2001) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:2364) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2276) ~[bigdata-1.0.5.jar:na]
                    ... 30 common frames omitted
        Caused by: java.lang.IllegalArgumentException: IAE with address: -90115, size: 745, context: -1
                    at com.bigdata.rwstore.FixedAllocator.free(FixedAllocator.java:685) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1859) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1830) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rwstore.RWStore.free(RWStore.java:1685) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:267) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:245) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.AbstractJournal.delete(AbstractJournal.java:2788) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.btree.AbstractBTree.recycle(AbstractBTree.java:4226) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.btree.BTree._writeCheckpoint2(BTree.java:1013) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.btree.BTree.writeCheckpoint2(BTree.java:904) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.btree.BTree.handleCommit(BTree.java:1248) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:563) ~[bigdata-1.0.5.jar:na]
                    ... 33 common frames omitted
        Caused by: java.lang.IllegalArgumentException: Freeing bit not set
                    at com.bigdata.rwstore.AllocBlock.freeBit(AllocBlock.java:147) ~[bigdata-1.0.5.jar:na]
                    at com.bigdata.rwstore.FixedAllocator.free(FixedAllocator.java:652) ~[bigdata-1.0.5.jar:na]
                    ... 44 common frames omitted
        
        Show
        bryanthompson bryanthompson added a comment - Customer wrote: After disabling the second component, I am getting a different exception on restarting the service, but this time there seems only one thread (Thread-0): The exception happened when committing ontologies into the existing jnl (freeing bit not set) on service start up. (txLog was provided.) Error loading ontology: Could not commit index: name=...kb.lex.TERM2ID, commitList=[DirtyListener{name=...kb.lex.TERM2ID,needsCheckpoint}, DirtyListener{name=....kb.spo.OSP,needsCheckpoint}, DirtyListener{name=...kb.spo.POS,needsCheckpoint}, DirtyListener{name=...kb.spo.SPO,needsCheckpoint}]: lastRootBlock=rootBlock{ rootBlock=1, challisField=13, version=2, nextOffset=618475298842, localTime=1328560578367 [Monday, February 6, 2012 3:36:18 PM EST], firstCommitTime=1328560332607 [Monday, February 6, 2012 3:32:12 PM EST], lastCommitTime=1328560578351 [Monday, February 6, 2012 3:36:18 PM EST], commitCounter=13, commitRecordAddr={off=NATIVE:-40993,len=422}, commitRecordIndexAddr={off=NATIVE:-8202,len=220}, quorumToken=0, metaBitsAddr=56316919858, metaStartAddr=160, storeType=RW, uuid=510ec5df-3ea9-43c5-ac3c-0d41dcaeb025, offsetBits=42, checksum=-1105322746, createTime=1328560331913 [Monday, February 6, 2012 3:32:11 PM EST], closeTime=0} at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2278) ~[bigdata-1.0.5.jar:na] at com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:74) ~[bigdata-1.0.5.jar:na] at com.bigdata.rdf.sail.BigdataSail$BigdataSailConnection.commit2(BigdataSail.java:2745) ~[bigdata-1.0.5.jar:na] at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit2(BigdataSailRepositoryConnection.java:260) ~[bigdata-1.0.5.jar:na] at com.bigdata.rdf.sail.BigdataSailRepositoryConnection.commit(BigdataSailRepositoryConnection.java:279) ~[bigdata-1.0.5.jar:na] ... 25 common frames omitted Caused by: java.lang.RuntimeException: Could not commit index: name=...kb.lex.TERM2ID, commitList=[DirtyListener{name=...kb.lex.TERM2ID,needsCheckpoint}, DirtyListener{name=...kb.spo.OSP,needsCheckpoint}, DirtyListener{name=....kb.spo.POS,needsCheckpoint}, DirtyListener{name=...kb.spo.SPO,needsCheckpoint}] at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:571) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.AbstractJournal.notifyCommitters(AbstractJournal.java:2001) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:2364) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2276) ~[bigdata-1.0.5.jar:na] ... 30 common frames omitted Caused by: java.lang.IllegalArgumentException: IAE with address: -90115, size: 745, context: -1 at com.bigdata.rwstore.FixedAllocator.free(FixedAllocator.java:685) ~[bigdata-1.0.5.jar:na] at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1859) ~[bigdata-1.0.5.jar:na] at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1830) ~[bigdata-1.0.5.jar:na] at com.bigdata.rwstore.RWStore.free(RWStore.java:1685) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:267) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:245) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.AbstractJournal.delete(AbstractJournal.java:2788) ~[bigdata-1.0.5.jar:na] at com.bigdata.btree.AbstractBTree.recycle(AbstractBTree.java:4226) ~[bigdata-1.0.5.jar:na] at com.bigdata.btree.BTree._writeCheckpoint2(BTree.java:1013) ~[bigdata-1.0.5.jar:na] at com.bigdata.btree.BTree.writeCheckpoint2(BTree.java:904) ~[bigdata-1.0.5.jar:na] at com.bigdata.btree.BTree.handleCommit(BTree.java:1248) ~[bigdata-1.0.5.jar:na] at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:563) ~[bigdata-1.0.5.jar:na] ... 33 common frames omitted Caused by: java.lang.IllegalArgumentException: Freeing bit not set at com.bigdata.rwstore.AllocBlock.freeBit(AllocBlock.java:147) ~[bigdata-1.0.5.jar:na] at com.bigdata.rwstore.FixedAllocator.free(FixedAllocator.java:652) ~[bigdata-1.0.5.jar:na] ... 44 common frames omitted
        Hide
        bryanthompson bryanthompson added a comment -

        We were able to replicate the problem against the 1.0.x maintenance branch based on some sample code. TestTicket473 has been created for this.

        Committed revision r5971.

        Show
        bryanthompson bryanthompson added a comment - We were able to replicate the problem against the 1.0.x maintenance branch based on some sample code. TestTicket473 has been created for this. Committed revision r5971.
        Hide
        bryanthompson bryanthompson added a comment -

        This has been tracked down to a commit on a BTree where the root node was not updated. The old root node was being recycled anyway, but since it has not been changed that wound up deleting the actual root node.

        Recycling of root nodes and checkpoints was disabled in 1.0.4 which is why this bug was not demonstrated against that version.

        The fix to not recycle the unmodified root node is easy enough.

                // delete old root data if changed
                if (checkpoint != null && getRoot() != null
                        && checkpoint.getRootAddr() != getRoot().identity) {
                    recycle(checkpoint != null ? checkpoint.getRootAddr() : IRawStore.NULL);
                }
        

        I want to look more closely at the needsCheckpoint() code and figure out why we were doing a checkpoint (i.e., what about the BTree was dirty). It is possible that it was a counter update or something like that.

        I will also review the B+Tree test suite and make sure that we have a unit test for this case.

        The HTree code (in 1.1.x) also needs to be reviewed for this.

        The error is corrected as of this commit. We will continue to work the issue to ensure that we have a unit test for this, look at the HTree, etc.

        Committed revision r5974 (1.0.x and 1.1.x).

        Show
        bryanthompson bryanthompson added a comment - This has been tracked down to a commit on a BTree where the root node was not updated. The old root node was being recycled anyway, but since it has not been changed that wound up deleting the actual root node. Recycling of root nodes and checkpoints was disabled in 1.0.4 which is why this bug was not demonstrated against that version. The fix to not recycle the unmodified root node is easy enough. // delete old root data if changed if (checkpoint != null && getRoot() != null && checkpoint.getRootAddr() != getRoot().identity) { recycle(checkpoint != null ? checkpoint.getRootAddr() : IRawStore.NULL); } I want to look more closely at the needsCheckpoint() code and figure out why we were doing a checkpoint (i.e., what about the BTree was dirty). It is possible that it was a counter update or something like that. I will also review the B+Tree test suite and make sure that we have a unit test for this case. The HTree code (in 1.1.x) also needs to be reviewed for this. The error is corrected as of this commit. We will continue to work the issue to ensure that we have a unit test for this, look at the HTree, etc. Committed revision r5974 (1.0.x and 1.1.x).
        Hide
        bryanthompson bryanthompson added a comment -

        Ok. The Term2IdWriteProc was updating the index local counter (BTree#getCounter()) if there was a blank node and the database was not in STORE_BLANK_NODES mode. This side effect was causing the BTree to be dirty and require a commit even though there were no writes on the index itself. This is because the counter is stored in the Checkpoint record.

        The fix above is therefore both necessary and sufficient.

        Show
        bryanthompson bryanthompson added a comment - Ok. The Term2IdWriteProc was updating the index local counter (BTree#getCounter()) if there was a blank node and the database was not in STORE_BLANK_NODES mode. This side effect was causing the BTree to be dirty and require a commit even though there were no writes on the index itself. This is because the counter is stored in the Checkpoint record. The fix above is therefore both necessary and sufficient.
        Hide
        bryanthompson bryanthompson added a comment -

        Closing this ticket. I have opened a new ticket [1] to provide the test suite coverage for writeCheckpoint() and recycling behavior for the BTree and HTree.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/476 (Test suite for writeCheckpoint() and recycling for BTree/HTree)

        Show
        bryanthompson bryanthompson added a comment - Closing this ticket. I have opened a new ticket [1] to provide the test suite coverage for writeCheckpoint() and recycling behavior for the BTree and HTree. [1] https://sourceforge.net/apps/trac/bigdata/ticket/476 (Test suite for writeCheckpoint() and recycling for BTree/HTree)

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: