Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-494

RWStore reports "FixedAllocator returning null address, with freeBits"

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: TERMS_REFACTOR_BRANCH
    • Fix Version/s: None
    • Component/s: RWStore

      Description

      A stack trace was provided. This trace was reported after a server restart following the load of approximately 170m quads.

      82547 FixedAllocator returning null address, with freeBits: 12
      AllocBlock, baseAddress: 38205652992 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38210895872 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38216138752 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38221381632 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38226624512 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38231867392 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38239207424 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38257033216 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38260178944 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38263324672 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38266470400 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38269616128 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38272761856 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38275907584 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38279053312 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38282199040 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38285344768 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38288490496 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38291636224 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38294781952 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38300418048 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38303563776 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38306709504 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38309855232 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38313000960 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38316146688 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38319292416 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      AllocBlock, baseAddress: 38322438144 bits: -1 -1 -1 -1 -1 -1 -1 -1 
      
      82547 java.lang.IllegalStateException: No physical address found for 0
      java.lang.IllegalStateException: No physical address found for 0
      	at com.bigdata.rwstore.RWStore.alloc(RWStore.java:1875)
      [bigdata-terms/:na]
      	at com.bigdata.rwstore.RWStore.alloc(RWStore.java:1980)
      [bigdata-terms/:na]
      	at com.bigdata.journal.RWStrategy.write(RWStrategy.java:214)
      [bigdata-terms/:na]
      	at com.bigdata.journal.RWStrategy.write(RWStrategy.java:179)
      [bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.write(AbstractJournal.java:2664)
      [bigdata-terms/:na]
      	at
      com.bigdata.btree.AbstractBTree.writeNodeOrLeaf(AbstractBTree.java:3671)
      [bigdata-terms/:na]
      	at
      com.bigdata.btree.AbstractBTree.writeNodeRecursive(AbstractBTree.java:3483)
      [bigdata-terms/:na]
      	at com.bigdata.btree.BTree.flush(BTree.java:751) [bigdata-terms/:na]
      	at com.bigdata.btree.BTree.writeCheckpoint2(BTree.java:885)
      [bigdata-terms/:na]
      	at com.bigdata.btree.BTree.writeCheckpoint(BTree.java:856)
      [bigdata-terms/:na]
      	at com.bigdata.btree.BTree.handleCommit(BTree.java:1198)
      [bigdata-terms/:na]
      	at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:565)
      [bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.notifyCommitters(AbstractJournal.java:19
      83) [bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:2336)
      [bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2250)
      [bigdata-terms/:na]
      	at
      com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:78)
      [bigdata-terms/:na]
      	at
      com.cambridgesemantics.anzo.datasource.bigdata.BigdataConnection.commitTrans
      action(BigdataConnection.java:167)
      [com.cambridgesemantics.anzo.datasource.bigdata/:na]
      	at
      com.cambridgesemantics.anzo.datasource.bigdata.BigdataQuadStore.commitTransa
      ction(BigdataQuadStore.java:246)
      [com.cambridgesemantics.anzo.datasource.bigdata/:na]
      	at
      org.openanzo.datasource.update.MultiStageUpdatesProcessor.handleTransactionE
      nd(MultiStageUpdatesProcessor.java:384) [org.openanzo.datasource/:na]
      	at
      org.openanzo.datasource.update.MultiStageUpdatesProcessor.handleTransaction(
      MultiStageUpdatesProcessor.java:264) [org.openanzo.datasource/:na]
      	at
      org.openanzo.datasource.update.MultiStageUpdatesProcessor.update(MultiStageU
      pdatesProcessor.java:138) [org.openanzo.datasource/:na]
      	at
      com.cambridgesemantics.anzo.datasource.generic.GenericUpdateService.updateIn
      ternal(GenericUpdateService.java:139)
      [com.cambridgesemantics.anzo.datasource/:na]
      	at
      org.openanzo.datasource.services.BaseUpdateService.update(BaseUpdateService.
      java:177) [org.openanzo.datasource/:na]
      	at org.openanzo.client.Replicator.update(Replicator.java:153)
      [org.openanzo.client/:na]
      	at
      org.openanzo.client.AnzoClient.updateRepository(AnzoClient.java:708)
      [org.openanzo.client/:na]
      	at
      org.openanzo.client.registry.RegistryDataset.commitRegistry(RegistryDataset.
      java:111) [org.openanzo.client.factory/:na]
      	at
      com.cambridgesemantics.anzo.osgi.OsgiManager$ServiceListener$1.run(OsgiManag
      er.java:309) [com.cambridgesemantics.anzo.osgi/:na]
      	at
      java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
      va:886) [na:1.6.0_26]
      	at
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
      08) [na:1.6.0_26]
      	at java.lang.Thread.run(Thread.java:662) [na:1.6.0_26]
      82547 Error commiting Bigdata
      java.lang.RuntimeException: Could not commit index:
      name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb3VyY2U=.spo.PO
      CS,
      commitList=[DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c
      3RlbURhdGFzb3VyY2U=.lex.ID2TERM,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.lex.TERM2ID,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.CSPO,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.OCSP,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.PCSO,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.POCS,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SCOP,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SOPC,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SPOC,needsCheckpoint},
      DirtyListener{name=nguuid,needsCheckpoint}]: lastRootBlock=rootBlock{ rootBlock=0, challisField=4964, version=2, nextOffset=2630375451186887,
      localTime=1311618914866 [Monday, July 25, 2011 2:35:14 PM EDT],
      firstCommitTime=1311435890342 [Saturday, July 23, 2011 11:44:50 AM EDT],
      lastCommitTime=1311618914709 [Monday, July 25, 2011 2:35:14 PM EDT], commitCounter=4964, commitRecordAddr={off=NATIVE:-43177912,len=422},
      commitRecordIndexAddr={off=NATIVE:-43595613,len=220}, quorumToken=0, metaBitsAddr=2630259983778044, metaStartAddr=653732, storeType=RW, uuid=0e7a75a9-5214-4dcc-bf73-3da053c8400d, offsetBits=42, checksum=-38265159, createTime=1311435888390 [Saturday, July 23, 2011
      11:44:48 AM EDT], closeTime=0}
      	at
      com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2252)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.rdf.store.LocalTripleStore.commit(LocalTripleStore.java:78)
      ...
      Caused by: java.lang.RuntimeException: Could not commit index:
      name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb3VyY2U=.spo.PO
      CS,
      commitList=[DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c
      3RlbURhdGFzb3VyY2U=.lex.ID2TERM,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.lex.TERM2ID,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.CSPO,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.OCSP,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.PCSO,checkpointAddr=0},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.POCS,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SCOP,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SOPC,needsCheckpoint},
      DirtyListener{name=aHR0cDovL29wZW5hbnpvLm9yZy9kYXRhc291cmNlL3N5c3RlbURhdGFzb
      3VyY2U=.spo.SPOC,needsCheckpoint},
      DirtyListener{name=nguuid,needsCheckpoint}]
      	at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:573)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.notifyCommitters(AbstractJournal.java:19
      83) ~[bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:2336)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:2250)
      ~[bigdata-terms/:na]
      	... 15 common frames omitted
      Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: No physical address found for 0
      	at com.bigdata.rwstore.RWStore.alloc(RWStore.java:1886)
      ~[bigdata-terms/:na]
      	at com.bigdata.rwstore.RWStore.alloc(RWStore.java:1980)
      ~[bigdata-terms/:na]
      	at com.bigdata.journal.RWStrategy.write(RWStrategy.java:214)
      ~[bigdata-terms/:na]
      	at com.bigdata.journal.RWStrategy.write(RWStrategy.java:179)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.journal.AbstractJournal.write(AbstractJournal.java:2664)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.btree.AbstractBTree.writeNodeOrLeaf(AbstractBTree.java:3671)
      ~[bigdata-terms/:na]
      	at
      com.bigdata.btree.AbstractBTree.writeNodeRecursive(AbstractBTree.java:3483)
      ~[bigdata-terms/:na]
      	at com.bigdata.btree.BTree.flush(BTree.java:751)
      ~[bigdata-terms/:na]
      	at com.bigdata.btree.BTree.writeCheckpoint2(BTree.java:885)
      ~[bigdata-terms/:na]
      	at com.bigdata.btree.BTree.writeCheckpoint(BTree.java:856)
      ~[bigdata-terms/:na]
      	at com.bigdata.btree.BTree.handleCommit(BTree.java:1198)
      ~[bigdata-terms/:na]
      	at com.bigdata.journal.Name2Addr.handleCommit(Name2Addr.java:565)
      ~[bigdata-terms/:na]
      	... 18 common frames omitted
      Caused by: java.lang.IllegalStateException: No physical address found for 0
      	at com.bigdata.rwstore.RWStore.alloc(RWStore.java:1875)
      ~[bigdata-terms/:na]
      	... 29 common frames omitted
      
      

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        See also [1], which documents an occasional CI failure which is potentially a related issue.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/343 (Stochastic assert in AbstractBTree#writeNodeOrLeaf() in CI)

        Show
        bryanthompson bryanthompson added a comment - See also [1] , which documents an occasional CI failure which is potentially a related issue. [1] https://sourceforge.net/apps/trac/bigdata/ticket/343 (Stochastic assert in AbstractBTree#writeNodeOrLeaf() in CI)
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn wrote: Any idea how many quads you are trying to add in this commit and if it is the first commit after restart? The freeBits count is set on load and then maintained on alloc/free. 12 is an odd number (in this context)

        MRoy wrote: I'm not sure in that particular case how many, but if I run it again, I reran it again, and this time it happened adding 4 statements, and removing 4 statements within a transaction. It is not the first commit on restart, in the current run, it was the 12th commit.

        Show
        bryanthompson bryanthompson added a comment - Martyn wrote: Any idea how many quads you are trying to add in this commit and if it is the first commit after restart? The freeBits count is set on load and then maintained on alloc/free. 12 is an odd number (in this context) MRoy wrote: I'm not sure in that particular case how many, but if I run it again, I reran it again, and this time it happened adding 4 statements, and removing 4 statements within a transaction. It is not the first commit on restart, in the current run, it was the 12th commit.
        Hide
        bryanthompson bryanthompson added a comment -

        We have a possible culprit in sign extension when counting the #of free bits when reading an allocator in from the disk. Martyn is going to put together some more unit tests for the methods which count, test, and set bits looking for sign extension problems. There are also some methods on java.lang.Integer, such as bitCount() and "highest" or "lowest" bit which might help us to avoid the pitfalls of bit math on signed integers.

        Show
        bryanthompson bryanthompson added a comment - We have a possible culprit in sign extension when counting the #of free bits when reading an allocator in from the disk. Martyn is going to put together some more unit tests for the methods which count, test, and set bits looking for sign extension problems. There are also some methods on java.lang.Integer, such as bitCount() and "highest" or "lowest" bit which might help us to avoid the pitfalls of bit math on signed integers.
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn, when you get a fix, please also apply it to the 1.0.0 maintenance branch and I will incorporate it into the 1.0.1 bug fix release. Bryan

        Show
        bryanthompson bryanthompson added a comment - Martyn, when you get a fix, please also apply it to the 1.0.0 maintenance branch and I will incorporate it into the 1.0.1 bug fix release. Bryan
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn wrote: I think I have fixed the RWStore allocation problem. I just need to understand why I need to relax an assertion that now fails.

        Show
        bryanthompson bryanthompson added a comment - Martyn wrote: I think I have fixed the RWStore allocation problem. I just need to understand why I need to relax an assertion that now fails.
        Hide
        martyncutcher martyncutcher added a comment -

        The problem was not related to bit math with signed integers, although for efficiency reasons this has been updated to take advantage of the Integer.bitCount method.

        Instead it was due to the maintenance of a count on the transient bits. A FixedAllocator maintains three bitmaps, the liveBits, committedBits and the transientBits. The liveBits represent the currently allocated bits and the committedBits the bits allocated at the last commit (a snapshot of the liveBits on commit). The transientBits is the OR of both liveBits and commitBits and is the bitmap tested when making a new allocation (to protect currently committed data).

        The freeBits count represents the number of allocation slots available in the transientBits and the transientFreeBits count the number of protected commitBits
        - those allocations that have been freed but cannot be re-allocated until after the next commit.

        The error was that in some situations on commit the transientBits were not reset to the liveBits BUT the freeBits count was incremented by the transientFreeBits count, resulting in the freeBits count getting out-of-sync with the transientBits and a FixedAllocator declaring itself available to allocate memory when no bits were free.

        The fix has resulted in the occasional tripping of two previous assertions- that after abort or release of an AllocationContext the transientFreeBits count should be zero, but it has now been determined that these were erroneously dependent on the previous incorrect code so have been removed.

        Show
        martyncutcher martyncutcher added a comment - The problem was not related to bit math with signed integers, although for efficiency reasons this has been updated to take advantage of the Integer.bitCount method. Instead it was due to the maintenance of a count on the transient bits. A FixedAllocator maintains three bitmaps, the liveBits, committedBits and the transientBits. The liveBits represent the currently allocated bits and the committedBits the bits allocated at the last commit (a snapshot of the liveBits on commit). The transientBits is the OR of both liveBits and commitBits and is the bitmap tested when making a new allocation (to protect currently committed data). The freeBits count represents the number of allocation slots available in the transientBits and the transientFreeBits count the number of protected commitBits - those allocations that have been freed but cannot be re-allocated until after the next commit. The error was that in some situations on commit the transientBits were not reset to the liveBits BUT the freeBits count was incremented by the transientFreeBits count, resulting in the freeBits count getting out-of-sync with the transientBits and a FixedAllocator declaring itself available to allocate memory when no bits were free. The fix has resulted in the occasional tripping of two previous assertions- that after abort or release of an AllocationContext the transientFreeBits count should be zero, but it has now been determined that these were erroneously dependent on the previous incorrect code so have been removed.
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn wrote: The persistent data should be fine, it is simply that the Allocator declared itself available when it shouldn't have so the commit that required the allocation would fail.

        Show
        bryanthompson bryanthompson added a comment - Martyn wrote: The persistent data should be fine, it is simply that the Allocator declared itself available when it shouldn't have so the commit that required the allocation would fail.
        Hide
        bryanthompson bryanthompson added a comment -

        Modified FixedAllocator to invoke checkBits() using conditional logging @ DEBUG rather than assert since we frequently run with asserts enabled and checkBits() might add some overhead. Also removed 2 computed but unused values and made a 2 method arguments final.

        Committed revision r5011.

        Show
        bryanthompson bryanthompson added a comment - Modified FixedAllocator to invoke checkBits() using conditional logging @ DEBUG rather than assert since we frequently run with asserts enabled and checkBits() might add some overhead. Also removed 2 computed but unused values and made a 2 method arguments final. Committed revision r5011.
        Hide
        bryanthompson bryanthompson added a comment -

        Note: The changes mentioned above were committed to the 1.0.0 maintenance branch and the development branch.

        Show
        bryanthompson bryanthompson added a comment - Note: The changes mentioned above were committed to the 1.0.0 maintenance branch and the development branch.

          People

          • Assignee:
            martyncutcher martyncutcher
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: