Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-746

Rare AssertionError in WriteCache.clearAddrMap()

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: BIGDATA_RELEASE_1_2_2
    • Fix Version/s: None
    • Component/s: Journal

      Description

      A rare assertion error can appear in WriteCache.clearAddrMap(). This error has been observed during large incremental truth maintenance operations when the entire contents of the temporary triple store are flushed through to the database. The apparent root cause is the concurrent WriteCache.reset() of a WriteCache with the WriteCacheService.clearWrite() call. This race condition was not protected in the code because the WriteCache's internal read lock is not held before WriteCache.clearAddrMap() is invoked.

      Caused by: java.lang.AssertionError
      at com.bigdata.io.writecache.WriteCache.clearAddrMap(WriteCache.java:2001)
      at com.bigdata.io.writecache.WriteCacheService.clearWrite(WriteCacheService.java:2738)
      at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2034)
      at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1996)
      at com.bigdata.rwstore.RWStore.free(RWStore.java:1842)
      at com.bigdata.rwstore.RWStore.free(RWStore.java:1752)
      at com.bigdata.rwstore.RWStore.freeBlob(RWStore.java:1945)
      at com.bigdata.rwstore.RWStore.free(RWStore.java:1787)
      at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:285)
      at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:263)
      at com.bigdata.journal.AbstractJournal.delete(AbstractJournal.java:3315)
      

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        Fix committed. Running now in CI.

        Committed revision r6995 (both dev and READ_CACHE branches)

        Show
        bryanthompson bryanthompson added a comment - Fix committed. Running now in CI. Committed revision r6995 (both dev and READ_CACHE branches)
        Hide
        bryanthompson bryanthompson added a comment -

        Look clear in CI. Waiting on clean load of the customer data set.

        Show
        bryanthompson bryanthompson added a comment - Look clear in CI. Waiting on clean load of the customer data set.
        Hide
        bryanthompson bryanthompson added a comment -

        Hit the same issue at a different assertion.

        Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338
        	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        	at com.bigdata.rdf.store.AbstractTripleStore.copyStatements(AbstractTripleStore.java:3664)
        	... 6 more
        Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338
        	at com.bigdata.rdf.spo.SPORelation.insert(SPORelation.java:2204)
        	at com.bigdata.rdf.store.AbstractTripleStore.addStatements(AbstractTripleStore.java:3906)
        	at com.bigdata.rdf.spo.StatementWriter.call(StatementWriter.java:115)
        	at com.bigdata.rdf.spo.StatementWriter.call(StatementWriter.java:26)
        	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:722)
        Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338
        	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        	at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        	at com.bigdata.rdf.spo.SPORelation.logFuture(SPORelation.java:2230)
        	at com.bigdata.rdf.spo.SPORelation.insert(SPORelation.java:2185)
        	... 8 more
        Caused by: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338
        	at com.bigdata.io.writecache.WriteCacheService.clearWrite(WriteCacheService.java:2743)
        	at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2034)
        	at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1996)
        	at com.bigdata.rwstore.RWStore.free(RWStore.java:1842)
        	at com.bigdata.rwstore.RWStore.free(RWStore.java:1752)
        	at com.bigdata.rwstore.RWStore.freeBlob(RWStore.java:1945)
        	at com.bigdata.rwstore.RWStore.free(RWStore.java:1787)
        	at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:285)
        	at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:263)
        	at com.bigdata.journal.AbstractJournal.delete(AbstractJournal.java:3315)
        	at com.bigdata.btree.AbstractBTree.recycle(AbstractBTree.java:4264)
        	at com.bigdata.btree.AbstractBTree.deleteNodeOrLeaf(AbstractBTree.java:4241)
        	at com.bigdata.btree.Node.replaceChildRef(Node.java:880)
        	at com.bigdata.btree.AbstractNode.copyOnWrite(AbstractNode.java:548)
        	at com.bigdata.btree.AbstractNode.copyOnWrite(AbstractNode.java:419)
        	at com.bigdata.btree.Leaf.insert(Leaf.java:561)
        	at com.bigdata.btree.Node.insert(Node.java:923)
        	at com.bigdata.btree.Node.insert(Node.java:923)
        	at com.bigdata.btree.AbstractBTree.insert(AbstractBTree.java:2043)
        	at com.bigdata.btree.AbstractBTree.insert(AbstractBTree.java:1987)
        	at com.bigdata.rdf.spo.SPOIndexWriteProc.apply(SPOIndexWriteProc.java:244)
        	at com.bigdata.btree.UnisolatedReadWriteIndex.submit(UnisolatedReadWriteIndex.java:825)
        	at com.bigdata.rdf.spo.SPOIndexWriter.call(SPOIndexWriter.java:363)
        	at com.bigdata.rdf.spo.SPOIndexWriter.call(SPOIndexWriter.java:68)
        	... 5 more
        

        Per the documentation in the WriteCacheService for this assertion, WriteTask.resetWith() does not take the read lock and hence a concurrent clear of the cache reference is possible. Therefore, I am taking out the acquire()/release() around this code block that was put in above and modifying it as follows around the recordMap.remove() call:

                            // Remove entry from the recordMap.
                            final WriteCache oldValue = recordMap.remove(offset);
                            if (oldValue == null) {
                                /**
                                 * Note: The [WriteCache.transferLock] protects the
                                 * WriteCache against a concurrent transfer of a record
                                 * in WriteCache.transferTo(). However,
                                 * WriteCache.resetWith() does NOT take the
                                 * transferLock. Therefore, it is possible (and valid)
                                 * for the [recordMap] entry to be cleared to [null] for
                                 * this record by a concurrent resetWith() call.
                                 * 
                                 * @see <a href=
                                 *      "https://sourceforge.net/apps/trac/bigdata/ticket/654"
                                 *      Rare AssertionError in WriteCache.clearAddrMap()
                                 *      </a>
                                 */
                                continue;
                            }
        

        Committed revision 7001. (dev branch)
        Committed revision 7002. (READ_CACHE branch)

        Re-running the customer data load.

        Show
        bryanthompson bryanthompson added a comment - Hit the same issue at a different assertion. Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at com.bigdata.rdf.store.AbstractTripleStore.copyStatements(AbstractTripleStore.java:3664) ... 6 more Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338 at com.bigdata.rdf.spo.SPORelation.insert(SPORelation.java:2204) at com.bigdata.rdf.store.AbstractTripleStore.addStatements(AbstractTripleStore.java:3906) at com.bigdata.rdf.spo.StatementWriter.call(StatementWriter.java:115) at com.bigdata.rdf.spo.StatementWriter.call(StatementWriter.java:26) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at com.bigdata.rdf.spo.SPORelation.logFuture(SPORelation.java:2230) at com.bigdata.rdf.spo.SPORelation.insert(SPORelation.java:2185) ... 8 more Caused by: java.lang.AssertionError: oldValue=null, cache=com.bigdata.rwstore.RWStore$WriteCacheImpl@775fec6f{recordCount=250,firstOffset=12005892096,releaseBuffer=true,prefixWrites=true,useChecksum=true,bytesWritten=1044331,bytesRemaining=4245,bytesRemoved=0}, offset=11978642432, latchedAddr=-11377338 at com.bigdata.io.writecache.WriteCacheService.clearWrite(WriteCacheService.java:2743) at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:2034) at com.bigdata.rwstore.RWStore.immediateFree(RWStore.java:1996) at com.bigdata.rwstore.RWStore.free(RWStore.java:1842) at com.bigdata.rwstore.RWStore.free(RWStore.java:1752) at com.bigdata.rwstore.RWStore.freeBlob(RWStore.java:1945) at com.bigdata.rwstore.RWStore.free(RWStore.java:1787) at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:285) at com.bigdata.journal.RWStrategy.delete(RWStrategy.java:263) at com.bigdata.journal.AbstractJournal.delete(AbstractJournal.java:3315) at com.bigdata.btree.AbstractBTree.recycle(AbstractBTree.java:4264) at com.bigdata.btree.AbstractBTree.deleteNodeOrLeaf(AbstractBTree.java:4241) at com.bigdata.btree.Node.replaceChildRef(Node.java:880) at com.bigdata.btree.AbstractNode.copyOnWrite(AbstractNode.java:548) at com.bigdata.btree.AbstractNode.copyOnWrite(AbstractNode.java:419) at com.bigdata.btree.Leaf.insert(Leaf.java:561) at com.bigdata.btree.Node.insert(Node.java:923) at com.bigdata.btree.Node.insert(Node.java:923) at com.bigdata.btree.AbstractBTree.insert(AbstractBTree.java:2043) at com.bigdata.btree.AbstractBTree.insert(AbstractBTree.java:1987) at com.bigdata.rdf.spo.SPOIndexWriteProc.apply(SPOIndexWriteProc.java:244) at com.bigdata.btree.UnisolatedReadWriteIndex.submit(UnisolatedReadWriteIndex.java:825) at com.bigdata.rdf.spo.SPOIndexWriter.call(SPOIndexWriter.java:363) at com.bigdata.rdf.spo.SPOIndexWriter.call(SPOIndexWriter.java:68) ... 5 more Per the documentation in the WriteCacheService for this assertion, WriteTask.resetWith() does not take the read lock and hence a concurrent clear of the cache reference is possible. Therefore, I am taking out the acquire()/release() around this code block that was put in above and modifying it as follows around the recordMap.remove() call: // Remove entry from the recordMap. final WriteCache oldValue = recordMap.remove(offset); if (oldValue == null) { /** * Note: The [WriteCache.transferLock] protects the * WriteCache against a concurrent transfer of a record * in WriteCache.transferTo(). However, * WriteCache.resetWith() does NOT take the * transferLock. Therefore, it is possible (and valid) * for the [recordMap] entry to be cleared to [null] for * this record by a concurrent resetWith() call. * * @see <a href= * "https://sourceforge.net/apps/trac/bigdata/ticket/654" * Rare AssertionError in WriteCache.clearAddrMap() * </a> */ continue; } Committed revision 7001. (dev branch) Committed revision 7002. (READ_CACHE branch) Re-running the customer data load.
        Hide
        bryanthompson bryanthompson added a comment -

        Incremental truth maintenance load of large data set has now completed once without errors. The temporary store was 33G before it was flushed to the database. The final database was 15G. I am re-running the load now for further validation that this stochastic problem has been resolved.

        Show
        bryanthompson bryanthompson added a comment - Incremental truth maintenance load of large data set has now completed once without errors. The temporary store was 33G before it was flushed to the database. The final database was 15G. I am re-running the load now for further validation that this stochastic problem has been resolved.
        Hide
        bryanthompson bryanthompson added a comment -

        I have done a 2nd run without hitting any problems and have verified the file using DumpJournal -pages. I am closing this ticket.

        Show
        bryanthompson bryanthompson added a comment - I have done a 2nd run without hitting any problems and have verified the file using DumpJournal -pages. I am closing this ticket.

          People

          • Assignee:
            martyncutcher martyncutcher
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: