Details

      Description

      See this forum thread: https://sourceforge.net/projects/bigdata/forums/forum/676946/topic/4451221
      There is a problem when RW-connections do a rollback: it leaves the database in a state that leads to exceptions on queries.

      Example stacktrace:

      WARN : 4105      com.bigdata.journal.Journal.executorService1 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(24L)
      WARN : 4105      com.bigdata.journal.Journal.executorService1 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(20U)
      WARN : 4106      com.bigdata.journal.Journal.executorService3 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(24L)
      WARN : 4106      com.bigdata.journal.Journal.executorService3 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(20U)
      WARN : 4113      com.bigdata.journal.Journal.executorService3 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(24L)
      WARN : 4114      com.bigdata.journal.Journal.executorService3 com.bigdata.rdf.lexicon.LexiconRelation$ResolveTermTask.call(LexiconRelation.java:1875): No such term: TermId(20U)
      ERROR: 4117      Thread-3 com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.checkFuture(BlockingBuffer.java:1466): java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
      java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
      	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
      	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
      	at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.checkFuture(BlockingBuffer.java:1425)
      	at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator._hasNext(BlockingBuffer.java:1649)
      	at com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator.hasNext(BlockingBuffer.java:1512)
      	at com.bigdata.striterator.AbstractChunkedResolverator.hasNext(AbstractChunkedResolverator.java:285)
      	at com.bigdata.rdf.sail.Bigdata2SesameIteration.hasNext(Bigdata2SesameIteration.java:68)
      	at com.bigdata.rdf.sail.QueryEvaluationIterator.hasNext(QueryEvaluationIterator.java:33)
      	at info.aduna.iteration.IterationWrapper.hasNext(IterationWrapper.java:57)
      	at info.aduna.iteration.FilterIteration.findNextElement(FilterIteration.java:68)
      	at info.aduna.iteration.FilterIteration.hasNext(FilterIteration.java:43)
      	at info.aduna.iteration.ConvertingIteration.hasNext(ConvertingIteration.java:62)
      	at info.aduna.iteration.ConvertingIteration.hasNext(ConvertingIteration.java:62)
      	at info.aduna.iteration.IterationWrapper.hasNext(IterationWrapper.java:57)
      	at info.aduna.iteration.LimitIteration.hasNext(LimitIteration.java:62)
      	at org.openrdf.query.impl.TupleQueryResultImpl.hasNext(TupleQueryResultImpl.java:90)
      	at com.bigdata.rdf.sail.TestRollbacks$DoStuff.query(TestRollbacks.java:222)
      	at com.bigdata.rdf.sail.TestRollbacks$DoStuff.reader(TestRollbacks.java:178)
      	at com.bigdata.rdf.sail.TestRollbacks$DoStuff.run(TestRollbacks.java:154)
      	at java.lang.Thread.run(Thread.java:619)
      Caused by: java.lang.IllegalArgumentException
      	at com.bigdata.rdf.model.BigdataStatementImpl.<init>(BigdataStatementImpl.java:81)
      	at com.bigdata.rdf.model.BigdataValueFactoryImpl.createStatement(BigdataValueFactoryImpl.java:339)
      	at com.bigdata.rdf.model.BigdataValueFactoryImpl.createStatement(BigdataValueFactoryImpl.java:1)
      	at com.bigdata.rdf.store.BigdataStatementIteratorImpl.resolveChunk(BigdataStatementIteratorImpl.java:218)
      	at com.bigdata.rdf.store.BigdataStatementIteratorImpl.resolveChunk(BigdataStatementIteratorImpl.java:1)
      	at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:218)
      	at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:1)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
      	... 1 more
      

      See the attached test case, which reproduces the issue by starting three threads: one that just writes data to a fresh database, and two others that query that data, while doing periodic rollbacks.
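The attached test case is not reproduced on this page. As a rough sketch of the thread layout described above, the following uses a hypothetical `Conn` interface and a trivial in-memory stand-in (neither is a bigdata or Sesame API). It also assumes that the reader threads are the ones issuing the periodic rollbacks, which the description does not state explicitly.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RollbackHarness {
    /** Hypothetical stand-in for a store connection; not a bigdata or Sesame interface. */
    interface Conn {
        void add(String stmt);
        int count();
        void rollback();
    }

    /** Trivial in-memory stand-in so the sketch runs on its own. */
    static class InMemoryConn implements Conn {
        private final ConcurrentLinkedQueue<String> data;
        InMemoryConn(ConcurrentLinkedQueue<String> shared) { data = shared; }
        public void add(String s) { data.add(s); }
        public int count() { return data.size(); }
        public void rollback() { /* nothing to undo in this stand-in */ }
    }

    /** Runs one writer and two readers; returns the number of failed queries. */
    static int run(int iterations) {
        ConcurrentLinkedQueue<String> shared = new ConcurrentLinkedQueue<>();
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(3);

        // Writer thread: only adds data.
        pool.submit(() -> {
            Conn c = new InMemoryConn(shared);
            for (int i = 0; i < iterations; i++) {
                c.add("stmt-" + i);
            }
        });

        // Two reader threads: query, and roll back every tenth iteration (assumed cadence).
        for (int r = 0; r < 2; r++) {
            pool.submit(() -> {
                Conn c = new InMemoryConn(shared);
                for (int i = 0; i < iterations; i++) {
                    try {
                        c.count();
                        if (i % 10 == 0) {
                            c.rollback();
                        }
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures.get();
    }
}
```

With the in-memory stand-in no query ever fails; the point is only the shape of the harness. Swapping in real read/write connections is what would exercise the rollback path reported here.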

      Enabling the commented-out code that properly closes the connections results in a different set of exceptions and fails to shut down the database.

        Activity

        bryanthompson added a comment:

        This test appears to run correctly in both the trunk and the quads branch when using the unisolated connection, and it also runs correctly in the quads branch in quads mode with full transactions. However, it still fails in the trunk with full transactions in quads mode, and it fails in the quads branch in triples or sids mode with full transactions.

        We currently lack an explanation for the test passing in the quads branch in quads mode with full transactions while failing in the sids and triples modes in the quads branch with full transactions. If we could observe a failure of the test in the quads branch in quads mode with full transactions, that would narrow things down to the full transaction support and/or a concurrency issue not observable when using the unisolated connection because all writers are now serialized.

        Perhaps the tx read-through to the backing unisolated index and the tx commit protocol are violating the single-threaded-for-mutation constraint?

        The other interpretation would be a referential integrity issue where the term is present in the committed state of the forward lexicon but not found in the reverse lexicon by a transaction reading from an intermediate commit point. This might (conjecture) arise because a transaction commit forces TERM2ID to disk before ID2TERM is dirty and another transaction reads from the commit point in which TERM2ID and ID2TERM are not fully consistent.

        bryanthompson added a comment:

        Of interest, the failure always appears in the code path where the query is being evaluated and then only if the query uses a statement index scan rather than a join. These are two quite different code paths.

        Also, the trunk is apparently missing logic in LexiconRelation#getId2TermIndex() and getTerm2IdIndex() to force the use of the unisolated indices when in a full read/write transaction.

        bryanthompson added a comment:

        Ok. I found the root cause for TestRollbacks. Unlike [1,2], the root cause here was invoking the database level (Journal) abort() rather than just the tx abort() from the BigdataSailConnection subclass providing support for transaction isolation. This was causing the write sets on the unisolated lexicon indices to be discarded when only the isolated statement indices should have had their write sets discarded.
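To make the failure mode concrete, here is a minimal toy model of the behaviour described above. It is not bigdata code: `ToyJournal`, `ToyTx`, and `RollbackDemo` are hypothetical names, and the model assumes only what the comment states, namely that lexicon writes go to shared unisolated indices while statement writes are buffered per transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; these are not bigdata classes. */
class ToyJournal {
    final Map<Long, String> lexicon = new HashMap<>();       // committed terms
    final Map<Long, String> lexiconWrites = new HashMap<>(); // uncommitted, unisolated, shared
    final Set<Long> statements = new HashSet<>();            // committed statements (by term id)

    void addTerm(long id, String term) { lexiconWrites.put(id, term); }

    void commit() {
        lexicon.putAll(lexiconWrites);
        lexiconWrites.clear();
    }

    /** Database-level abort: discards every uncommitted unisolated write. */
    void abort() { lexiconWrites.clear(); }

    /** Returns null when the id cannot be resolved ("No such term"). */
    String resolve(long id) { return lexicon.get(id); }
}

class ToyTx {
    final ToyJournal journal;
    final Set<Long> stmtWrites = new HashSet<>(); // isolated write set

    ToyTx(ToyJournal journal) { this.journal = journal; }

    void addStatement(long termId, String term) {
        journal.addTerm(termId, term); // lexicon write is not isolated
        stmtWrites.add(termId);        // statement write is isolated
    }

    void commit() {
        journal.statements.addAll(stmtWrites);
        stmtWrites.clear();
        journal.commit();
    }

    /** Transaction-level abort: discards only this transaction's write set. */
    void abort() { stmtWrites.clear(); }
}

class RollbackDemo {
    /** Returns true if every committed statement still resolves in the lexicon. */
    static boolean run(boolean journalLevelAbort) {
        ToyJournal journal = new ToyJournal();
        ToyTx writer = new ToyTx(journal);
        ToyTx other = new ToyTx(journal);

        writer.addStatement(24L, "term-24");

        // A different connection rolls back while the writer is still open.
        other.abort();
        if (journalLevelAbort) {
            journal.abort(); // the problematic path: also wipes the writer's lexicon entries
        }

        writer.commit();
        for (long id : journal.statements) {
            if (journal.resolve(id) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("journal-level abort consistent: " + run(true));
        System.out.println("tx-level abort consistent: " + run(false));
    }
}
```

In this model the journal-level abort leaves a committed statement whose term id has no lexicon entry, which is the same kind of inconsistency as the "No such term" warnings in the stack trace. Restricting the rollback to the transaction's own write set avoids it.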

        While this provides a clear reason why TestRollbacks was failing when using full transactions in the triples only and sids modes, it does not explain why it was not failing in the quads mode. That remains a mystery.

        I have also uncovered some possible issues within the LexiconRelation's addTerms() / getTerms() methods that I want to talk through with MikeP in a code review.

        I will modify TestRollbacks to run as a proxy test for each of the triple store modes (triples, sids, quads) and both with and without full tx isolation and incorporate those variants into the test suite. I will also propagate the necessary test suite changes and the change to the BigdataSailConnection to the trunk and verify correct operation there as well.
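The planned coverage amounts to a 3 x 2 matrix of configurations. The sketch below only enumerates those variants; the `Mode` enum and the variant labels are hypothetical and do not correspond to bigdata's actual proxy test classes or configuration properties.

```java
import java.util.ArrayList;
import java.util.List;

class TestMatrix {
    /** Hypothetical labels for the three triple store modes named above. */
    enum Mode { TRIPLES, SIDS, QUADS }

    /** Lists every (mode, isolation) combination the test should run under. */
    static List<String> variants() {
        List<String> out = new ArrayList<>();
        for (Mode mode : Mode.values()) {
            for (boolean fullTx : new boolean[] { false, true }) {
                out.add(mode + (fullTx ? "+tx" : "+unisolated"));
            }
        }
        return out;
    }
}
```

Running the same test body once per entry would cover the combinations in which the behaviour was observed to differ.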

        [1] http://sourceforge.net/apps/trac/bigdata/ticket/288
        [2] http://sourceforge.net/apps/trac/bigdata/ticket/284

        bryanthompson added a comment:

        I've added an issue for a code review on the appropriate methods of the LexiconRelation [1].

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/292

        bryanthompson added a comment:

        This issue should be resolved in the trunk (r4467) and the quads branch (r4466). In addition to the problems documented by [1,2], there was a problem with full tx aborts causing journal level aborts as documented above.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/284
        [2] https://sourceforge.net/apps/trac/bigdata/ticket/288


          People

          • Assignee: bryanthompson
          • Reporter: gjdev
          • Votes: 0
          • Watchers: 4
