Details

    • Type: Sub-task
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_RELEASE_1_5_3
    • Component/s: None
    • Labels: None

      Description

      Martyn, please investigate the physical address exceptions in the log attached to BLZG-1418. My main question is whether this reflects a durable problem or a data race with a concurrent abort.

      For example:

      Aug 10 22:15:27 wdqs1002 bash[24866]: Caused by: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -258530248
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:2182)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:1990)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:2034)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:1942)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.journal.RWStrategy.readFromLocalStore(RWStrategy.java:733)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.journal.RWStrategy.read(RWStrategy.java:156)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.journal.AbstractJournal.read(AbstractJournal.java:4297)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.AbstractBTree.readNodeOrLeaf(AbstractBTree.java:4023)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.Node._getChild(Node.java:2727)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.AbstractBTree$1.compute(AbstractBTree.java:363)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.AbstractBTree$1.compute(AbstractBTree.java:346)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.util.concurrent.Memoizer$1.call(Memoizer.java:77)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.util.concurrent.Memoizer.compute(Memoizer.java:92)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.AbstractBTree.loadChild(AbstractBTree.java:532)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.Node.getChild(Node.java:2628)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.Node.indexOf(Node.java:971)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.Node.indexOf(Node.java:988)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.btree.AbstractBTree.rangeCount(AbstractBTree.java:2636)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.relation.accesspath.AccessPath.historicalRangeCount(AccessPath.java:1418)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.relation.accesspath.AccessPath.rangeCount(AccessPath.java:1386)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.bop.join.PipelineJoin$JoinTask$AccessPathTask.call(PipelineJoin.java:1619)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.bop.join.PipelineJoin$JoinTask$BindingSetConsumerTask.executeTasks(PipelineJoin.java:1353)
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.bop.join.PipelineJoin$JoinTask$BindingSetConsumerTask.call(PipelineJoin.java:977)
      Aug 10 22:15:27 wdqs1002 bash[24866]: ... 14 more
      Aug 10 22:15:27 wdqs1002 bash[24866]: Caused by: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -258530248
      Aug 10 22:15:27 wdqs1002 bash[24866]: at com.bigdata.rwstore.RWStore.getData(RWStore.java:2092)
      Aug 10 22:15:27 wdqs1002 bash[24866]: ... 37 more
      

      This was observed against release 1.5.2.

          Activity

          martyncutcher added a comment:

          The long running tests in StressTestConcurrentRestApiRequests can show similar failures when enabled. These tests run multiple queries, interrupting some.

          I have not seen any evidence of durable corruption in these tests. The errors surface from the memoizer and manifest as either an address exception or a ChecksumError. Both would be triggered by "out-of-sync" data requests, that is, reads against an address that has since been freed or re-allocated.

          The AllocationContext refactor is intended to protect against such problems.
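
For illustration, the two failure modes described above can be reproduced in a toy model. This is only a sketch under stated assumptions and is not RWStore code: `ToyStore`, `Ref`, and `ReadResult` are hypothetical names, and having the reader's reference carry the expected checksum is a simplification chosen for the demo. It shows a reader holding a stale reference first failing to resolve a freed address, then hitting a checksum mismatch once the slot is re-allocated, while the current data stays intact throughout.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Toy model only. All names are hypothetical; this is not Blazegraph's RWStore.
class StaleAddressDemo {

    /** Outcome of a read through a possibly stale reference. */
    enum ReadResult { OK, ADDRESS_NOT_RESOLVED, CHECKSUM_MISMATCH }

    /** What a reader caches: the slot address and the checksum it expects there. */
    static final class Ref {
        final int addr;
        final long checksum;
        Ref(int addr, long checksum) { this.addr = addr; this.checksum = checksum; }
    }

    /** Minimal slot allocator that reuses freed addresses before minting new ones. */
    static final class ToyStore {
        private final Map<Integer, byte[]> slots = new HashMap<>();
        private final Deque<Integer> freed = new ArrayDeque<>();
        private int nextAddr = 1;

        Ref alloc(byte[] data) {
            int addr = freed.isEmpty() ? nextAddr++ : freed.pop();
            slots.put(addr, data.clone());
            return new Ref(addr, crc(data));
        }

        void free(Ref ref) {
            if (slots.remove(ref.addr) != null) {
                freed.push(ref.addr);
            }
        }

        ReadResult read(Ref ref) {
            byte[] data = slots.get(ref.addr);
            if (data == null) {
                return ReadResult.ADDRESS_NOT_RESOLVED; // address was freed
            }
            // Address resolves, but it may now hold someone else's data.
            return crc(data) == ref.checksum ? ReadResult.OK : ReadResult.CHECKSUM_MISMATCH;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        Ref stale = store.alloc("node-v1".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.read(stale)); // reference is still current

        store.free(stale);                     // e.g. a concurrent abort releases the slot
        System.out.println(store.read(stale)); // stale read of a freed address

        Ref fresh = store.alloc("node-v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.read(stale)); // stale read of a re-allocated address
        System.out.println(store.read(fresh)); // the current reference still reads cleanly
    }
}
```

In this model neither error implies that the stored bytes are damaged. Both come from the reader being out of sync with the allocator, which is consistent with the observation that the failures do not indicate durable corruption.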

          bryanthompson added a comment:

          Closing as a known issue that does not impact durable data. Work on this is in progress in BLZG-1236.


            People

            • Assignee: martyncutcher
            • Reporter: bryanthompson
            • Votes: 0
            • Watchers: 2
