Details

    • Type: Bug
    • Status: Closed - Won't Fix
    • Priority: Medium
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Wikidata Query Service
    • Labels: None

      Description

      When accessing http://localhost:9999/bigdata/status?dumpPages&dumpJournal, I see only part of the report, which stops at:

      name=__globalRowStore
      	Checkpoint{indexType=BTree,height=1,nnodes=1,nleaves=4,nentries=86,counter=0,addrRoot=-175934745345881,addrMetadata=-70385924045614,addrBloomFilter=0,addrCheckpoint=-35205846925092}
      	addrMetadata=0, name=__globalRowStore, indexType=BTree, indexUUID=15d5639e-de61-4a35-80c3-ba3fbc95bb3d, branchingFactor=32, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@69bb1636{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.btree.DefaultTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.DefaultKeyBuilderFactory{ initialCapacity=0, collator=ICU, locale=en_US, strength=null, decomposition=null}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@79af6974{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.CanonicalHuffmanRabaCoder@6b347d15}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=com.bigdata.sparse.LogicalRowSplitHandler@430024af, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
      	com.bigdata.btree.BTreePageStats{indexType=BTree,m=32,nnodes=1,nleaves=4,nrawRecs=0,nodeBytes=167,minNodeBytes=167,maxNodeBytes=167,leafBytes=5495,minLeafBytes=1064,maxLeafBytes=1959,rawRecBytes=0,bytesPerNode=167,bytesPerLeaf=1373,bytesPerRawRec=0,nerrors=0,slot_64=0.0,slot_128=0.0,slot_192=0.2,slot_320=0.0,slot_512=0.0,slot_768=0.0,slot_1024=0.0,slot_2048=0.8,slot_3072=0.0,slot_4096=0.0,slot_8192=0.0,blobs=0.0,newM=703}
      name=wdq.lex.BLOBS
      	Checkpoint{indexType=BTree,height=1,nnodes=1,nleaves=9,nentries=2901,counter=0,addrRoot=-515749903515778919,addrMetadata=-21474835649,addrBloomFilter=0,addrCheckpoint=-628365303226564388}
      	addrMetadata=0, name=wdq.lex.BLOBS, indexType=BTree, indexUUID=03fbb544-c576-4351-a2b9-b459ce1853aa, branchingFactor=400, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@49677003{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.BlobsTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=8}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@25eafa9b{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.SimpleRabaCoder@65eae98f}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=true, maxRecLen=0, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
      	com.bigdata.btree.BTreePageStats{indexType=BTree,m=400,nnodes=1,nleaves=9,nrawRecs=2901,nodeBytes=153,minNodeBytes=153,maxNodeBytes=153,leafBytes=57333,minLeafBytes=3955,maxLeafBytes=7380,rawRecBytes=933375,bytesPerNode=153,bytesPerLeaf=6370,bytesPerRawRec=321,nerrors=0,slot_64=0.0,slot_128=0.0,slot_192=0.1,slot_320=0.0,slot_512=0.0,slot_768=0.0,slot_1024=0.0,slot_2048=0.0,slot_3072=0.0,slot_4096=0.2,slot_8192=0.7,blobs=0.0,newM=8772}
      name=wdq.lex.ID2TERM
      	Checkpoint{indexType=BTree,height=2,nnodes=611,nleaves=245182,nentries=98073072,counter=0,addrRoot=-631130059574335991,addrMetadata=-17179868349,addrBloomFilter=0,addrCheckpoint=-631892506463698724}
      	addrMetadata=0, name=wdq.lex.ID2TERM, indexType=BTree, indexUUID=5e0276af-7fd7-4ac6-9030-3fb5df75fd46, branchingFactor=800, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@360b5072{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.Id2TermTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=9}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@20ce8816{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.SimpleRabaCoder@57486d2d}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=true, maxRecLen=16, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
      

      In the log, I see this:

      Aug 19 23:55:12 db01 bash[1195]: WARN : RWStore.java:6371: WriteCacheDebug: 53718642432 - No WriteCache debug info
      Aug 19 23:55:12 db01 bash[1195]: WARN : FixedAllocator.java:188: Physical address 53718642432 not accessible for Allocator of size 64
      Aug 19 23:55:12 db01 bash[1195]: ERROR: AbstractBTree.java:1607: Error reading child[i=577]: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -146023871
      Aug 19 23:55:12 db01 bash[1195]: java.lang.IllegalArgumentException: Unable to read data: com.bigdata.rwstore.PhysicalAddressResolutionException: Address did not resolve to physical address: -146023871
      

      Full log attached. This is reproducible across a number of machines, so it doesn't seem to be anything unique to one machine or a transient error.
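
      For anyone reproducing this, a minimal sketch of fetching the same report programmatically (endpoint and query flags exactly as above; the class name is illustrative):

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.URL;

      public class DumpStatusFetch {
          public static void main(String[] args) throws Exception {
              // Endpoint and flags taken from the description above.
              URL url = new URL("http://localhost:9999/bigdata/status?dumpPages&dumpJournal");
              try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                  String line;
                  while ((line = in.readLine()) != null) {
                      System.out.println(line); // with this bug, the output stops mid-index
                  }
              }
          }
      }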

      Attachments:
      1. log (17 kB), uploaded by stasmalyshev

        Activity

        beebs Brad Bebee added a comment -

        thompsonbry I tried this with a clean journal on 1.5.2 and it worked properly. Could it possibly be a side-effect of the recycler issue in BLZG-1236?

        stasmalyshev stasmalyshev added a comment -

        It happens on different machines, including two production ones that were installed recently (which never had namespace deletions and were simply loaded with a fresh dump plus updates). The "Address did not resolve to physical address" exception does not happen every time you try dumpPages, though. But the dump comes out as above, with only three indexes and one of them missing its BTreePageStats, on all of those machines, every time.

        bryanthompson bryanthompson added a comment -

        Stas, is this during concurrent query or concurrent update? The dumpJournal option on the NSS might not be safe for use while reading or writing on the database. I thought the issues with reading on the database while running DumpJournal in the NSS were resolved, but the issue with concurrent writers could easily exist. If so, the workaround is to suspend writers on the journal while running DumpJournal.

        In general, DumpJournal is not a great idea for a live server. It is OK for the lightweight metadata that it reports, but using it with &dumpPages puts a lot of burden on the IO system.
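
        As a sketch of that workaround, one can run Blazegraph's DumpJournal utility directly against a quiesced journal file through its command-line entry point; the -pages flag and the journal path below are illustrative assumptions, so check the utility's usage output for the exact options:

        public class OfflineDumpPages {
            public static void main(String[] args) throws Exception {
                // Assumed flag and hypothetical path; run this only while writers are
                // suspended, or against a copy of the journal, per the comment above.
                com.bigdata.journal.DumpJournal.main(new String[] {
                    "-pages",
                    "/var/lib/blazegraph/wikidata.jnl"
                });
            }
        }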

        bryanthompson bryanthompson added a comment -

        Stas, please comment if this ticket should have remained open, but I think it has been addressed in the sense that (a) the underlying issues for WDS were addressed; and (b) concurrent use of DumpJournal with updates is not supported.

        beebs Brad Bebee added a comment -

        Documented this in the wiki: https://wiki.blazegraph.com/wiki/index.php/IOOptimization

          People

          • Assignee: Unassigned
          • Reporter: stasmalyshev
          • Votes: 0
          • Watchers: 3
