Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-442

int32 entryCount and childEntryCount limit data scale to 2B triples

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: QUADS_QUERY_BRANCH
    • Fix Version/s: None
    • Component/s: B+Tree

      Description

      The BTree class uses int32 counters to track the #of tuples spanned by a Node and to track the total #of tuples in an index. Scale-out uses int64 longs and aggregates those int32 counters across multiple shards. The WORM architecture was never capable of indexing 4B triples due to issues with file growth and the path to very large data sets was to use scale-out (key-range partitioned shards).

      However, the RWStore now let's us scale to well above 4B triples in a single file. This is causing the int32 entryCount and childEntryCount values in the BTree and Node classes to wrap and resulting in negative counter values. While data load works fine, query relies on the fast range count estimates and those are wrong (negative).

      Modify the BTree, Node, Checkpoint and related persistent data records and APIs to use long integers for the entry counts. This will of course cause a binary compatibility issue, but I will see what I can do about supporting read of the old versions and write of the new versions.

      A sample of the DumpJournal output demonstrating the problem follows:

      magic=e6b4c275
      version=1
      rootBlock{ rootBlock=0, challisField=16, version=2, nextOffset=14634813369063026, localTime=1304827787208 [Saturday, May 7, 2011 10:09:47 PM MDT], firstCommitTime=1304747355650 [Friday, May 6, 2011 11:49:15 PM MDT], lastCommitTime=1304827776897 [Saturday, May 7, 2011 10:09:36 PM MDT], commitCounter=16, commitRecordAddr={off=-118583594,len=422}, commitRecordIndexAddr={off=-8340,len=120}, quorumToken=0, metaBitsAddr=8720982365700897, metaStartAddr=3464302, storeType=RW, uuid=45e6f491-f85a-4397-b108-3348c5fbccd5, offsetBits=42, checksum=-1070979786, createTime=1304747354484 [Friday, May 6, 2011 11:49:14 PM MDT], closeTime=0} rootBlock{ rootBlock=1, challisField=17, version=2, nextOffset=31944755824370850, localTime=1304922398549 [Monday, May 9, 2011 12:26:38 AM MDT], firstCommitTime=1304747355650 [Friday, May 6, 2011 11:49:15 PM MDT], lastCommitTime=1304922384215 [Monday, May 9, 2011 12:26:24 AM MDT], commitCounter=17, commitRecordAddr={off=-257484304,len=422}, commitRecordIndexAddr={off=-8350,len=120}, quorumToken=0, metaBitsAddr=31941246041720269, metaStartAddr=7439757, storeType=RW, uuid=45e6f491-f85a-4397-b108-3348c5fbccd5, offsetBits=42, checksum=-1751637349, createTime=1304747354484 [Friday, May 6, 2011 11:49:14 PM MDT], closeTime=0} The current root block is BLZG-208 extent=487571914752(464984M), userExtent=487571914064(464984M), bytesAvailable=-31944268252456786(-30464428188M), nextOffset=31944755824370850 CommitRecord{timestamp=1304922384215, commitCounter=17, roots=[-35858681954184, -1105886660618354348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]} name=__globalRowStore, addr={off=-8209,len=120}
              Checkpoint{indexType=BTree,height=1,nnodes=1,nleaves=5,nentries=86,counter=0,addrRoot=-211119117434657,addrMetadata=-70385924045662,addrBloomFilter=0,addrCheckpoint=-35257386532744}
              addrMetadata=0, name=__globalRowStore, indexUUID=93653f6e-482e-402f-979e-92a5fe6c0109, branchingFactor=32, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.btree.DefaultTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=com.bigdata.sparse.LogicalRowSplitHandler@4e7958, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0} name=kabob.lex.ID2TERM, addr={off=-8341,len=120}
              Checkpoint{indexType=BTree,height=4,nnodes=291645,nleaves=18665502,nentries=1194592194,counter=0,addrRoot=-1104019917507656467,addrMetadata=-17179868350,addrBloomFilter=0,addrCheckpoint=-35824322215816}
              addrMetadata=0, name=kabob.lex.ID2TERM, indexUUID=9a81205e-fbe4-4a55-b49c-f1c36322fa94, branchingFactor=128, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.Id2TermTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0} name=kabob.lex.TERM2ID, addr={off=-8342,len=120}
              Checkpoint{indexType=BTree,height=4,nnodes=256777,nleaves=17529135,nentries=1194592194,counter=1194592194,addrRoot=-509212241736758862,addrMetadata=-70390219012951,addrBloomFilter=0,addrCheckpoint=-35828617183112}
              addrMetadata=0, name=kabob.lex.TERM2ID, indexUUID=3740d602-ed7b-4a6b-9231-1b7265738717, branchingFactor=128, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.Term2IdTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0} name=kabob.spo.OSP, addr={off=-8344,len=120}
              Checkpoint{indexType=BTree,height=5,nnodes=921069,nleaves=60053959,nentries=-389400251,counter=0,addrRoot=-35832912150459,addrMetadata=-140754668223753,addrBloomFilter=0,addrCheckpoint=-35837207117704}
              addrMetadata=0, name=kabob.spo.OSP, indexUUID=b37c2904-cccd-4b55-88e2-1b3de9d966de, branchingFactor=128, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0} name=kabob.spo.POS, addr={off=-8346,len=120}
              Checkpoint{indexType=BTree,height=5,nnodes=920938,nleaves=60054162,nentries=-389400251,counter=0,addrRoot=-35841502085042,addrMetadata=-140758963191049,addrBloomFilter=0,addrCheckpoint=-35845797052296}
              addrMetadata=0, name=kabob.spo.POS, indexUUID=025a0351-d8f3-4bb3-a3b6-94fd9fbda521, branchingFactor=128, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0} name=kabob.spo.SPO, addr={off=-8348,len=120}
              Checkpoint{indexType=BTree,height=5,nnodes=953130,nleaves=61000363,nentries=-389400251,counter=0,addrRoot=-35850092019618,addrMetadata=-140750373256457,addrBloomFilter=0,addrCheckpoint=-35854386986888}
              addrMetadata=0, name=kabob.spo.SPO, indexUUID=b1d00a5d-c9f9-417c-95ac-4c625b2357fe, branchingFactor=128, pmd=null, class=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
      

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        CI is good on the branch (no deviation from the quads query branch).

        Show
        bryanthompson bryanthompson added a comment - CI is good on the branch (no deviation from the quads query branch).
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn and I looked over the record compression issue today. It's a bit more complicated than we had hoped since there will be interactions with the WriteCacheService (since it is checksum aware). Therefore, we are not going to tackle either record compression or the use of native RWStore addresses in the B+Tree in this issue. However, modifying the B+Tree data records was relatively painless so we should be able to do this optimization (32-bit addresses) in the future in a backward compatible manner.

        Show
        bryanthompson bryanthompson added a comment - Martyn and I looked over the record compression issue today. It's a bit more complicated than we had hoped since there will be interactions with the WriteCacheService (since it is checksum aware). Therefore, we are not going to tackle either record compression or the use of native RWStore addresses in the B+Tree in this issue. However, modifying the B+Tree data records was relatively painless so we should be able to do this optimization (32-bit addresses) in the future in a backward compatible manner.
        Hide
        bryanthompson bryanthompson added a comment -

        Modified updateEntryCounts() to accept a long delta argument rather than an int. This could have caused a problem when removing tuples from a large index forces the merge and join of b+tree nodes.

        Added more tests at runtime to fail fast if the entry counts go negative.

        Show
        bryanthompson bryanthompson added a comment - Modified updateEntryCounts() to accept a long delta argument rather than an int. This could have caused a problem when removing tuples from a large index forces the merge and join of b+tree nodes. Added more tests at runtime to fail fast if the entry counts go negative.
        Hide
        bryanthompson bryanthompson added a comment -

        We have confirmation that the range counts are correctly maintained beyond 2^31-1. I am going to merge the INT64_BRANCH back to the QUADS_QUERY_BRANCH and then close out this issue.

        Show
        bryanthompson bryanthompson added a comment - We have confirmation that the range counts are correctly maintained beyond 2^31-1. I am going to merge the INT64_BRANCH back to the QUADS_QUERY_BRANCH and then close out this issue.
        Hide
        bryanthompson bryanthompson added a comment -

        Merged INT64_BRANCH to QUADS_QUERY_BRANCH [r4485:r4522]. Conflicts: 1 (htree.xls). Added: 2 Updated: 85.

        The htree.xls conflict was resolved by accepting the incoming version. I am working on that file and I will handle the conflict when I merge this update back into my working copy.

        The branch is closed.

        Committed revision r4523.

        Show
        bryanthompson bryanthompson added a comment - Merged INT64_BRANCH to QUADS_QUERY_BRANCH [r4485:r4522] . Conflicts: 1 (htree.xls). Added: 2 Updated: 85. The htree.xls conflict was resolved by accepting the incoming version. I am working on that file and I will handle the conflict when I merge this update back into my working copy. The branch is closed. Committed revision r4523.

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: