Details

    • Type: Sub-task
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: BLAZEGRAPH_RELEASE_1_5_1
    • Fix Version/s: BLAZEGRAPH_2_0_0
    • Component/s: RWStore
    • Labels:
      None

      Description

      The purpose of this ticket is to improve our understanding of the small slot optimization and any interaction with group commit, and to make the small slot optimization policy less susceptible to misconfiguration.

      Martyn is going to:


      - Run a 2x2 experimental design crossing the small slot optimization and group commit options.
      - Extract the dumpJournal allocators and page histogram for each.
      - Validate the reported allocations in use against the allocated bits in the allocators.
      - Verify that we have a good explanation of the allocation statistics that we are observing.


      - Consider a separate allocation policy for blob headers (basically going around the small slot optimization for blob headers). However, the maximum waste policy might be sufficient without requiring this tighter coupling.


      - Consider a maximum waste policy. The small slot policy does not return an allocator to the free list unless it is sufficiently sparse. If no allocator for a given slot size is on the free list, then a new one is allocated. The modified policy would instead check the amount of "waste" (unused storage) for allocators of that slot size before allocating a new one. If this waste exceeds a threshold (a percentage of all space allocated for that size of allocator), then it would scan the allocators for that slot size that are not on the free list and put the one with the most free space onto the free list. Further, if there is enough waste across the small slot allocators, then we could just fill in the next free slot rather than looking for a set of slots with good locality (e.g., a page, 1/2 page, etc.). This balances waste (store size on the disk) against locality.
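      The maximum waste policy described above could be sketched roughly as follows. This is only an illustration of the idea as stated in this ticket, not the RWStore implementation: the `Allocator` class, the `waste_fraction` and `pick_allocator` functions, and the `max_waste` threshold of 0.2 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Allocator:
    """Hypothetical stand-in for one fixed-slot allocator (not the RWStore class)."""
    slot_size: int
    reserved_slots: int
    in_use_slots: int
    on_free_list: bool = False

    @property
    def free_slots(self) -> int:
        return self.reserved_slots - self.in_use_slots


def waste_fraction(allocators):
    """Unused slots as a fraction of all slots reserved for this slot size."""
    reserved = sum(a.reserved_slots for a in allocators)
    if reserved == 0:
        return 0.0
    return sum(a.free_slots for a in allocators) / reserved


def pick_allocator(allocators, max_waste=0.2):
    """Return an allocator to allocate from, or None if a new one is needed.

    `allocators` holds every allocator of a single slot size.
    `max_waste` is an assumed threshold, not a value from the ticket.
    """
    # Existing behaviour: use an allocator that is already on the free list.
    for a in allocators:
        if a.on_free_list and a.free_slots > 0:
            return a
    # Proposed behaviour: if too much space is wasted, put the allocator
    # with the most free space back onto the free list instead of growing.
    if waste_fraction(allocators) > max_waste:
        candidates = [a for a in allocators if a.free_slots > 0]
        if candidates:
            best = max(candidates, key=lambda a: a.free_slots)
            best.on_free_list = True
            return best
    return None  # caller would allocate a new allocator
```

      Because all allocators passed in share one slot size, counting slots is equivalent to counting bytes here. The locality trade-off mentioned above (filling the next free slot versus finding a well-clustered run of slots) is deliberately left out of this sketch.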

      See https://docs.google.com/a/systap.com/spreadsheets/d/1AANi3aCQIOcx2nMoerecnKOgl7gtZDNLJ7fBEQfCZ2o/edit?usp=sharing for the data from our discussion of this ticket.

      The original ticket description is below.
      ----

      Andreas wrote:

      I recently updated to the current revision (f4c63e5) of Blazegraph from Git and tried to load a dataset into the updated webapp. With Bigdata 1.4.0 this resulted in a journal of ~18GB. Now the process was cancelled because the disk was full: the journal had grown beyond 50GB for the same file with the same settings. The only difference was that I activated GroupCommit.
      
      The dataset can be downloaded here: 
      
      http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz . 
      
      Please find the settings used to load the file below.
      
      Do I have a misconfiguration, or is there a bug eating all disk memory? 
      

      Namespace-Properties:

      curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
      
      #Wed Apr 22 11:35:31 CEST 2015
      com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
      com.bigdata.relation.container=gnd
      com.bigdata.rwstore.RWStore.smallSlotType=1024
      com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
      com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
      com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
      com.bigdata.journal.AbstractJournal.initialExtent=209715200
      com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
      com.bigdata.btree.BTree.branchingFactor=700
      com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
      com.bigdata.rdf.sail.isolatableIndices=false
      com.bigdata.service.AbstractTransactionService.minReleaseAge=1
      com.bigdata.rdf.sail.bufferCapacity=2000
      com.bigdata.rdf.sail.truthMaintenance=false
      com.bigdata.rdf.sail.namespace=gnd
      com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
      com.bigdata.rdf.store.AbstractTripleStore.quads=false
      com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
      com.bigdata.search.FullTextIndex.fieldsEnabled=false
      com.bigdata.relation.namespace=gnd
      com.bigdata.journal.Journal.groupCommit=true
      com.bigdata.btree.writeRetentionQueue.capacity=10000
      com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
      com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
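
      The two options crossed in the 2x2 experimental design both appear in the properties above (`com.bigdata.rwstore.RWStore.smallSlotType` and `com.bigdata.journal.Journal.groupCommit`). A minimal sketch of enumerating the four cells is shown below. It assumes that leaving `smallSlotType` unset turns the small slot optimization off and that `groupCommit=false` turns group commit off; neither assumption is confirmed in this ticket, and `design_cells` is a hypothetical helper.

```python
from itertools import product

SMALL_SLOT_KEY = "com.bigdata.rwstore.RWStore.smallSlotType"
GROUP_COMMIT_KEY = "com.bigdata.journal.Journal.groupCommit"


def design_cells(small_slot_value="1024"):
    """Return the four property overrides for the 2x2 design.

    Assumption: omitting SMALL_SLOT_KEY disables the small slot optimization.
    """
    cells = []
    for small_slots_on, group_commit_on in product([False, True], repeat=2):
        overrides = {GROUP_COMMIT_KEY: str(group_commit_on).lower()}
        if small_slots_on:
            overrides[SMALL_SLOT_KEY] = small_slot_value
        cells.append(overrides)
    return cells


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

      Each returned dictionary would be merged over the remaining namespace properties before a load, so that only the two factors under study vary between runs.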
      

        Issue Links

          Activity

          bryanthompson added a comment -

          Ok. So if the branching factors had been set correctly then you would expect to see appropriate utilization of the small slots?

          bryanthompson added a comment -

          I've created BLZG-1278 for the "maximum waste" policy. Let's discuss the results above today and then close out this ticket and schedule BLZG-1278.

          martyncutcher added a comment - edited

          I completed a large load on Friday/Saturday. I had to abort the first load since I had not provided sufficient heap (my default is 4G), and after the journal got to around 12G the load effectively came to a halt due to GC pressure.

          I restarted the load with a 10G heap, and it ran to completion in around 5 hours.

          I ran the load with small-slots and group-commit enabled to confirm my previous analysis of the small-slot behaviour and to gain some confidence that I had not missed other scale effects.

          I checked the allocation statistics intermittently during the load. Here are the results:

          Reserved: 17679360, Allocations: 18297354, InUse: 13452281, SlotsUnused: 23.91%
          Reserved: 31454208, Allocations: 32423760, InUse: 21889072, SlotsUnused: 30.41%
          Reserved: 45907968, Allocations: 47119216, InUse: 31428175, SlotsUnused: 31.54%
          Reserved: 71026688, Allocations: 73180886, InUse: 45959533, SlotsUnused: 35.29%

          The final Journal extent was 20.24G.

          As predicted, the unused percentage is slowly increasing, but it is still some way from the theoretical limit of 50% for the standard slot defaults.
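
          The SlotsUnused figures in the four samples above are consistent with the ratio (Reserved - InUse) / Reserved. The short check below recomputes them from the reported numbers; the formula is inferred from the figures themselves rather than taken from the RWStore source, and `slots_unused_pct` is a name made up for this example.

```python
# (Reserved, Allocations, InUse) as reported during the load.
samples = [
    (17679360, 18297354, 13452281),
    (31454208, 32423760, 21889072),
    (45907968, 47119216, 31428175),
    (71026688, 73180886, 45959533),
]


def slots_unused_pct(reserved, in_use):
    """Percentage of reserved slots that are not currently in use."""
    return 100.0 * (reserved - in_use) / reserved


if __name__ == "__main__":
    for reserved, _allocations, in_use in samples:
        print(f"Reserved: {reserved}, InUse: {in_use}, "
              f"SlotsUnused: {slots_unused_pct(reserved, in_use):.2f}%")
```

          Rounded to two decimals, the results match the reported 23.91%, 30.41%, 31.54% and 35.29%.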

          There was nothing unexpected in the other allocation data. With the RWStore there can be certain allocation patterns such that incrementally larger allocations are made, and in theory this could leave "old" allocators unused (a little like "junk DNA"). But we have not seen this before with any real-world RDF Blazegraph loads, and there is no evidence of such a pattern here.

          Here is the full data for the complete load:

          Running...
          
          magic=e6b4c275
          version=1
          extent=209715200(200M), userExtent=209714512(199M), bytesAvailable=209714512(199M), nextOffset=0
          rootBlock{ rootBlock=0, challisField=2, version=3, nextOffset=214748373017, localTime=1431723087794 [Friday, 15 May 2015 21:51:27 o'clock BST], firstCommitTime=1431723087462 [Friday, 15 May 2015 21:51:27 o'clock BST], lastCommitTime=1431723087780 [Friday, 15 May 2015 21:51:27 o'clock BST], commitCounter=2, commitRecordAddr={off=NATIVE:-40966,len=422}, commitRecordIndexAddr={off=NATIVE:-8216,len=220}, blockSequence=1, quorumToken=-1, metaBitsAddr=56295948335, metaStartAddr=3200, storeType=RW, uuid=8caaaab5-cdee-4e32-aeb7-25c9319e3912, offsetBits=42, checksum=-1113971251, createTime=1431723086758 [Friday, 15 May 2015 21:51:26 o'clock BST], closeTime=0}
          rootBlock{ rootBlock=1, challisField=3, version=3, nextOffset=1206611019907076, localTime=1431766232099 [Saturday, 16 May 2015 09:50:32 o'clock BST], firstCommitTime=1431723087462 [Friday, 15 May 2015 21:51:27 o'clock BST], lastCommitTime=1431766224463 [Saturday, 16 May 2015 09:50:24 o'clock BST], commitCounter=3, commitRecordAddr={off=NATIVE:-87629828,len=422}, commitRecordIndexAddr={off=NATIVE:-87613444,len=220}, blockSequence=337384, quorumToken=-1, metaBitsAddr=1205167957475744, metaStartAddr=308853, storeType=RW, uuid=8caaaab5-cdee-4e32-aeb7-25c9319e3912, offsetBits=42, checksum=1354574001, createTime=1431723086758 [Friday, 15 May 2015 21:51:26 o'clock BST], closeTime=0}
          The current root block is #1
          
          -------------------------
          RWStore Allocator Summary
          -------------------------
          AllocatorSize      AllocatorCount   SlotsAllocated  %SlotsAllocated    SlotsRecycled        SlotChurn       SlotsInUse      %SlotsInUse   MeanAllocation    SlotsReserved     %SlotsUnused    BytesReserved     BytesAppData       %SlotWaste         %AppData       %StoreFile      %TotalWaste       %FileWaste 
          64                           9909         73180886            37.88         27221353             1.59         45959533            90.02               27         71026688            35.29       4545708032       1526154398            66.43            12.22            24.72            51.14             16.42 
          128                           432          3483693             1.80           648534             1.23          2835159             5.55               87          3086336             8.14        395051008        245273738            37.91             1.96             2.15             2.54              0.81 
          192                            59           955861             0.49           631206             2.94           324655             0.64              156           418816            22.48         80412672         51389729            36.09             0.41             0.44             0.49              0.16 
          320                            24          1288403             0.67          1197041            14.10            91362             0.18              254           159744            42.81         51118080         26405140            48.34             0.21             0.28             0.42              0.13 
          512                             9          1703731             0.88          1674357            58.00            29374             0.06              415            57600            49.00         29491200         18492931            37.29             0.15             0.16             0.19              0.06 
          768                             5          2039434             1.06          2020184           105.94            19250             0.04              637            35840            46.29         27525120         20173159            26.71             0.16             0.15             0.12              0.04 
          1024                            4          1811717             0.94          1798615           138.28            13102             0.03              893            21760            39.79         22282240         18872689            15.30             0.15             0.12             0.06              0.02 
          2048                            6          5539688             2.87          5505039           159.88            34649             0.07             1501            36096             4.01         73924608         73306024             0.84             0.59             0.40             0.01              0.00 
          3072                           22          5096809             2.64          4948080            34.27           148729             0.29             2585           157696             5.69        484442112        391602332            19.16             3.14             2.63             1.57              0.50 
          4096                           25          7458811             3.86          7289309            44.00           169502             0.33             3617           179200             5.41        734003200        661786316             9.84             5.30             3.99             1.22              0.39 
          8192                          204         90651895            46.92         89221991            63.40          1429904             2.80             6912          1458176             1.94      11945377792       9451800209            20.87            75.70            64.96            42.23             13.56 
          
          -------------------------
          BLOBS
          -------------------------
          Bucket(K)   Allocations    Allocated      Deletes      Deleted      Current         Data         Mean        Churn
          16             21167704 222066676773     21012030 220482563361       155674   1584113412        10490       135.97
          32              4880317 103063263812      4860572 102643507094        19745    419756718        21118       247.17
          64               703714  29313132475       700089  29162916598         3625    150215877        41654       194.13
          128               13387    961907669        13321    957160485           66      4747184        71853       202.83
          256                   0            0            0            0            0            0            0         0.00
          512                   0            0            0            0            0            0            0         0.00
          1024                  0            0            0            0            0            0            0         0.00
          2048                  0            0            0            0            0            0            0         0.00
          4096                  0            0            0            0            0            0            0         0.00
          8192                  0            0            0            0            0            0            0         0.00
          16384                 0            0            0            0            0            0            0         0.00
          32768                 0            0            0            0            0            0            0         0.00
          65536                 0            0            0            0            0            0            0         0.00
          2097151               0            0            0            0            0            0            0         0.00
          
          Checking regions.....okay
          
          There are 2 commit points.
          CommitRecord{timestamp=1431766224463, commitCounter=3, roots=[-376296872374959908, -376367241119137452, -376332056747048884, -140754668224381, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
          name=__globalRowStore
          	Checkpoint{indexType=BTree,height=1,nnodes=1,nleaves=4,nentries=75,counter=0,addrRoot=-140758963191632,addrMetadata=-70385924045614,addrBloomFilter=0,addrCheckpoint=-316672233701156}
          	addrMetadata=0, name=__globalRowStore, indexType=BTree, indexUUID=1c9bb12d-67b5-46ea-b65c-679fca1653d1, branchingFactor=32, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@6233839c{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.btree.DefaultTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.DefaultKeyBuilderFactory{ initialCapacity=0, collator=ICU, locale=en_US, strength=null, decomposition=null}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@2405a922{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.CanonicalHuffmanRabaCoder@6fa309a7}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=com.bigdata.sparse.LogicalRowSplitHandler@38ceec33, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@18f52568
          name=BSBM_284826.lex.BLOBS
          	Checkpoint{indexType=BTree,height=1,nnodes=1,nleaves=64,nentries=34552,counter=0,addrRoot=-307428927209471120,addrMetadata=-21474835632,addrBloomFilter=0,addrCheckpoint=-367720376735629092}
          	addrMetadata=0, name=BSBM_284826.lex.BLOBS, indexType=BTree, indexUUID=675abe07-3e9e-4233-9874-e7cd783f84df, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@517a30ed{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.BlobsTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=8}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@2d224165{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.SimpleRabaCoder@23f36509}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=true, maxRecLen=0, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@406e444f
          name=BSBM_284826.lex.ID2TERM
          	Checkpoint{indexType=BTree,height=2,nnodes=519,nleaves=182071,nentries=63725152,counter=0,addrRoot=-376123871092269045,addrMetadata=-17179868332,addrBloomFilter=0,addrCheckpoint=-367714943601999652}
          	addrMetadata=0, name=BSBM_284826.lex.ID2TERM, indexType=BTree, indexUUID=6e4fb718-403e-4717-9a6b-0346b531a05e, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@35cbd2ee{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.Id2TermTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=9}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@588522d9{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.SimpleRabaCoder@cb301f6}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=true, maxRecLen=16, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@3c45c3fd
          name=BSBM_284826.lex.search
          	Checkpoint{indexType=BTree,height=2,nnodes=678,nleaves=302104,nentries=122457004,counter=0,addrRoot=-376146080368148097,addrMetadata=-70398808947425,addrBloomFilter=0,addrCheckpoint=-376156796311568164}
          	addrMetadata=0, name=BSBM_284826.lex.search, indexType=BTree, indexUUID=8805f6c0-84c8-4c3a-ac57-8498cd5c165d, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@39806c11{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.RDFFullTextIndexTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.DefaultKeyBuilderFactory{ initialCapacity=0, collator=ICU, locale=en_US, strength=Primary, decomposition=null}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@1e36ed95{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.SimpleRabaCoder@2827e0b2}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@5210dab6
          name=BSBM_284826.lex.TERM2ID
          	Checkpoint{indexType=BTree,height=2,nnodes=376,nleaves=155311,nentries=63725152,counter=76763160,addrRoot=-376144203467447975,addrMetadata=-70394513980186,addrBloomFilter=0,addrCheckpoint=-376156658872614692}
          	addrMetadata=0, name=BSBM_284826.lex.TERM2ID, indexType=BTree, indexUUID=187b0189-4ea1-4c04-b609-bc9cf257d124, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@525a96a4{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.lexicon.Term2IdTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.DefaultKeyBuilderFactory{ initialCapacity=0, collator=ICU, locale=en_US, strength=null, decomposition=null}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@39581339{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.FixedLengthValueRabaCoder@45562098}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@1975feb8
          name=BSBM_284826.spo.JUST
          	Checkpoint{indexType=BTree,height=0,nnodes=0,nleaves=1,nentries=0,counter=0,addrRoot=-105600360906722,addrMetadata=-38654704872,addrBloomFilter=0,addrCheckpoint=-35278861369124}
          	addrMetadata=0, name=BSBM_284826.spo.JUST, indexType=BTree, indexUUID=86b98e31-b079-41c7-8168-a77405f1f244, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@24f3d80e{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.JustificationTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=0}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@4ed9216{ratio=8}, leafValuesCoder=com.bigdata.btree.raba.codec.EmptyRabaValueCoder@911e4e1}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@75c9746d
          name=BSBM_284826.spo.OSP
          	Checkpoint{indexType=BTree,height=3,nnodes=822,nleaves=321530,nentries=124368677,counter=0,addrRoot=-375678306890022841,addrMetadata=-30064770300,addrBloomFilter=0,addrCheckpoint=-376156727592091428}
          	addrMetadata=0, name=BSBM_284826.spo.OSP, indexType=BTree, indexUUID=7f249e55-8503-4b54-af8a-a7e6ba004bf6, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@39a39e36{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=0}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@24e808ca{ratio=8}, leafValuesCoder=com.bigdata.rdf.spo.FastRDFValueCoder2@7d9918b3}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@4e24d030
          name=BSBM_284826.spo.POS
          	Checkpoint{indexType=BTree,height=3,nnodes=835,nleaves=326155,nentries=124368677,counter=0,addrRoot=-375678324069892017,addrMetadata=-34359737596,addrBloomFilter=0,addrCheckpoint=-376156757656862500}
          	addrMetadata=0, name=BSBM_284826.spo.POS, indexType=BTree, indexUUID=a0e392a3-343d-408e-b40e-7664e6c47459, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@7a3a40bd{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=0}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@67076fc{ratio=8}, leafValuesCoder=com.bigdata.rdf.spo.FastRDFValueCoder2@41939db1}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=N/A, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@4255f119
          name=BSBM_284826.spo.SPO
          	Checkpoint{indexType=BTree,height=3,nnodes=989,nleaves=349324,nentries=124368677,counter=0,addrRoot=-375678328364859311,addrMetadata=-25769802906,addrBloomFilter=0,addrCheckpoint=-376156787721633572}
          	addrMetadata=0, name=BSBM_284826.spo.SPO, indexType=BTree, indexUUID=18c6c6db-a934-4d80-8a00-aeb05c20550d, branchingFactor=700, pmd=null, btreeClassName=com.bigdata.btree.BTree, checkpointClass=com.bigdata.btree.Checkpoint, nodeKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@f2766e7{ratio=8}, btreeRecordCompressorFactory=N/A, tupleSerializer=com.bigdata.rdf.spo.SPOTupleSerializer{, keyBuilderFactory=com.bigdata.btree.keys.ASCIIKeyBuilderFactory{ initialCapacity=0}, leafKeysCoder=com.bigdata.btree.raba.codec.FrontCodedRabaCoder$DefaultFrontCodedRabaCoder@63d4cf76{ratio=8}, leafValuesCoder=com.bigdata.rdf.spo.FastRDFValueCoder2@28807f2e}, conflictResolver=N/A, deleteMarkers=false, versionTimestamps=false, versionTimestampFilters=false, isolatable=false, rawRecords=false, maxRecLen=256, bloomFilterFactory=BloomFilterFactory{ n=1000000, p=0.02, maxP=0.15, maxN=1883227}, overflowHandler=N/A, splitHandler=N/A, indexSegmentBranchingFactor=512, indexSegmentBufferNodes=false, indexSegmentRecordCompressorFactory=N/A, asynchronousIndexWriteConfiguration=com.bigdata.btree.AsynchronousIndexWriteConfiguration{ masterQueueCapacity=5000, masterChunkSize=10000, masterChunkTimeoutNanos=50000000, sinkIdleTimeoutNanos=9223372036854775807, sinkPollTimeoutNanos=50000000, sinkQueueCapacity=5000, sinkChunkSize=10000, sinkChunkTimeoutNanos=9223372036854775807}, scatterSplitConfiguration=com.bigdata.btree.ScatterSplitConfiguration{enabled=true, percentOfSplitThreshold=0.25, dataServiceCount=0, indexPartitionCount=0}
          	com.bigdata.btree.BaseIndexStats@c1db9c8
          name	indexType	m	height	nnodes	nleaves	nentries
          BSBM_284826.lex.BLOBS	BTree	700	1	1	64	34552
          BSBM_284826.lex.ID2TERM	BTree	700	2	519	182071	63725152
          BSBM_284826.lex.TERM2ID	BTree	700	2	376	155311	63725152
          BSBM_284826.lex.search	BTree	700	2	678	302104	122457004
          BSBM_284826.spo.JUST	BTree	700	0	0	1	0
          BSBM_284826.spo.OSP	BTree	700	3	822	321530	124368677
          BSBM_284826.spo.POS	BTree	700	3	835	326155	124368677
          BSBM_284826.spo.SPO	BTree	700	3	989	349324	124368677
          __globalRowStore	BTree	32	1	1	4	75
          
          Build Version=1.5.1
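
          The ticket asks that the reported allocations in use be validated against the allocator data. As a minimal sketch of that kind of cross-check, the snippet below recomputes the SlotsUnused percentage from the Reserved and InUse counts of the four intermediate samples quoted in this comment (the last sample matches the 64-byte allocator row of the summary). The formula `(reserved - in_use) / reserved` is an assumption inferred from the reported numbers, not taken from the RWStore source, and the function names are hypothetical.

```python
# (slots_reserved, slots_in_use, reported_percent_unused), copied from the
# intermediate samples quoted in the comment.
SAMPLES = [
    (17679360, 13452281, 23.91),
    (31454208, 21889072, 30.41),
    (45907968, 31428175, 31.54),
    (71026688, 45959533, 35.29),
]


def percent_slots_unused(reserved: int, in_use: int) -> float:
    """Assumed definition: share of reserved slots that hold no data."""
    return 100.0 * (reserved - in_use) / reserved


def check_samples(samples, tolerance: float = 0.005):
    """Return one boolean per sample: does the recomputed value agree with
    the reported value to within rounding (two decimal places)?"""
    return [
        abs(percent_slots_unused(reserved, in_use) - reported) <= tolerance
        for reserved, in_use, reported in samples
    ]


if __name__ == "__main__":
    for (reserved, in_use, reported), ok in zip(SAMPLES, check_samples(SAMPLES)):
        computed = percent_slots_unused(reserved, in_use)
        print(f"reserved={reserved} in_use={in_use} "
              f"computed={computed:.2f}% reported={reported:.2f}% ok={ok}")
```

          Under this assumed definition all four samples agree with the reported figures to two decimal places.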
          
          
          bryanthompson added a comment

          martyncutcher Please elevate the priority for this ticket per our conversation.

          bryanthompson added a comment - edited

          Code review on https://github.com/SYSTAP/bigdata/pull/170

          • We should be consistent and express the high and low slot waste thresholds as the percentage full for an allocator.
          • Link the new logic that runs when the free list is empty to this ticket.
          • The code uses an incorrect metric to decide the amount of waste in a slot, and that metric is itself based on bad data (see BLZG-1551).
          • There are various problems with the storage statistics (see BLZG-1551).
          • Added javadoc around the storage statistics and reconciled it with the comments.
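
          To make the threshold discussion concrete, here is a rough sketch of the maximum waste check described in the ticket: when the unused share of the space reserved for one slot size exceeds a threshold, the allocator with the most free slots that is not already on the free list is selected to be returned to it. Everything here is hypothetical (the `Allocator` class, the field names, the threshold parameter, and the use of slot counts as the waste metric); it is not the RWStore implementation and does not settle which metric the review considers correct.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Allocator:
    """Hypothetical stand-in for one fixed-size slot allocator."""
    slots_reserved: int
    slots_in_use: int
    on_free_list: bool = False

    @property
    def free_slots(self) -> int:
        return self.slots_reserved - self.slots_in_use

    @property
    def percent_full(self) -> float:
        return 100.0 * self.slots_in_use / self.slots_reserved


def waste_percent(allocators: List[Allocator]) -> float:
    """Unused slots as a percentage of all slots reserved for this slot size."""
    reserved = sum(a.slots_reserved for a in allocators)
    unused = sum(a.free_slots for a in allocators)
    return 100.0 * unused / reserved if reserved else 0.0


def pick_for_free_list(allocators: List[Allocator],
                       max_waste_percent: float) -> Optional[Allocator]:
    """If waste exceeds the threshold, return the allocator with the most
    free slots that is not already on the free list; otherwise None."""
    if waste_percent(allocators) <= max_waste_percent:
        return None
    candidates = [a for a in allocators
                  if not a.on_free_list and a.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.free_slots)


if __name__ == "__main__":
    pool = [Allocator(1000, 900), Allocator(1000, 400), Allocator(1000, 700)]
    print(f"waste = {waste_percent(pool):.2f}%")
    print("chosen at 25% threshold:", pick_for_free_list(pool, 25.0))
    print("chosen at 40% threshold:", pick_for_free_list(pool, 40.0))
```

          Expressing the threshold as a percentage keeps it independent of the slot size, which is in the spirit of the first review point above.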

          Pushed changes to github branch BLZG-1246 in commit 395cd7e7d0c78aa4857bc55ab76d5e453dc82fc7

          martyncutcher


            People

            • Assignee: martyncutcher
            • Reporter: bryanthompson
            • Votes: 0
            • Watchers: 5
