Blazegraph (by SYSTAP) / BLZG-1575 Small slot optimization meta-ticket / BLZG-201

Examine interaction of small slots optimization and group commit.

    Details

    • Type: Sub-task
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: BLAZEGRAPH_RELEASE_1_5_1
    • Fix Version/s: BLAZEGRAPH_2_0_0
    • Component/s: RWStore
    • Labels:
      None

      Description

      The purpose of this ticket is to improve our understanding of the small slot optimization and its interaction (if any) with group commit, and to make the small slot optimization policy less susceptible to misconfiguration.

      Martyn is going to:


      - Run a 2x2 experimental design crossing the small slots option with the group commit option.
      - Extract the dumpJournal allocator statistics and page histogram for each of the four runs.
      - Validate the reported allocations in use against the allocated bits in the allocators.
      - Verify that we have a good explanation of the allocation statistics that we are observing.
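The four runs of the 2x2 design can be sketched as Java `Properties` overlays on one shared namespace configuration. This is a minimal sketch under stated assumptions: the two property keys and the value 1024 are taken from the namespace properties quoted in this ticket, while using "0" to mean "small slot optimization disabled" is an assumption to confirm against the RWStore options of the release under test. The class `SmallSlotGroupCommitMatrix` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper that builds the four runs of the 2x2 experiment. */
final class SmallSlotGroupCommitMatrix {
    // Property keys as they appear in the namespace properties in this ticket.
    static final String SMALL_SLOT_TYPE = "com.bigdata.rwstore.RWStore.smallSlotType";
    static final String GROUP_COMMIT = "com.bigdata.journal.Journal.groupCommit";

    // "1024" is the value used in the reporter's configuration.
    static final String SMALL_SLOTS_ON = "1024";
    // ASSUMPTION: "0" stands for "small slot optimization disabled";
    // confirm against the RWStore options of the release under test.
    static final String SMALL_SLOTS_OFF = "0";

    /** Overlays each (small slots, group commit) combination on a shared base config. */
    static List<Properties> runs(Properties base) {
        List<Properties> runs = new ArrayList<>();
        for (String smallSlots : new String[] {SMALL_SLOTS_OFF, SMALL_SLOTS_ON}) {
            for (boolean groupCommit : new boolean[] {false, true}) {
                Properties p = new Properties();
                p.putAll(base); // keep every other setting identical across runs
                p.setProperty(SMALL_SLOT_TYPE, smallSlots);
                p.setProperty(GROUP_COMMIT, Boolean.toString(groupCommit));
                runs.add(p);
            }
        }
        return runs;
    }
}
```

Keeping every other property identical is what allows differences in journal size and allocator statistics to be attributed to the two factors and their interaction.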


      - Consider a separate allocation policy for blob headers (i.e., bypassing the small slot optimization for blob headers). However, the maximum waste policy might be sufficient without requiring this tighter coupling.


      - Consider a maximum waste policy. The small slot policy does not return an allocator to the free list unless it is sufficiently sparse. If no allocator for a given slot size is on the free list, then a new one is allocated. The modified policy would first check the amount of "waste" (unused storage) in the allocators of that slot size. If this waste exceeds a threshold (a percentage of all space allocated to allocators of that size), then it would scan the allocators of that slot size that are not on the free list and put the one with the most free space onto the free list. Further, if there is enough waste across the small slot allocators, then we could simply fill in the next free slot rather than looking for a set of slots with good locality (e.g., a page, 1/2 page, etc.). This balances waste (store size on disk) against locality.
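A minimal sketch of the proposed maximum waste policy for a single slot size follows. `SlotAllocator` and `MaxWastePolicy` are hypothetical classes, not the RWStore allocator API; waste is assumed to be measured as unused bytes divided by all bytes reserved for that slot size, and the threshold is an illustrative parameter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of one fixed-size slot allocator (not the RWStore API). */
final class SlotAllocator {
    final int slotSize;   // bytes per slot
    final int slotCount;  // slots managed by this allocator
    int usedSlots;        // slots currently in use
    boolean onFreeList;   // eligible to satisfy new allocations

    SlotAllocator(int slotSize, int slotCount) {
        this.slotSize = slotSize;
        this.slotCount = slotCount;
    }

    int freeSlots() { return slotCount - usedSlots; }
}

/** Sketch of the proposed maximum waste policy for one slot size. */
final class MaxWastePolicy {
    private final int slotSize;
    private final int slotsPerAllocator;
    private final double maxWasteFraction; // e.g. 0.2 tolerates 20% unused storage
    private final List<SlotAllocator> allocators = new ArrayList<>();

    MaxWastePolicy(int slotSize, int slotsPerAllocator, double maxWasteFraction) {
        this.slotSize = slotSize;
        this.slotsPerAllocator = slotsPerAllocator;
        this.maxWasteFraction = maxWasteFraction;
    }

    /** Unused bytes divided by all bytes reserved for this slot size. */
    double wasteFraction() {
        long total = 0, unused = 0;
        for (SlotAllocator a : allocators) {
            total += (long) a.slotCount * a.slotSize;
            unused += (long) a.freeSlots() * a.slotSize;
        }
        return total == 0 ? 0.0 : (double) unused / total;
    }

    /** Called when no allocator of this slot size is on the free list. */
    SlotAllocator onFreeListEmpty() {
        if (wasteFraction() > maxWasteFraction) {
            // Too much waste: recycle the off-list allocator with the most free slots.
            Optional<SlotAllocator> sparsest = allocators.stream()
                    .filter(a -> !a.onFreeList && a.freeSlots() > 0)
                    .max(Comparator.comparingInt(SlotAllocator::freeSlots));
            if (sparsest.isPresent()) {
                sparsest.get().onFreeList = true;
                return sparsest.get();
            }
        }
        // Waste is acceptable (or nothing to recycle): grow the store.
        SlotAllocator fresh = new SlotAllocator(slotSize, slotsPerAllocator);
        fresh.onFreeList = true;
        allocators.add(fresh);
        return fresh;
    }

    List<SlotAllocator> allocators() { return allocators; }
}
```

At or below the threshold the sketch keeps the current behaviour and allocates a new allocator; above it, the sparsest allocator that is off the free list is recycled. The locality trade-off (filling the next free slot instead of searching for a well-localized set of slots) is not modelled here.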

      See https://docs.google.com/a/systap.com/spreadsheets/d/1AANi3aCQIOcx2nMoerecnKOgl7gtZDNLJ7fBEQfCZ2o/edit?usp=sharing for the data from our discussion of this ticket.

      The original ticket description is below.
      ----

      Andreas wrote:

      I updated to the current revision (f4c63e5) of Blazegraph from Git and tried to load a dataset into the updated webapp. With Bigdata 1.4.0 this resulted in a journal of ~18GB. This time the process was cancelled because the disk was full: the journal had grown beyond 50GB for the same file with the same settings. The only difference was that I activated GroupCommit.
      
      The dataset can be downloaded here:

      http://datendienst.dnb.de/cgi-bin/mabit.pl?cmd=fetch&userID=opendata&pass=opendata&mabheft=GND.rdf.gz
      
      Please find the settings used to load the file below.
      
      Do I have a misconfiguration, or is there a bug eating all the disk space?
      

      Namespace properties:

      curl -H "Accept: text/plain" http://localhost:8080/bigdata/namespace/gnd/properties
      
      #Wed Apr 22 11:35:31 CEST 2015
      com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=700
      com.bigdata.relation.container=gnd
      com.bigdata.rwstore.RWStore.smallSlotType=1024
      com.bigdata.journal.AbstractJournal.bufferMode=DiskRW
      com.bigdata.journal.AbstractJournal.file=/var/lib/bigdata/bigdata.jnl
      com.bigdata.rdf.store.AbstractTripleStore.vocabularyClass=de.bsb_muenchen.bigdata.vocab.B3KatVocabulary
      com.bigdata.journal.AbstractJournal.initialExtent=209715200
      com.bigdata.rdf.store.AbstractTripleStore.textIndex=true
      com.bigdata.btree.BTree.branchingFactor=700
      com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms
      com.bigdata.rdf.sail.isolatableIndices=false
      com.bigdata.service.AbstractTransactionService.minReleaseAge=1
      com.bigdata.rdf.sail.bufferCapacity=2000
      com.bigdata.rdf.sail.truthMaintenance=false
      com.bigdata.rdf.sail.namespace=gnd
      com.bigdata.relation.class=com.bigdata.rdf.store.LocalTripleStore
      com.bigdata.rdf.store.AbstractTripleStore.quads=false
      com.bigdata.journal.AbstractJournal.writeCacheBufferCount=500
      com.bigdata.search.FullTextIndex.fieldsEnabled=false
      com.bigdata.relation.namespace=gnd
      com.bigdata.journal.Journal.groupCommit=true
      com.bigdata.btree.writeRetentionQueue.capacity=10000
      com.bigdata.rdf.sail.BigdataSail.bufferCapacity=2000
      com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false
      

              People

              Assignee:
              martyncutcher
              Reporter:
              bryanthompson
              Votes:
              0
              Watchers:
              5