Details

      Description

      This is a feature request to improve the load rate into the single machine database. Currently, a single thread drives the parser and the index updates. The index updates themselves are executed against a thread pool, but the parser is not executing while the index updates are being performed.

      There are several ways in which load performance could be improved:

      1. Run a separate thread for the parser and buffer the parser output such that there is always data available for index updates.
      2. Run concurrent parser/loader tasks against the same connection. This could be done for the DataLoader, the InsertServlet, and the LOAD update operation.
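A minimal sketch of option 1, using a bounded queue to decouple a dedicated parser thread from the index writer (all class and method names below are illustrative, not the actual DataLoader APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of option 1: a dedicated parser thread fills a bounded queue with
// statement batches while the index writer drains it, so parsing and index
// updates overlap. All names are illustrative, not the actual Bigdata APIs.
public class ParserPipeline {

    private static final List<String> POISON = new ArrayList<>(); // end-of-input marker

    public static List<String> run(List<String> lines, int batchSize)
            throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(8);
        List<String> indexed = new ArrayList<>();

        // Producer: the "parser" thread.
        Thread parser = new Thread(() -> {
            try {
                List<String> batch = new ArrayList<>();
                for (String line : lines) {
                    batch.add(line.trim()); // stand-in for real RDF parsing
                    if (batch.size() == batchSize) {
                        queue.put(batch); // blocks if the writer falls behind
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) queue.put(batch);
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parser.start();

        // Consumer: the "index writer" drains batches as they become available.
        while (true) {
            List<String> batch = queue.take();
            if (batch == POISON) break;
            indexed.addAll(batch); // stand-in for ordered index writes
        }
        parser.join();
        return indexed;
    }
}
```

The bounded queue provides back-pressure: when the index writer falls behind, the parser blocks rather than buffering unbounded output.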

      For line-oriented RDF formats and sparse matrix formats, we can also break the file into blocks and assign the blocks to a thread pool. Each thread would find the start of the next line in its block and hand off the leading partial line to the thread for the previous block. The threads could read the data more quickly into the intermediate format, reducing the time spent parsing relative to writing the indexed data structures.
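The block-splitting step might look like the following sketch, which advances each interior split point to the next line start so that every block contains only whole lines (illustrative only, not Bigdata code):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the block-splitting idea for line-oriented formats: divide the
// input into roughly equal blocks, then advance each interior split point
// to the next line start so every block holds only whole lines. The leading
// partial line of each block is thereby handed off to the previous block.
public class LineBlockSplitter {
    public static List<String> split(byte[] data, int nBlocks) {
        int[] starts = new int[nBlocks + 1];
        starts[nBlocks] = data.length;
        for (int i = 1; i < nBlocks; i++) {
            int p = (int) ((long) data.length * i / nBlocks);
            // A position is a line start iff it is 0 or follows a newline.
            while (p < data.length && p > 0 && data[p - 1] != '\n') p++;
            starts[i] = p;
        }
        List<String> blocks = new ArrayList<>();
        for (int i = 0; i < nBlocks; i++) {
            blocks.add(new String(data, starts[i], starts[i + 1] - starts[i],
                    StandardCharsets.UTF_8));
        }
        return blocks;
    }
}
```

In real use each block would then be submitted to a pool thread for parsing; here we simply return the blocks.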

      In addition to co-threading, we could increase the data density:

      1. Pack TermIds. See [1,3].
      2. Use namespace compression [2].
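As an illustration of the packing idea in [1,3], a variable-length (varint) encoding stores small 64-bit TermIds in one or two bytes instead of a fixed eight. The sketch below shows the general technique only, not the actual BLZG-654 format:

```java
import java.io.ByteArrayOutputStream;

// Illustration of the general "pack TermIds" idea: a varint encoding writes
// 64-bit ids in 7-bit groups, so small ids take one or two bytes rather than
// a fixed eight. This shows the technique only, not the BLZG-654 format.
public class TermIdPacker {
    public static byte[] pack(long[] ids) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long v : ids) {
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80)); // high bit: more bytes follow
                v >>>= 7;
            }
            out.write((int) v); // final group, high bit clear
        }
        return out.toByteArray();
    }
}
```

Since TermIds are assigned densely from a counter, most ids in a young store are small and compress well under this scheme.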

      We should also provide for dynamic extension of the Vocabulary as part of this effort. My notes on this follow: Simply update the Vocabulary in the GRS. We have the rest of the byte values, and then the rest of the short values, all of which can be used. If a vocabulary item is already in TERM2ID, BLOBS, or inline, then we store the TermId or BlobIV in the Vocabulary. What matters is that the TermIds, BlobIVs, and Vocabulary are consistent. It is not necessary that all Vocabulary items be represented by inline IVs; it is necessary that the IVCache is set and that they are immediately available from the LexiconConfiguration.

      See https://docs.google.com/document/d/1R8tWnAQUWcXl4tMszPztvHmfyWrnHHhsNbmvNBqIEWU/edit for some documentation on work in progress in the load-performance branch.

      [1] BLZG-314 (TermIdEncoder)
      [2] BLZG-629 (PartlyInlineURIs)
      [3] BLZG-654 (Pack TIDs)
      [4] BLZG-658 (Use PSOutputStream/ConstantStore for large/small blobs)
      [5] BLZG-660 (Support PSOutputStream/InputStream at IRawStore)

        Issue Links

        1. Experiment with heap size, GC mode, and nursery settings (Sub-task, Open, Brad Bebee)
        2. RDF Parser and index writers should overlap (Sub-task, Done, bryanthompson)
        3. SEARCH index is written one time too many? (Sub-task, Done, bryanthompson)
        4. Reduce TERM2ID scatter induced by UUIDs and other random things in URIs (Sub-task, Open, bryanthompson)
        5. Expose more parallelism in lexicon index writes (Sub-task, Open, bryanthompson)
        6. Expose more parallelism by overlapping ID2TERM and SEARCH index writes with statement index writes (Sub-task, Open, bryanthompson)
        7. Run a pool of RDF parsers that target a single loader task (Sub-task, Open, Unassigned)
        8. Pack TIDs (breaks binary compatibility) (Sub-task, Open, Unassigned)
        9. Poor person's durable queues pattern for DataLoader (Sub-task, Done, bryanthompson)
        10. Add option to make the DataLoader robust to files that cause rio to throw a fatal exception (Sub-task, Done, michaelschmidt)
        11. Add DataLoader option to run DumpJournal after each batch (Sub-task, Done, bryanthompson)
        12. Add putIfAbsent pattern for conditional insert (Sub-task, Done, bryanthompson)
        13. Schedule more IOs when loading data (Sub-task, In Progress, bryanthompson)
        14. Improve read/replace singleton property value (Sub-task, Open, mikepersonick)
        15. StatementBuffer must flush on SP boundary for property graphs (Sub-task, Accepted, mikepersonick)
        16. Option to configure sail write connection for low-level statement buffer writer behavior (Sub-task, Open, bryanthompson)
        17. PARALLEL iterator pattern in AccessPath or IRangeQuery (Sub-task, Accepted, bryanthompson)
        18. Add PREFETCH option for IRangeQuery (Sub-task, Open, martyncutcher)
        19. Implement support for DTE extension types for URIs (Sub-task, Done, mikepersonick)
        20. Examine impact of dirty list threshold vs direct buffer size on write cache performance for bulk load (Sub-task, Done, bryanthompson)
        21. Modify the default behavior for setting the clear/dirty list threshold (Sub-task, Done, michaelschmidt)
        22. Examine whether we can parallelize the warm-up procedure (Sub-task, Open, Unassigned)
        23. Examine whether we can parallelize DumpJournal -pages (Sub-task, Open, Unassigned)
        24. Update DataLoader documentation on the wiki (Sub-task, Done, maria.krokhaleva)
        25. Decrease storage overhead for small raw records (ConstantAllocator) (Sub-task, Open, martyncutcher)
        26. DataLoader.Options.FLUSH does not defer flush of StatementBuffer (Sub-task, Done, bryanthompson)
        27. Concurrent modification error in load-performance branch? (Sub-task, Done, bryanthompson)
        28. Dynamic extension of the Vocabulary (Sub-task, Open, Brad Bebee)
        29. Parallelize the VocabBuilder (Sub-task, Open, Unassigned)
        30. Review, document, and blog the VocabBuilder (Sub-task, Open, bradbebee)
        31. Merge load-performance to master (Sub-task, Done, Brad Bebee)
        32. DataLoader should sort files within each directory to establish a stable order for file loading (Sub-task, Done, michaelschmidt)
        33. Prefix and Suffix Inline URI Handler (Sub-task, Done, Brad Bebee)
        34. Use a more intelligent Inlining Strategy by Default (Sub-task, Open, Brad Bebee)
        35. Consider strategies for allowing concurrent writers on a BTree/HTree (Sub-task, Open, bryanthompson)
        36. Add BTreeCounters for cache hit and cache miss (Sub-task, Done, bryanthompson)
        37. Add dynamic counters for the write retention queue so we can better understand the dynamics of the eviction policy (Sub-task, Done, Brad Bebee)
        38. Reduce commit latency by parallel checkpoint by level of dirty pages in an index (Sub-task, Done, Brad Bebee)
        39. Reduce commit latency by parallelizing delete block processing (Sub-task, Done, martyncutcher)
        40. Improve B+Tree/HTree write retention cache eviction throughput (Sub-task, In Progress, bryanthompson)
        41. AbstractBTree.touch() synchronization hot spot (Sub-task, Reopened, bryanthompson)
        42. Concurrent writers on B+Tree (and possibly HTree) (Sub-task, Open, bryanthompson)
        43. Growth in RWStore.alloc() cumulative time (but latency looks ok) (Sub-task, Done, martyncutcher)
        44. RWStore.showAllocators() must take the allocation lock (Sub-task, Done, martyncutcher)
        45. Remove touch() contention (Sub-task, Open, bryanthompson)
        46. Relax B+Tree underflow and overflow thresholds (Sub-task, Open, Alexandre Riazanov)
        47. Adjust B+Tree leaf split points to choose shorter separator keys (Sub-task, Open, bryanthompson)

          Activity

          mikepersonick added a comment -

          It should be noted that closure times often dwarf simple load time, in some cases by a factor of 10 to 1. Focusing on closure might be the bigger win.

          bryanthompson added a comment -

          Added some more related issues to the summary for this ticket.

          bryanthompson added a comment -

          Opening load-performance branch on github.

          bryanthompson added a comment -

          BLZG-1522 decouples the parser and index writer and allows these operations to overlap. It also merges batches to help the index writer catch up to the parser and reduce the total effort through elided lexicon access (distinct Values over the merged batches) and larger ordered writes (for Values and especially for the statement indices).

          This should boost any code that goes through BigdataSail, the DataLoader, or incremental truth maintenance.

          bryanthompson added a comment -

          The SCAN + FILTER could be accelerated using a parallel iterator. Right now it is just a sequential scan (with perhaps a producer / consumer decoupling in AccessPath). A true parallel scan would divide up the key-range among a number of threads and each thread would hand off blocks of tuples at a time to the consumer.
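The parallel scan described in this comment could be sketched as follows, with a NavigableMap standing in for a B+Tree index (none of these names are the actual AccessPath/IRangeQuery API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a parallel range scan: the key range is divided among worker
// threads, each scanning its own sub-range, and the per-thread results are
// recombined in key order. A NavigableMap stands in for a B+Tree index.
public class ParallelScan {
    public static List<Integer> scan(NavigableMap<Integer, Integer> index,
                                     int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<List<Integer>>> futures = new ArrayList<>();
        int lo = index.firstKey(), hi = index.lastKey();
        int span = (hi - lo) / nThreads + 1;
        for (int i = 0; i < nThreads; i++) {
            final int from = Math.min(lo + i * span, hi + 1);
            final int to = Math.min(hi + 1, from + span);
            Callable<List<Integer>> task =
                    () -> new ArrayList<>(index.subMap(from, to).values());
            futures.add(pool.submit(task));
        }
        List<Integer> result = new ArrayList<>();
        for (Future<List<Integer>> f : futures) {
            result.addAll(f.get()); // blocks until each sub-scan completes
        }
        pool.shutdown();
        return result;
    }
}
```

A production version would hand off blocks of tuples to the consumer as they are produced rather than collecting each sub-range before recombining.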

          bryanthompson added a comment -

          The goal with partly inline URIs is that the namespaces are automatically recognized during processing and inserted into the lexicon. Fully inline URIs with localName components that are string data or parseable as either simple or compound intrinsic literals are already possible.

          Brad Bebee added a comment -

          Moving this to 2.1 as it is only partially included in the 2.0 release.


            People

            • Assignee: bryanthompson
            • Reporter: bryanthompson
            • Votes: 0
            • Watchers: 9
