  1. Blazegraph (by SYSTAP)
  2. BLZG-641 Improve load performance
  3. BLZG-1578

Reduce TERM2ID scatter induced by UUIDs and other random components in URIs.


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Medium
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Tests with parallel pre-fetch on various data sets demonstrate a clear bottleneck on the TERM2ID index writes. This bottleneck arises from scatter induced in the TERM2ID index by what amounts to random numbers embedded into URIs.
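
The scatter effect can be illustrated with a toy model. This is only a sketch under stated assumptions, not Blazegraph code: the index is modeled as a sorted list cut into fixed-size leaf pages, and `PAGE_SIZE`, `PREFIX`, and `leaves_touched` are hypothetical names chosen for the illustration. It counts how many distinct leaf pages a batch of new URIs would land on when the variable part of the URI is a random UUID versus a monotonically increasing counter.

```python
import bisect
import random
import uuid

PAGE_SIZE = 128                      # hypothetical number of keys per leaf page
PREFIX = "http://example.org/item/"  # hypothetical URI namespace

rng = random.Random(42)              # fixed seed so the run is repeatable


def random_uuid():
    """Return a version-4 style UUID drawn from the seeded generator."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def leaves_touched(existing_sorted, batch, page_size=PAGE_SIZE):
    """Count the distinct leaf pages that the insert positions of a batch fall on."""
    pages = set()
    for key in batch:
        pos = bisect.bisect_left(existing_sorted, key)
        pages.add(pos // page_size)
    return len(pages)


# An index already holding 100,000 URIs, in two flavors.
existing_random = sorted(PREFIX + str(random_uuid()) for _ in range(100_000))
existing_seq = sorted(PREFIX + f"{i:010d}" for i in range(100_000))

# A batch of 1,000 new URIs for each flavor.
batch_random = [PREFIX + str(random_uuid()) for _ in range(1_000)]
batch_seq = [PREFIX + f"{i:010d}" for i in range(100_000, 101_000)]

random_pages = leaves_touched(existing_random, batch_random)
seq_pages = leaves_touched(existing_seq, batch_seq)

print("leaf pages touched, UUID URIs:      ", random_pages)
print("leaf pages touched, sequential URIs:", seq_pages)
```

In this model the sequential batch lands on a single leaf at the end of the index, while the UUID batch is spread over a large fraction of all leaves. A real B+Tree splits pages and buffers writes, so the absolute numbers will differ, but the contrast is the point.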

      There are two ways of addressing this issue:

      1. Use an InlineURIHandler to recognize URIs fitting some pattern and inline them directly into the statement indices. For the matching URIs, this completely removes the induced burden on the TERM2ID index. While scatter will still be induced on the statement indices, the writers on the statement indices run in parallel, so their latencies overlap rather than add up. Support for additional intrinsic datatypes was added (BLZG-1507).
      2. Support column-wise compression (BLZG-13) in the statement indices and use a page-local dictionary to convert URIs, Literals, etc. into page-local integers, and then layer on additional compression techniques to obtain a tightly packed page. This requires more effort and I will add some tickets to BLZG-13 for a roadmap in this direction.
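
The idea behind the first option can be sketched as follows. This is a conceptual illustration only and does not use the Blazegraph InlineURIHandler API: the namespace table, the one-byte namespace id, and the `try_inline` / `from_inline` helpers are all assumptions made for the example. A URI that matches a known prefix followed by a canonical UUID is packed into a fixed-width key that could be stored directly in a statement index, so no dictionary (TERM2ID) entry is needed for it. Anything that does not match falls back to the dictionary path.

```python
import uuid

# Hypothetical table of recognized namespaces and their one-byte ids.
NAMESPACES = {"http://example.org/item/": 1}
NAMESPACE_BY_ID = {ns_id: prefix for prefix, ns_id in NAMESPACES.items()}


def try_inline(uri):
    """Return a 17-byte inline key for `prefix + canonical UUID`, else None.

    None means the URI does not fit the pattern and would need a
    dictionary entry instead.
    """
    for prefix, ns_id in NAMESPACES.items():
        if not uri.startswith(prefix):
            continue
        local = uri[len(prefix):]
        try:
            parsed = uuid.UUID(local)
        except ValueError:
            return None
        # Only accept the canonical lowercase form so decoding round-trips exactly.
        if str(parsed) != local:
            return None
        return bytes([ns_id]) + parsed.bytes
    return None


def from_inline(key):
    """Rebuild the original URI from an inline key produced by try_inline."""
    return NAMESPACE_BY_ID[key[0]] + str(uuid.UUID(bytes=key[1:]))


uri = "http://example.org/item/123e4567-e89b-12d3-a456-426614174000"
key = try_inline(uri)
print(len(key), from_inline(key) == uri)
print(try_inline("http://example.org/item/not-a-uuid"))
```

The inline key is still effectively random, so the statement indices see the same scatter. What changes is that the separate dictionary write, and the latency it adds ahead of the statement writes, goes away for these URIs.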

      This ticket exists to:

      1. Document the TERM2ID bottleneck.
      2. Provide workarounds for various data sets.
      3. Document how to implement these workarounds on the wiki.


              People

              Assignee:
              bryanthompson
              Reporter:
              bryanthompson