  Blazegraph (by SYSTAP) / BLZG-227

Add hash-based cache option for B+Tree (RDF Lexicon)

    Details

      Description

      Some indices, such as the RDF ID2TERM index, are used primarily for point lookups. The ID2TERM index is modeled as a B+Tree mainly because that is the hammer we use for everything: bigdata B+Trees are designed to scale out dynamically and incrementally over variable amounts of hardware, which is harder to do with distributed hash tables (DHTs).

      Such indices can benefit from a hash map cache in front of the index, which avoids the key search in the B+Tree on a cache hit. This should be an option in the IndexMetadata. When enabled, each B+Tree shard would maintain a local LRU/LIRS cache mapping the B+Tree key to the B+Tree value.
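      A minimal sketch of the idea, assuming a `java.util.TreeMap` as a stand-in for the B+Tree shard. `CachedIndex`, `insert`, `lookup` and `indexReads` are hypothetical names, not part of the bigdata `IndexMetadata` API. Only the LRU policy is shown (LIRS would need a different eviction structure), and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: a bounded LRU cache from key to value placed in front of
 * a sorted index. The TreeMap is only a stand-in for a B+Tree shard.
 */
class CachedIndex<K extends Comparable<K>, V> {

    private final NavigableMap<K, V> index = new TreeMap<>(); // stand-in for the B+Tree
    private final Map<K, V> cache;                            // LRU point-lookup cache
    private long indexReads = 0;                              // lookups that missed the cache

    CachedIndex(final int capacity) {
        // accessOrder = true makes iteration order least-recently-accessed first,
        // so the "eldest" entry is the LRU victim.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Writes go to the index and through to the cache, keeping both consistent. */
    void insert(final K key, final V value) {
        index.put(key, value);
        cache.put(key, value);
    }

    /** Point lookup: a cache hit avoids the key search in the index. */
    V lookup(final K key) {
        V value = cache.get(key);
        if (value != null) {
            return value;
        }
        indexReads++;
        value = index.get(key); // cache miss: fall back to the ordered index
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    long indexReads() {
        return indexReads;
    }
}
```

      The write-through in `insert` is one possible design choice: it keeps the cache consistent with the index without invalidation logic, at the cost of letting writes displace entries that point lookups may still want.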

      Ideally, cache evictions would follow an approximate global access order shared with the LRUNexus, so that evictions can be driven by overall heap usage. This will require further generalization of the IGlobalLRU and the LRUNexus, since different types of caches would then compete for the same heap resources.
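      The shared-eviction idea can be sketched as several logical caches that share one access order and one byte budget, so an ID2TERM entry can be evicted to make room for an entry from some other cache. This is not the IGlobalLRU or LRUNexus API: `GlobalLru`, its methods, and the caller-supplied `estimatedBytes` (a stand-in for real heap accounting) are all hypothetical. The sketch keeps an exact access order under a single lock; a production version would presumably approximate that order to reduce contention.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch (not the IGlobalLRU/LRUNexus API): several logical caches
 * share one access order and one heap budget.
 */
final class GlobalLru {

    /** Composite key: the owning cache plus the key within that cache. */
    private static final class Key {
        private final String cacheId;
        private final Object key;

        Key(final String cacheId, final Object key) {
            this.cacheId = cacheId;
            this.key = key;
        }

        @Override
        public boolean equals(final Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            final Key other = (Key) o;
            return Objects.equals(cacheId, other.cacheId) && Objects.equals(key, other.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(cacheId, key);
        }
    }

    /** Cached value plus the caller-supplied estimate of its heap footprint. */
    private static final class Entry {
        private final Object value;
        private final long bytes;

        Entry(final Object value, final long bytes) {
            this.value = value;
            this.bytes = bytes;
        }
    }

    private final long maxBytes;
    private long usedBytes = 0;
    // One access-ordered map across all caches: iteration starts at the global LRU entry.
    private final LinkedHashMap<Key, Entry> order = new LinkedHashMap<>(16, 0.75f, true);

    GlobalLru(final long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized Object get(final String cacheId, final Object key) {
        final Entry e = order.get(new Key(cacheId, key));
        return e == null ? null : e.value;
    }

    synchronized void put(final String cacheId, final Object key, final Object value,
                          final long estimatedBytes) {
        final Entry old = order.put(new Key(cacheId, key), new Entry(value, estimatedBytes));
        if (old != null) {
            usedBytes -= old.bytes;
        }
        usedBytes += estimatedBytes;
        // Evict from the global LRU end, whichever cache owns the entry, until under budget.
        final Iterator<Map.Entry<Key, Entry>> it = order.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().bytes;
            it.remove();
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    synchronized int size() {
        return order.size();
    }
}
```

      With a 100-byte budget, two 40-byte entries from different caches fit; touching the first and then adding a third evicts the untouched entry even though it belongs to a different cache than the one that needed the space.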

      Since indices such as ID2TERM are accessed from within JOINs (or will be, once foreign key joins are working), the cache needs to be local to the B+Tree shard rather than placed in front of the bigdata federation.

        Activity

        bryanthompson added a comment:

        A lexicon specific cache has been added for the ID2TERM index. There is a configuration option for the capacity of this cache.

        Note: This may be replaced or enhanced by a general purpose tuple cache in the future which could be enabled for select indices. However, we also want to study whether it makes more sense to provide distributed hash table (DHT) support.


          People

          • Assignee:
            Unassigned
            Reporter:
            bryanthompson
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: