Details

      Description

      The cluster is issuing too many GRS (Global Row Store) reads when obtaining and maintaining access to locatable resources in support of SPARQL queries.

      The cluster currently performs excessive read-through requests to the GRS because the propertyCache in the DefaultResourceLocator is defeated by the absence of the commitTime metadata for a transaction. However, the use of the readLock in the NanoSparqlServer should result in a stable txid for all query requests against that NSS instance. Therefore, we should be able to cache based on the txid and avoid most GRS reads on the cluster.

      Also see [1] and [2]

      [1] http://sourceforge.net/apps/trac/bigdata/ticket/454 (Global Row Store Read on Cluster uses Tx)
      [2] https://sourceforge.net/apps/trac/bigdata/ticket/266 (Refactor native long tx id to thin object)

        Activity

        bryanthompson added a comment -

        Modifications to facilitate propertySet caching on the cluster.

        Added IIndexStore#getGlobalRowStore(timestamp) and provided suitable declarations.

        Modified DefaultResourceLocator to cache based on the caller's timestamp when running on a cluster (for a read-only view request). In combination with the NanoSparqlServer readLock, this should dramatically reduce GRS read-through requests.
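
        As a rough illustration of caching on the caller's timestamp, the sketch below keys a property cache on the pair (namespace, timestamp). Every name in it (TimestampKeyedPropertyCache, Key, grsReader, grsReads) is hypothetical and not taken from the bigdata code base, and the map is unbounded, whereas the real propertyCache has a capacity and a timeout.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

// Hypothetical sketch only: these names do not come from the bigdata code base.
final class TimestampKeyedPropertyCache {

    // Cache key: the resource namespace plus the caller's read timestamp (txid).
    static final class Key {
        final String namespace;
        final long timestamp;

        Key(String namespace, long timestamp) {
            this.namespace = Objects.requireNonNull(namespace);
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return timestamp == k.timestamp && namespace.equals(k.namespace);
        }

        @Override
        public int hashCode() {
            return 31 * namespace.hashCode() + Long.hashCode(timestamp);
        }
    }

    private final Map<Key, Map<String, Object>> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive read against the global row store.
    private final BiFunction<String, Long, Map<String, Object>> grsReader;

    // Counts how many times the stand-in reader was actually invoked.
    final AtomicInteger grsReads = new AtomicInteger();

    TimestampKeyedPropertyCache(BiFunction<String, Long, Map<String, Object>> grsReader) {
        this.grsReader = grsReader;
    }

    // Returns the cached property set for (namespace, timestamp), reading
    // through to the stand-in reader only on a cache miss.
    Map<String, Object> getProperties(String namespace, long timestamp) {
        return cache.computeIfAbsent(new Key(namespace, timestamp), k -> {
            grsReads.incrementAndGet();
            return grsReader.apply(k.namespace, k.timestamp);
        });
    }
}
```

        With a stable txid, repeated lookups for the same namespace hit the cache, and only a request at a different timestamp triggers another read.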

        Boosted the capacity and timeout for the DefaultResourceLocator#propertyCache. This cache cannot cause indices and relations to be retained. However, a hit on this cache can avoid a GRS read and provide higher concurrency and throughput. This would appear to make the larger propertyCache configuration a win.

        Committed Revision: r5920

        bryanthompson added a comment -

        I believe that another part of the problem is that the SynchronizedHardReferenceQueueWithTimeout class is defeating the "scan" of the head of the inner queue because it attempts to compare an application reference with a ValueAge object that wraps a reference. The application-level reference is a locatable resource. The ValueAge object is a wrapper used to determine the age of an object in the inner hard reference queue and to decide when the object reference should be evicted. Since the two kinds of reference can never be == to one another, the scan fails to consolidate touches on the inner queue for what is in fact the same wrapped reference object.
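
        The mismatch can be reproduced in miniature. The sketch below uses hypothetical stand-ins (WrappedScanMismatch, a simplified ValueAge, naiveScan); the real classes in com.bigdata.cache differ in detail, and the real scan only inspects a bounded number of entries near the head rather than the whole queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins that illustrate the comparison problem only.
final class WrappedScanMismatch {

    // Wrapper pairing an application reference with its last-touch time.
    static final class ValueAge<T> {
        final T ref;
        final long lastTouchNanos;

        ValueAge(T ref, long nowNanos) {
            this.ref = ref;
            this.lastTouchNanos = nowNanos;
        }
    }

    // Naive scan: compares the queue entries (wrappers) with the candidate
    // (an application reference). A wrapper is never == to the object it
    // wraps, so this never reports a match.
    static boolean naiveScan(Deque<? extends ValueAge<?>> queue, Object candidate) {
        for (Object entry : queue) {
            if (entry == candidate) {
                return true;
            }
        }
        return false;
    }

    // Simulated add(): appends a new wrapper unless the scan finds a match.
    static <T> void add(Deque<ValueAge<T>> queue, T ref) {
        if (!naiveScan(queue, ref)) {
            queue.addLast(new ValueAge<>(ref, System.nanoTime()));
        }
    }

    public static void main(String[] args) {
        Deque<ValueAge<String>> queue = new ArrayDeque<>();
        String resource = "kb.spo";
        add(queue, resource);
        add(queue, resource);
        add(queue, resource);
        // Every touch of the same resource added another wrapper.
        System.out.println(queue.size());
    }
}
```

        Because no touch is ever consolidated, the same resource occupies several slots in the queue.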

        I am going to write up some unit tests for this condition and then look at options for fixing this problem. Since we want the add() to actually update the timestamp on the ValueAge object, some sort of protocol may need to be hammered out which can be shared by the RingBuffer class (which actually implements scanHead), the SynchronizedHardReferenceQueueWithTimeout, and the ValueAge object.

        bryanthompson added a comment -

        I have made some progress on a test suite for SynchronizedHardReferenceQueueWithTimeout based on the test suite for HardReferenceQueue. Some adjustments needed to be made in the test suite and the class under test since the references on the Queue are ValueAge objects which wrap the application objects. At this point I have the test of the eviction logic working.

        The next step is to port the test of scanHead(). Once ported, that test will fail until the SynchronizedHardReferenceQueueWithTimeout has been modified to properly scan the inner references for a match on a touch.

        bryanthompson added a comment -

        I migrated the unit test for scanHead() into TestSynchronizedHardReferenceQueueWithTimeout.

        I modified the inner hard reference queue implementation of the SynchronizedHardReferenceQueueWithTimeout class to override scanHead(). The overridden version of that method now does the necessary indirection to compare the wrapped references. I had to remove the final attribute from scanHead() to do this, but that should be of no consequence.
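
        A minimal sketch of this kind of override follows. ScanningQueue and TimeoutQueueSketch are hypothetical names, not the committed classes, and refreshing the timestamp on a match is an assumption about the intended touch semantics rather than something the commit is known to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base queue whose scanHead() compares entries by identity.
class ScanningQueue<E> {
    protected final List<E> items = new ArrayList<>();
    protected final int nscan;

    ScanningQueue(int nscan) {
        this.nscan = nscan;
    }

    // Scans the nscan most recently added entries for the same reference.
    // Deliberately not final so that a subclass can change the comparison.
    protected boolean scanHead(Object ref) {
        for (int i = items.size() - 1, n = 0; i >= 0 && n < nscan; i--, n++) {
            if (items.get(i) == ref) return true;
        }
        return false;
    }
}

// Hypothetical outer queue that stores wrappers in the inner queue.
final class TimeoutQueueSketch<T> {

    static final class ValueAge<V> {
        final V ref;
        long lastTouchNanos;

        ValueAge(V ref) {
            this.ref = ref;
            this.lastTouchNanos = System.nanoTime();
        }
    }

    private final ScanningQueue<ValueAge<T>> inner;

    TimeoutQueueSketch(int scanDepth) {
        this.inner = new ScanningQueue<ValueAge<T>>(scanDepth) {
            // Unwraps each entry before comparing, so a touch on the same
            // application object is recognised.
            @Override
            protected boolean scanHead(Object ref) {
                for (int i = items.size() - 1, n = 0; i >= 0 && n < nscan; i--, n++) {
                    ValueAge<T> entry = items.get(i);
                    if (entry.ref == ref) {
                        // Assumption: a matched touch refreshes the age.
                        entry.lastTouchNanos = System.nanoTime();
                        return true;
                    }
                }
                return false;
            }
        };
    }

    // Returns false when the touch was consolidated with an existing entry.
    synchronized boolean add(T ref) {
        if (inner.scanHead(ref)) return false;
        inner.items.add(new ValueAge<>(ref));
        return true;
    }

    synchronized int size() {
        return inner.items.size();
    }
}
```

        With the indirection in place, repeated adds of the same reference within the scan depth leave a single wrapper on the inner queue.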

        The com.bigdata.cache test suite runs green.

        Committed revision r5933.

        bryanthompson added a comment -

        Modified RingBuffer to expose a _get(index) method which returns the element at the specified offset in the backing array without correcting for the tail offset. This provides a fix for 2 CI test regressions related to the recent changes in SynchronizedHardReferenceQueueWithTimeout.
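
        The difference between a tail-corrected lookup and a raw lookup can be shown with a small ring buffer. RingBufferSketch is a hypothetical, simplified class; its field names and method bodies are assumptions and do not reproduce the real RingBuffer.

```java
import java.util.NoSuchElementException;

// Hypothetical, simplified ring buffer used only to contrast the two lookups.
final class RingBufferSketch<E> {
    private final Object[] refs; // backing array
    private int tail = 0;        // offset of the oldest element
    private int size = 0;        // number of elements currently stored

    RingBufferSketch(int capacity) {
        refs = new Object[capacity];
    }

    // Appends an element, or returns false when the buffer is full.
    boolean add(E e) {
        if (size == refs.length) return false;
        refs[(tail + size) % refs.length] = e;
        size++;
        return true;
    }

    // Removes and returns the oldest element.
    E remove() {
        if (size == 0) throw new NoSuchElementException();
        @SuppressWarnings("unchecked")
        E e = (E) refs[tail];
        refs[tail] = null;
        tail = (tail + 1) % refs.length;
        size--;
        return e;
    }

    // Logical lookup: index 0 is the oldest element, so the tail offset is applied.
    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        return (E) refs[(tail + index) % refs.length];
    }

    // Raw lookup: returns the slot in the backing array with no tail correction.
    @SuppressWarnings("unchecked")
    E _get(int index) {
        return (E) refs[index];
    }
}
```

        After the buffer wraps around, get(0) and _get(0) can return different elements, so a caller that already holds a raw array offset needs the uncorrected variant.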

        Committed Revision r5940.

        bryanthompson added a comment -

        This issue is resolved by the change set above.


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson