Blazegraph (by SYSTAP) / BLZG-563

Thread-local cache combined with unbounded thread pools causes effective memory leak

      Description

      Quite some time ago we introduced classes for batched updates against a hard reference queue (a ring buffer data structure) and a concurrent hash map with weak values. The relevant classes are:

      1. HardReferenceQueueWithBatchingUpdates
      2. ConcurrentWeakValueCacheWithBatchedUpdates

      At the time, we considered two designs: one using striped locks and one using thread locals. The thread-local design had better throughput and has been in place for quite some time. However, it is turning into an effective memory leak. The purpose of this issue is to correct that memory leak.

      The ConcurrentWeakValueCacheWithBatchedUpdates is only used by the LexiconRelation's termCache, which is the source of the memory leak. The HardReferenceQueueWithBatchingUpdates backs that termCache. It is also used for the writeRetentionQueue on the B+Tree and HTree, but I have not seen evidence of a memory leak in those cases.
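
      To make the failure mode concrete, here is a minimal illustration of the leak pattern (the class, field, and pool below are invented for the example and are not the Blazegraph classes): each pool thread accumulates a private batch of hard references in a ThreadLocal, so retained memory scales with the number of live threads, which an unbounded pool does not cap.

      import java.util.ArrayList;
      import java.util.List;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      // Illustrative sketch only; not the actual cache classes.
      public class ThreadLocalBatchLeakDemo {

          // Per-thread batch buffer, analogous to the thread-local batching design.
          private static final ThreadLocal<List<byte[]>> BATCH =
                  ThreadLocal.withInitial(ArrayList::new);

          public static void main(String[] args) {
              // Unbounded pool: under a heavy concurrent workload it keeps
              // creating threads rather than queueing work.
              final ExecutorService pool = Executors.newCachedThreadPool();
              for (int i = 0; i < 1_000; i++) {
                  pool.execute(() -> {
                      // Each task buffers a "term" in its thread's private batch.
                      // The batch is pinned by the ThreadLocal for as long as the
                      // thread is alive, so retained memory grows with the number
                      // of live pool threads.
                      BATCH.get().add(new byte[1024]);
                      try {
                          Thread.sleep(50); // keep the thread busy so the pool grows
                      } catch (InterruptedException e) {
                          Thread.currentThread().interrupt();
                      }
                  });
              }
              pool.shutdown();
          }
      }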

      The problem can be corrected in either of two ways:

      1. Implement a striped-lock version of the IHardReferenceQueue interface and use it in place of the thread-local version for the termCache.

      2. Use a fixed-size thread pool for all operations which touch the termCache.

      Either approach will eliminate the memory leak.
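
      For reference, here is a minimal sketch of the striped-lock idea behind option 1 (the class and method names are illustrative and do not match the actual IHardReferenceQueue implementation): a small fixed number of batch buffers is shared by all threads, each guarded by its own lock, so retained memory is bounded by the number of stripes rather than by the number of threads.

      import java.util.Arrays;
      import java.util.concurrent.locks.ReentrantLock;

      // Illustrative sketch only; not the actual striped-lock implementation.
      public class StripedBatchingBuffer<T> {

          private final int nstripes;
          private final int batchSize;
          private final ReentrantLock[] locks;
          private final Object[][] buffers; // per-stripe batch of hard references
          private final int[] sizes;        // #of elements buffered per stripe

          public StripedBatchingBuffer(final int nstripes, final int batchSize) {
              this.nstripes = nstripes;
              this.batchSize = batchSize;
              this.locks = new ReentrantLock[nstripes];
              this.buffers = new Object[nstripes][batchSize];
              this.sizes = new int[nstripes];
              for (int i = 0; i < nstripes; i++) {
                  locks[i] = new ReentrantLock();
              }
          }

          // Buffer a reference; flush the stripe's batch when it fills up.
          public void add(final T ref) {
              // Hash the current thread onto a stripe.  Memory is bounded by
              // nstripes * batchSize, not by the #of threads in the pool.
              final int i = (int) (Thread.currentThread().getId() % nstripes);
              final ReentrantLock lock = locks[i];
              lock.lock();
              try {
                  buffers[i][sizes[i]++] = ref;
                  if (sizes[i] == batchSize) {
                      evictBatch(i);
                  }
              } finally {
                  lock.unlock();
              }
          }

          // Apply the buffered updates to the shared queue (stubbed out here).
          private void evictBatch(final int i) {
              // The real code would perform the batched update against the shared
              // hard reference queue / ring buffer under this stripe's lock.
              Arrays.fill(buffers[i], null);
              sizes[i] = 0;
          }
      }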

      This issue is related to [1], and in fact would appear to be at least one of the root causes of [1] (there may be other causes, but this one has been observed).

      This problem exists against the 1.1.x release and is almost certainly present in the 1.0.x release as well, though I have not verified that yet. The problem only appears under a heavy, sustained concurrent query workload and can be masked on machines with large JVM heaps.

      [1] https://sourceforge.net/apps/trac/bigdata/ticket/433 (Cluster leaks threads under read-only index operations)

        Activity

        bryanthompson added a comment -

        The changes described above have resolved the memory leak in the termCache.

        A memory leak / thread leak is still observed on the cluster. The leak appears to be tied to DGC (distributed garbage collection) [1]. It can be observed by running the BSBM benchmark against the cluster. This leak appears to have a separate cause.

        I am also observing excessive memory utilization for index shards. The likely cure is to reduce the writeRetentionQueue capacity for the indices in the cluster. This makes sense because we wind up with 100s or 1000s of shards per node in the cluster, and each shard holds much less data than on the single-machine deployments. Also, on the cluster, reading an index segment leaf is at most one IO if the index segment is open and only 2-3 IOs if it is not. Therefore the default writeRetentionQueue capacity of 500 (or less) is probably appropriate for the cluster. It would help to look at the actual #of nodes and leaves in a full shard to make this determination.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/433

        bryanthompson added a comment -

        Next step is to declare a synchronous RPC API for fast range counts. That is the likely source of most of the Futures. The RangeCountProcedure is being submitted by the ClientIndexView. If we define a purpose-specific method for this on IDataService then we should be able to reduce the number of exported objects considerably. Make sure to use an interface for the parameter as part of a movement towards a continuously evolvable deployment.
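
        A minimal sketch of what such a purpose-specific method might look like (the interface, method, and parameter names below are hypothetical, not the actual IDataService API): the range count comes back synchronously as a primitive, so no Future proxy has to be exported and tracked by DGC, and an interface parameter leaves room for the request to evolve.

        import java.io.IOException;
        import java.io.Serializable;

        // Hypothetical synchronous RPC for fast range counts; names are
        // illustrative only.  Returning a primitive avoids exporting a Future.
        public interface IRangeCountService {

            // Report the fast range count for a key range of the named index
            // as of the given timestamp.
            long rangeCount(IRangeCountRequest request) throws IOException;

            // Using an interface (rather than a concrete class) for the
            // parameter supports a continuously evolvable deployment: the
            // request can grow new accessors without breaking already-deployed
            // services.
            interface IRangeCountRequest extends Serializable {
                String getIndexName();
                long getTimestamp();
                byte[] getFromKey(); // null for an open start of the key range
                byte[] getToKey();   // null for an open end of the key range
            }
        }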

        bryanthompson added a comment -

        I have implemented a workaround which addresses the problem. I added a ThickFuture class. Its constructor waits until the wrapped Future is done and then records the outcome, including whether there was an exception or the Future was interrupted/cancelled. Using this for just the fast range counts is sufficient for the single-threaded BSBM harness to run. Doing it for all requests is sufficient to run 8 concurrent query clients with no increase in the thread count on the DataService. I am now running a sustained BSBM query mix against the cluster.
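
        A minimal sketch of the idea (not necessarily the committed implementation; the field names are illustrative): the constructor drains the wrapped Future, so the object can be serialized by value and no remote Future proxy needs to be exported.

        import java.io.Serializable;
        import java.util.concurrent.CancellationException;
        import java.util.concurrent.ExecutionException;
        import java.util.concurrent.Future;
        import java.util.concurrent.TimeUnit;

        // Sketch of a "thick" Future: done by construction, safe to serialize.
        public class ThickFuture<V> implements Future<V>, Serializable {

            private static final long serialVersionUID = 1L;

            private final V outcome;          // result, if the task succeeded
            private final Throwable cause;    // exception, if the task failed
            private final boolean cancelled;  // true iff the task was cancelled

            public ThickFuture(final Future<V> src) {
                V outcome = null;
                Throwable cause = null;
                boolean cancelled = false;
                try {
                    outcome = src.get(); // block until the wrapped task is done
                } catch (CancellationException ex) {
                    cancelled = true;
                } catch (ExecutionException ex) {
                    cause = ex.getCause();
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    cause = ex;
                }
                this.outcome = outcome;
                this.cause = cause;
                this.cancelled = cancelled;
            }

            @Override
            public boolean cancel(final boolean mayInterruptIfRunning) {
                return false; // already done by construction
            }

            @Override
            public boolean isCancelled() {
                return cancelled;
            }

            @Override
            public boolean isDone() {
                return true; // the constructor does not return until the task is done
            }

            @Override
            public V get() throws ExecutionException {
                if (cancelled)
                    throw new CancellationException();
                if (cause != null)
                    throw new ExecutionException(cause);
                return outcome;
            }

            @Override
            public V get(final long timeout, final TimeUnit unit)
                    throws ExecutionException {
                return get(); // never blocks; the outcome is already recorded
            }
        }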

        Committed revision r5831

        bryanthompson added a comment -

        Ported r5824 and r5828 to the 1.0.x maintenance branch. These change
        sets include support for striped locks in the batched hard reference
        queue backing the concurrent hash map used by the termCache.

        Note: The TermId.cache mechanism also needs to be defeated. This was
        done in r5829 in the BIGDATA_RELEASE_1_1_0 maintenance branch.
        However, the IV class hierarchy is significantly different for 1.1.x
        so this change set needs to be reimplemented for 1.0.x rather than
        simply imported. An ITermCache, TermCache, and IV.clone(boolean) were
        all introduced to defeat this memory leak.

        Committed revision r5836 (1.0.x maintenance branch)

        bryanthompson added a comment -

        Port of r5829 to the 1.0.x maintenance branch.

        Encapsulate the terms cache with an interface and ensure the IV.cache
        reference is cleared when adding to the map.


        - ITermCache
        - TermCache
        - LexiconRelation#termCache, #termCacheFactory

        All IVs must implement clone(boolean); see the sketch after the change list below.


        - TestIVCache
        - IV.dropCache(), IV.clone(boolean)
        - AbstractIV#setValue() => final
        - AbstractIV#dropCache() => removed.
        - AbstractIV#flags => protected.
        - TermId#clone(boolean).
        - TermId#setValue(...) => removed (overrode AbstractIV but added nothing)
        - NullIV#clone(boolean) returns self.
        - DummyIV#clone(boolean) returns self.
        - TestEncodeDecodeKeys: implemented clone(boolean) in private IV impl.
        - TestEncodeDecodeKeys: imported logic to test clone(boolean).
        - ExtensionIV#clone(boolean)
        - SidIV#clone(boolean) & removed unused logger.
        - NumericBNodeIV#clone(boolean)
        - UUIDBNodeIV#clone(boolean)
        - UUIDLiteralIV#clone(boolean)
        - XSDBooleanIV#clone(boolean)
        - XSDByteIV#clone(boolean)
        - WrappedIV#clone(boolean): <<<=== Review with MikeP.
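
        A minimal sketch of the clone(boolean) contract described above (type and method names are illustrative, not the 1.0.x source): the termCache holds its values weakly, so a value can only be reclaimed if nothing holds a hard reference to it. If the key (the IV) carries a cached reference to that same value, the weak reference can never clear, which is the leak. Cloning the key with its cache dropped before it goes into the map breaks that cycle.

        import java.util.concurrent.ConcurrentHashMap;
        import java.util.concurrent.ConcurrentMap;

        // Minimal stand-in for the part of the IV contract used by the sketch.
        interface IVSketch<V> {
            // Return a copy of this IV; drop the cached value iff clearCache is true.
            IVSketch<V> clone(boolean clearCache);
        }

        // Illustrative sketch only; not the actual ITermCache/TermCache classes.
        final class TermCacheSketch<V> {

            // A real ConcurrentWeakValueCacheWithBatchedUpdates holds its values
            // weakly; a plain ConcurrentHashMap keeps this sketch self-contained.
            private final ConcurrentMap<IVSketch<V>, V> map = new ConcurrentHashMap<>();

            V putIfAbsent(final IVSketch<V> key, final V value) {
                // Store a clone of the key with its cached value reference cleared,
                // so the map entry's key does not pin the (weakly held) value.
                return map.putIfAbsent(key.clone(true /* clearCache */), value);
            }

            V get(final IVSketch<V> key) {
                return map.get(key);
            }
        }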

        Note: No changes were required in order to encapsulate access to the
        termsCache using ITermCache. While it is used by various index
        procedures in the 1.0.x maintenance branch, all of those references
        are from inner classes which are directly accessing the termsCache
        reference on the LexiconRelation. (This was changed in the 1.1.x
        branch, which now has static helper classes for those tasks.)

        Committed revision r5837 (1.0.x maintenance branch)

        This issue is now resolved against both the 1.0.x and 1.1.x maintenance branches.


          People

          • Assignee: bryanthompson
          • Reporter: bryanthompson
          • Votes: 0
          • Watchers: 2
