  1. Blazegraph (by SYSTAP)
  2. BLZG-1392

Assess impact of the indexCache timeout.

    Details

    • Type: Task
    • Status: Done
    • Priority: Medium
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_RELEASE_1_5_2
    • Component/s: None
    • Labels: None

      Description

      The indexCache in AbstractJournal retains access to recently used indices, keyed by (namespace, timestamp). Its values are ICheckpointProtocol instances, i.e. index objects. Under heavy concurrent update pressure (as demonstrated by BSBM EXPLORE+UPDATE with 16 or 32 clients), the pace of updates causes this cache to grow quite large: as many as 1M entries after 120 runs of the cited benchmark.

          final private ConcurrentWeakValueCacheWithTimeout<NT, ICheckpointProtocol> indexCache;
      

      The cache is not leaking memory for the indices without bound. Of those 1M entries, all but 14 (two commit points) become weakly reachable, and are swept by GC, once the timeout for those entries has expired. That timeout currently defaults to 60 seconds.

      I believe that this timeout can be reduced to 5 seconds for such a workload, and that this low timeout might be a good default.

      The use case for retaining a larger timeout is a workload where there are many namespaces and only occasional access to those namespaces. In this context, if a namespace is accessed again within 60 seconds then it will remain in memory and stay "hot".

      The timeouts are imposed by the cleaner service in SynchronizedHardReferenceQueueWithTimeout. This service runs every 5 seconds. When it runs it scans the entries on the hard reference queue and clears the hard reference for any entry that has not been touched within the last 60 seconds.

          private static final ScheduledExecutorService cleanerService;
          static {
              
              cleanerService = Executors
                      .newSingleThreadScheduledExecutor(new DaemonThreadFactory(
                              "StaleReferenceCleaner"));
              
              cleanerService.scheduleWithFixedDelay(new Cleaner(),
                      5000/* initialDelay */, 5000/* delay */, TimeUnit.MILLISECONDS);
         
          }
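
      The eviction rule described above can be sketched as follows. This is a minimal illustration, not the actual SynchronizedHardReferenceQueueWithTimeout: the class TimedHardRefQueue and its methods are hypothetical names, and the clock is passed in explicitly so that the behavior is easy to check.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified stand-in for a hard reference queue with a timeout. */
final class TimedHardRefQueue<T> {

    /** A hard reference plus the time at which it was last touched. */
    private static final class Entry<T> {
        final T ref;
        final long touchedNanos;

        Entry(final T ref, final long touchedNanos) {
            this.ref = ref;
            this.touchedNanos = touchedNanos;
        }
    }

    private final Deque<Entry<T>> queue = new ArrayDeque<>();
    private final long timeoutNanos;

    TimedHardRefQueue(final long timeoutMillis) {
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    /** Records a touch; callers are assumed to pass non-decreasing times. */
    synchronized void touch(final T ref, final long nowNanos) {
        queue.addLast(new Entry<>(ref, nowNanos));
    }

    /**
     * Drops every hard reference that has not been touched within the timeout
     * and returns how many were dropped. A cleaner task would call this on a
     * fixed delay (5 seconds in the snippet above).
     */
    synchronized int clearStale(final long nowNanos) {
        int cleared = 0;
        // Entries are appended in touch order, so the oldest sit at the head.
        while (!queue.isEmpty()
                && nowNanos - queue.peekFirst().touchedNanos > timeoutNanos) {
            queue.removeFirst();
            cleared++;
        }
        return cleared;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

      If the real queue behaves like this sketch, then with a 60 second timeout and a cleaner that runs every 5 seconds, an untouched entry keeps its hard reference for between 60 and roughly 65 seconds. After that, the entry is only weakly reachable through the cache and is removed whenever GC next collects it.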
      

      These behaviors can be configured using com.bigdata.journal.Options. The relevant parameter is given below. While there are (many) other related parameters, I suspect that we only need to tune this one.

          /**
           * The timeout in milliseconds for stale entries in the historical index
           * cache -or- ZERO (0) to disable the timeout (default
           * {@value #DEFAULT_HISTORICAL_INDEX_CACHE_TIMEOUT}). When this timeout
           * expires, the reference for the entry in the backing
           * {@link HardReferenceQueue} will be cleared. Note that the entry will
           * remain in the historical index cache regardless as long as it is strongly
           * reachable.
           * 
           * @see AbstractJournal#getIndexWithCheckpointAddr(long)
           */
          String HISTORICAL_INDEX_CACHE_TIMEOUT = AbstractJournal.class.getName()
                  + ".historicalIndexCacheTimeout";
      
          String DEFAULT_HISTORICAL_INDEX_CACHE_TIMEOUT = "" + (60 * 1000);
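
      As an illustration of how the option above might be overridden, the sketch below copies a set of journal properties and sets the timeout to a new value. The helper class IndexCacheTimeoutConfig is hypothetical. The literal key assumes that AbstractJournal resides in the com.bigdata.journal package, so that AbstractJournal.class.getName() + ".historicalIndexCacheTimeout" expands as shown.

```java
import java.util.Properties;

/** Hypothetical helper for overriding the historical index cache timeout. */
final class IndexCacheTimeoutConfig {

    // Assumption: this is what AbstractJournal.class.getName()
    // + ".historicalIndexCacheTimeout" evaluates to.
    static final String HISTORICAL_INDEX_CACHE_TIMEOUT =
            "com.bigdata.journal.AbstractJournal.historicalIndexCacheTimeout";

    /**
     * Returns a copy of the given properties with the timeout set to the
     * given number of milliseconds. Per the Javadoc above, ZERO (0) disables
     * the timeout.
     */
    static Properties withTimeoutMillis(final Properties base,
            final long timeoutMillis) {
        if (timeoutMillis < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        final Properties p = new Properties();
        p.putAll(base); // leave the caller's properties untouched
        p.setProperty(HISTORICAL_INDEX_CACHE_TIMEOUT,
                Long.toString(timeoutMillis));
        return p;
    }
}
```

      Under the same assumption about the key, the equivalent line in a properties file would be com.bigdata.journal.AbstractJournal.historicalIndexCacheTimeout=5000 for the 5 second value proposed in this ticket.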
      

      This ticket is to examine the performance impact of this parameter using BSBM EXPLORE+UPDATE with 1, 16, 32, and 64 threads and recommend a setting that provides higher throughput overall.

        Activity

        michaelschmidt added a comment - See the experiments described in https://docs.google.com/spreadsheets/d/1bkBtgSuR1BpIo4jEtvbdpzOw5rHXVl4DoUOkaxgisKg/edit#gid=853738299 and https://docs.google.com/spreadsheets/d/1bkBtgSuR1BpIo4jEtvbdpzOw5rHXVl4DoUOkaxgisKg/edit#gid=2120126172: changing the timeout has no notable effect.

        bryanthompson added a comment -

        Michael, if you have time to run those benchmarks before the release then I think we can get the new default setting into the release. As long as the timeout is greater than the delay between queries, the parameter should not have any performance impact on a read-only system. We might want to keep the timeout as high as 10 seconds to ensure that it is greater than any possible GC pause.

        I think that it would be worthwhile to assess the performance impact on a BSBM 16 client EXPLORE in the following manner. First, warm up the system. Then, while leaving the server up, wait until the timeout is known to have expired. Then do another benchmark run and see whether performance has dropped relative to the previous hot run because entries in the indexCache have been cleared.
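
        The procedure above could be scripted along the following lines. This is only a sketch: the class IdleCacheCheck is hypothetical, warmUp and benchmarkRun stand in for whatever drives the BSBM client, and the score is whatever throughput figure the benchmark reports. Waiting for the timeout plus one cleaner period only ensures that the hard references have been cleared; it does not guarantee that GC has already swept the entries.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical harness for comparing a hot run with a run after idling. */
final class IdleCacheCheck {

    /**
     * Warms up, measures a hot run, idles past the cache timeout, then
     * measures again. Returns {hotScore, afterIdleScore}.
     */
    static double[] hotThenIdle(final Runnable warmUp,
            final DoubleSupplier benchmarkRun, final long timeoutMillis,
            final long cleanerDelayMillis) {
        warmUp.run();
        final double hot = benchmarkRun.getAsDouble();
        try {
            // Idle for the timeout plus one cleaner period so that the
            // cleaner has had a chance to run after the timeout expired.
            Thread.sleep(timeoutMillis + cleanerDelayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while idling", e);
        }
        final double afterIdle = benchmarkRun.getAsDouble();
        return new double[] { hot, afterIdle };
    }
}
```

        With the current defaults this would idle for 60000 + 5000 milliseconds between the two runs. A clear drop in the second score would suggest that cleared indexCache entries matter for the workload.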

        It is NOT critical to get this into the current release. We can (and should) capture the outcome of this ticket in the performance tuning section of the wiki. We can also blog the recommendation and people can apply and test the change simply by setting the parameter documented above in their RWStore.properties file and then restarting the service (it is not a sticky parameter).


          People

          • Assignee: michaelschmidt
          • Reporter: bryanthompson