Blazegraph (by SYSTAP) / BLZG-192 Enable group commit by default / BLZG-1303

Possible leak of allocators in RWStore.reset() with GROUP_COMMIT


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Highest
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_2_X_BACKLOG
    • Component/s: RWStore
    • Labels: None

      Description

      If group commit is enabled, the following sequence can leak allocators:

      1. RWStore.reset() is called with no active allocation contexts (i.e., no active unisolated AbstractTasks).
      2. An interrupt occurs during that RWStore.reset() call, and the AbstractJournal.requiresAbort flag is set.
      3. A new allocation context is created before the subsequent abort() leads to another RWStore.reset() call.

      On that second call, isolatedWrites will now be true in RWStore.reset(), so the code can fail to correctly manage the release of the allocators that should have been released by the previous reset. A minimal model of the interleaving follows; the current RWStore.reset() implementation is quoted further below.
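
      The following is a minimal, runnable model of that interleaving (all names here are hypothetical stand-ins; this is not Blazegraph code). It shows only how an isolatedWrites flag on the redo masks the unwind work left behind by an interrupted first reset:

          import java.util.ArrayList;
          import java.util.List;

          public class ResetLeakModel {

              // Stand-in for m_allocs: allocators created since the last commit are
              // tagged "uncommitted" and should be unwound by reset().
              static final List<String> allocs = new ArrayList<>();

              // Mirrors the shape of RWStore.reset(): unwind only if !isolatedWrites.
              static void reset(boolean isolatedWrites, boolean interruptedBeforeUnwind) {
                  if (interruptedBeforeUnwind) {
                      return; // models the interrupt: uncommitted allocators stay behind
                  }
                  if (!isolatedWrites) {
                      allocs.removeIf(a -> a.startsWith("uncommitted"));
                  }
                  // else: the unwind is skipped to protect the isolated writes
              }

              public static void main(String[] args) {
                  allocs.add("uncommitted-1"); // allocator never committed
                  reset(false, true);          // reset #1: no contexts, but interrupted
                  // ... a new allocation context is created here ...
                  reset(true, false);          // reset #2 (the redo): isolatedWrites == true
                  System.out.println(allocs);  // prints [uncommitted-1] -- the leaked allocator
              }
          }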

      RWStore.reset() should be safe on redo. There are a few different mechanisms available to achieve this. One is to ensure that the code cannot observe an interrupt (pure compute code does not observe interrupts). Another is to catch the interrupt (or a nested interrupt), redo the loop while the lock remains held, and then set the interrupt state of the thread before leaving the core method.
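
      A sketch of the second mechanism, under stated assumptions (the class, doResetBody(), and causedByInterrupt() are hypothetical stand-ins, not the actual fix): the body is redone while the write lock remains held whenever an interrupt is observed, and the thread's interrupt status is re-asserted on the way out. The current RWStore.reset() implementation is quoted after the sketch.

          import java.nio.channels.ClosedByInterruptException;
          import java.util.concurrent.locks.ReentrantLock;

          public class InterruptSafeReset {

              private final ReentrantLock allocationWriteLock = new ReentrantLock();

              public void reset() {
                  allocationWriteLock.lock();
                  try {
                      boolean interrupted = false;
                      while (true) {
                          try {
                              doResetBody(); // assumed: the actual reset() work
                              break;         // completed without observing an interrupt
                          } catch (RuntimeException e) {
                              if (!causedByInterrupt(e))
                                  throw e;
                              interrupted = true;   // remember it, then redo under the same lock
                              Thread.interrupted(); // clear the status so the redo can finish
                          }
                      }
                      if (interrupted) {
                          Thread.currentThread().interrupt(); // re-assert before leaving
                      }
                  } finally {
                      allocationWriteLock.unlock();
                  }
              }

              // True if any cause in the chain is an interrupt.
              private static boolean causedByInterrupt(Throwable t) {
                  for (Throwable c = t; c != null; c = c.getCause()) {
                      if (c instanceof InterruptedException
                              || c instanceof ClosedByInterruptException) {
                          return true;
                      }
                  }
                  return false;
              }

              private void doResetBody() {
                  // placeholder for the real per-allocator reset logic
              }
          }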

          public void reset() {
      
              if (log.isInfoEnabled()) {
                  log.info("RWStore Reset");
              }
              m_allocationWriteLock.lock();
              try {
                  assertOpen();
      //          assertNoRebuild();
      
                  final CommitState commitState = m_commitStateRef
                          .getAndSet(null/* newValue */);
      
                  if (commitState != null) {
                  
                      commitState.reset(); // restore state values on RWStore.
                      
                  }
                  
                  boolean isolatedWrites = false;
                  /**
                   * Clear all allocators, not just dirty allocators, since we also
                   * need to reset the transient bits associated with session
                   * protection.
                   * 
                   * Need to know if there are any isolated modifications, in which case
                   * we must remember so that we avoid clearing down the store.
                   */
                  for (FixedAllocator fa : m_allocs) {
                      isolatedWrites = isolatedWrites || fa.reset(m_writeCacheService, m_committedNextAllocation);
                  }
                  
                  /**
                   * Now clone the transient metabits for protection if this service becomes leader
                   */
                  syncMetaTransients();
                             
                  if (!isolatedWrites) {
                      /**
                       * Now we should be able to unwind any unused allocators and unused
                       * alloc blocks.  An unused allocator is one with no diskAddr (never
                       * committed).  But it may be more difficult to determine if
                       * an alloc block has never been used, for that we really need to
                       * know what the nextAllocationOffset was at the previous commit.
                       * This could be cached as lastCommittedOffset, in which case we can unwind any
                       * allocBlocks with addresses >= to that.
                       */
                      int origAllocs = m_allocs.size();
                      while (m_allocs.size() > 0) {
                          final int last = m_allocs.size()-1;
                          final FixedAllocator fa = m_allocs.get(last);
                          if (fa.getDiskAddr() == 0) {
                              fa.setIndex(-1);
                              // must remove from free list!
                              m_freeFixed[fixedAllocatorIndex(fa.m_size)].remove(fa);
                              // ..and then from main allocation list
                              m_allocs.remove(last);
                          } else {
                              break;
                          }
                      }
                      m_nextAllocation = m_committedNextAllocation;
                      if (log.isDebugEnabled())
                          log.debug("Reset allocators, old: " + origAllocs + ", now: " + m_allocs.size());
                      
                      // Clear the dirty list.
                      // FIXME: we should be able to clear the dirty list, but this currently causes
                      //  problems in HA.
                      // If the allocators are torn down correctly, we should be good to clear the commitList
                      m_commitList.clear();
                      
                      // Flag no allocations since last commit
                      m_recentAlloc = false;
                  } else {
                      // there are isolated writes, so we must not clear the commit list since otherwise
                      // the Allocation index will get out of sync as per Ticket #1136
                  }
                  
                  if (m_quorum != null) {
                      /**
                       * When the RWStore is part of an HA quorum, we need to close
                       * out and then reopen the WriteCacheService every time the
                       * quorum token is changed. For convenience, this is handled by
                       * extending the semantics of abort() on the Journal and reset()
                       * on the RWStore.
                       * 
                       * @see <a
                       *      href="https://sourceforge.net/apps/trac/bigdata/ticket/530">
                       *      HA Journal </a>
                       */
                      m_writeCacheService.close();
                      m_writeCacheService = newWriteCacheService();
                  } else if (m_writeCacheService != null) {
                      /*
                       * Note: We DO NOT need to reset() the WriteCacheService. If a
                       * record was already flushed to the disk, then it is on the
                       * disk and clearing the record from the cache will not change
                       * that. If the record has not yet been flushed to the disk,
                       * then we already cleared it from the WCS when we reset the
                       * FixedAllocators (above).
                       */
      //                m_writeCacheService.reset();
      //                m_writeCacheService.setExtent(convertAddr(m_fileSize));
                  }
                  /*
                   * Discard any writes on the delete blocks. Those deletes MUST NOT
                   * be applied after a reset() on the RWStore.
                   * 
                   * @see https://sourceforge.net/apps/trac/bigdata/ticket/602
                   * (RWStore does not discard deferred deletes on reset)
                   */
                  m_deferredFreeOut.reset();
                  
                  /*
                   * Reset any storage stats
                   */
                  if (m_storageStatsAddr != 0) {
                      m_storageStats.reset();             
                  } else {
                      m_storageStats = new StorageStats(m_allocSizes);
                  }
      
              } catch (Exception e) {
                  throw new IllegalStateException("Unable to reset the store", e);
              } finally {
                  m_allocationWriteLock.unlock();
              }
          }
      

              People

              Assignee:
              martyncutcher
              Reporter:
              bryanthompson
              Votes:
              0
              Watchers:
              1
