Blazegraph (by SYSTAP) / BLZG-670

Concurrent unisolated operations against multiple KBs on the same Journal (Group Commit)

    Details

    • Type: New Feature
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Journal

      Description

      This ticket would provide a capability for concurrent unisolated write operations against distinct KBs on the same RWStore or MemStore Journal. In fact, this amounts to using pessimistic locking rather than optimistic concurrency control with KB or index level granularity. The advantage of pessimistic locking in this case is more scalable updates (when compared to transactional isolation based on the use of a B+Tree to buffer the write set) and the opportunity for concurrent writers against distinct triple/quad store instances.

      The current approach for transactional isolation uses a B+Tree to buffer writes for each index on which the transaction writes. This is an MVCC strategy. When the transaction prepares, the write set is validated against the ground state from which the transaction was reading. Write-write conflicts are detected through the use of revision timestamps. Certain kinds of write-write conflicts can be reconciled. If the write set has conflicts which cannot be reconciled, then the transaction is aborted. Otherwise it commits. The commit protocol involves copying the write set onto the corresponding unisolated index. For historical reasons, the write set is buffered on a temporary store associated with the transaction (this prevented the WORM store from "leaking" storage associated with the write set).
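      The validation step described above can be sketched as follows. This is an illustrative model only, using a plain map of revision timestamps in place of Blazegraph's actual B+Tree machinery; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of write-write conflict detection via revision
 * timestamps (NOT Blazegraph's actual implementation).
 */
public class RevisionValidation {

    /**
     * A tuple validates iff its revision timestamp on the unisolated
     * index has not advanced past the revision the transaction read.
     */
    public static boolean validates(Map<String, Long> unisolatedRevisions,
                                    String key, long readRevision) {
        return unisolatedRevisions.getOrDefault(key, 0L) <= readRevision;
    }

    public static void main(String[] args) {
        Map<String, Long> revs = new HashMap<>();
        revs.put("spo#row1", 5L);
        // No concurrent writer since the tx read at revision 5: validates.
        System.out.println(validates(revs, "spo#row1", 5L)); // true
        // A concurrent commit bumped the revision to 7: conflict; abort
        // unless the conflict can be reconciled.
        revs.put("spo#row1", 7L);
        System.out.println(validates(revs, "spo#row1", 5L)); // false
    }
}
```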

      The proposed approach would use the allocation context mechanisms provided by the RWStore and the MemStore to keep allocations within the context of the transaction. It would NOT use a B+Tree to buffer the write set. Instead, the combination of a transaction-local allocation context and the copy-on-write semantics of the B+Tree or HTree would provide isolation. In order to prevent concurrent writers from touching the same indices, a lock would be established. The lock could be on the namespace of the triple store, the namespaces of the relations, or the namespaces of the indices to be accessed. We have code to support 2PL, but if the lock(s) are simply declared when the transaction is created and acquired in sorted order, then the potential for deadlocks can be avoided. The commit protocol would flush the dirty indices to the backing store, grab the semaphore for the Journal which controls "global" write access, and then do a Journal-level commit. Finally, the allocation context associated with the transaction would be released on either abort() or commit().
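      The deadlock-avoidance idea above (declare all locks up front, acquire them in one global order) can be sketched as follows. The names here are hypothetical; this is not the Blazegraph lock manager, just a minimal illustration of why a sorted acquisition order prevents wait-for cycles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Sketch: every transaction declares its namespaces up front and
 * acquires their locks in a single global (lexical) order, so no
 * wait-for cycle -- and hence no deadlock -- can form.
 */
public class NamespaceLocks {

    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Locks all declared namespaces; returns the acquisition order. */
    public List<String> lockAll(String... namespaces) {
        List<String> order = new ArrayList<>(Arrays.asList(namespaces));
        order.sort(null); // natural (lexical) order is the global order
        for (String ns : order) {
            locks.computeIfAbsent(ns, k -> new ReentrantLock()).lock();
        }
        return order;
    }

    public void unlockAll(String... namespaces) {
        for (String ns : namespaces) {
            locks.get(ns).unlock();
        }
    }

    public static void main(String[] args) {
        NamespaceLocks mgr = new NamespaceLocks();
        // Two transactions declaring overlapping namespaces in different
        // orders still acquire the locks in the same global order.
        System.out.println(mgr.lockAll("kb2", "kb1")); // [kb1, kb2]
        mgr.unlockAll("kb1", "kb2");
    }
}
```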

      Another use case for this pessimistic locking mechanism is SPARQL UPDATE of named solution sets. Right now, there is only one writer. If we used 2PL then we could allow concurrent SPARQL UPDATE operations which wrote on different named solution sets. These operations would not block unless they attempted to write on the same resource, i.e., on the same triple store, quad store, named index, or named solution set. That could significantly increase the parallelism for SPARQL UPDATE operations which are modifying the named solution sets without causing changes to the triple / quad store itself.

      One other twist is that we use unisolated views to write on the lexicon indices, but with an eventually consistent design which avoids the need for locks. However, the proposed locking approach would be per-KB, so it would not be possible for there to be two writers on the same lexicon indices. Therefore, the eventually consistent lexicon design does not appear to be an issue for the proposed approach.

      Martyn and I discussed possible implementation strategies. Probably the best approach would be to provide a different ITx implementation based on pessimistic locking. This would be used for transactions when the BigdataSail did not enable isolatable indices. No change should be required at the application layer.

      Workaround:

      A workaround for a highly scalable architecture is to break up the workload into a "load" cluster (which can also do inference) and a "read" cluster (which handles queries). Durable queues can be used to present updates to the load cluster. The change log mechanism can be used to extract the delta (including inferences that are asserted or retracted) and then drop that delta onto a durable queue for the "read" cluster. An efficient low-level task can use the existing job-centric concurrency and group commit logic to bulk load the deltas into the "read" cluster. This approach has the significant advantage that the inference workload is removed from the cluster that is servicing the queries. If the inferences are performed using per-tenant journals, then the inference throughput can be scaled independently of the query workload using a pool of machines dedicated to computing the inferences for each tenant. The only drawback is how to handle low-latency (versus high-throughput) updates when also using inference.

      Related tickets:

      See BLZG-461 (AbstractTask uses one TemporaryStoreFactory per read-only or read/write tx task)
      See BLZG-14 (HA doLocalAbort() should interrupt NSS requests and AbstractTasks)
      See BLZG-1036 (Name2Addr.indexNameScan(prefix) uses scan + filter)
      See BLZG-688 (GIST - Generalized Indices)
      See BLZG-1152 (SPARQL UPDATE warning with jetty client)
      See BLZG-1167 (Adapt blueprints integration to support group commit?)
      See BLZG-192 (Enable group commit by default (release 1.5.2))
      See BLZG-1171 (NPE in Leaf.hasDeleteMarkers in HA test suite with group commit)
      See BLZG-1172 (ClocksNotSynchronizedException not visible to HA client when GROUP COMMIT is enabled)
      See BLZG-1173 (DELETE-WITH-QUERY and UPDATE-WITH-QUERY (GROUP COMMIT))
      See BLZG-1174 (GlobalRowStoreHelper can hold hard reference to GSR index (GROUP COMMIT))
      See BLZG-193 (Code review on "instanceof Journal")

        Issue Links

          Activity

          bryanthompson added a comment -

          TODOs


          - TestFederatedQuery: this test suite fails with group commit due to the manner in which the individual repositories are set up. Basically, the code is circumventing the group commit API. This was already fixed in the other NSS test suites, but it is more complicated for this one.


          - BigdataStatics.NSS_GROUP_COMMIT: This needs to be turned into a per-Journal configuration property. It can no longer be static.


          - Review leader fail and task interruption semantics when executing using group commit.


          - The HA test suite is not running correctly when group commit is enabled.


          - The blueprints test suite is not running correctly when group commit is enabled. See BLZG-1167. I am not sure if this is a bug or a feature.

          bryanthompson added a comment -

          I have replaced BigdataStatics.NSS_GROUP_COMMIT with Journal.Options.GROUP_COMMIT and defined IIndexManager.isGroupCommit().

          Group commit is now enabled for the NSS in CI.

          I still need to debug the HA test suite with group commit enabled.

          The blueprints http client test suite is also still broken.

          Commit to SF GIT NSS_GROUP_COMMIT branch c702b77c17d0149570f69d09bbdfe019704fdd22.
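          The per-Journal configuration described above might look like the following in the journal properties file. The exact property key is an assumption on my part; the ticket only names the constant Journal.Options.GROUP_COMMIT, so treat the string below as illustrative rather than confirmed.

```properties
# Illustrative only: the real key is whatever Journal.Options.GROUP_COMMIT
# resolves to; this string is an assumption, not confirmed by the ticket.
com.bigdata.journal.Journal.groupCommit=true
```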

          bryanthompson added a comment -

          Fixed the HA startup. The problem was the IsolatedActionJournal allowing the GlobalRowStoreHelper to register the GSR against the unisolated Name2Addr index. This was happening because IsolatedActionJournal.getResourceLocator() was returning a locator that first searched the IsolatedActionJournal (and its n2a) and then searched the unisolated Name2Addr on the underlying Journal.

          Modified TestConcurrentKBCreate to run both with and without group commit and hooked this into each of the SAIL modes (sids, triples, quads).

          Remaining issues:


          - BLZG-1173 (DELETE-WITH-QUERY and UPDATE-WITH-QUERY)
          - BLZG-1167 Blueprints http graph client test setup (modify the wrapper class to not expose the Sail).
          - BLZG-1171 (NPE when running HA with group commit enabled).
          - BLZG-1172 (ClocksNotSynchronizedException not visible to HA client when GROUP COMMIT is enabled)

          Commit 4e6c84c44f361178ab4d5204329a9690fe1dbe47

          bryanthompson added a comment -

          The blueprints http client test suite issue remains. This is a test suite problem, not a group commit problem. See BLZG-1167.

          I am running the HA test suite in CI now, but I have closed BLZG-1171 and expect it to be green.

          This ticket is closed. The group commit feature is staged for release in 1.5.1 as a beta feature.

          bryanthompson added a comment -

          We just had a clean run through both HA CI and the NSS standalone CI with group commit enabled.


            People

            • Assignee:
              bryanthompson
            • Reporter:
              bryanthompson
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved: