  1. Blazegraph (by SYSTAP)
  2. BLZG-1178

Bad Address: length requested greater than allocated slot (RWStore, GROUP COMMIT, HA-only)

    Details

      Description

      The UPDATE request was posted to the leader (bigdata15) at the LBS path:

      ./testdriver -idir /root/workspace/bsbmtools/trunk/td_100m/td_data -ucf usecases/exploreAndUpdate/sparql.txt -udataset td_100m/dataset_update.nt -seed $RANDOM -u http://localhost:8090/bigdata/LBS/leader/namespace/kb/sparql http://localhost:8090/bigdata/LBS/read/namespace/kb/sparql
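      For readers without the BSBM test driver, a single update can be posted to the same leader endpoint by hand. The following is a minimal sketch, assuming the endpoint accepts the standard SPARQL 1.1 Protocol form-encoded `update` parameter; the helper name `build_update_request` is hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Leader endpoint copied from the testdriver command line above.
LEADER_UPDATE_URL = "http://localhost:8090/bigdata/LBS/leader/namespace/kb/sparql"


def build_update_request(update: str, url: str = LEADER_UPDATE_URL) -> Request:
    """Build (but do not send) a form-encoded SPARQL UPDATE request.

    Hypothetical helper: it follows the SPARQL 1.1 Protocol convention of
    POSTing the statement as the 'update' form parameter.
    """
    body = urlencode({"update": update}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# The DELETE WHERE statement that appears in the error message of this ticket.
request = build_update_request(
    "DELETE WHERE { "
    "<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromVendor2757/Offer5438262> "
    "?p ?o }"
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

      A single request like this is unlikely to trigger the failure on its own; the error was observed under the concurrent BSBM explore-and-update mix.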
      

      This is the error message on the leader:

      ERROR: 66492152 2015-03-03 09:23:50,392      qtp1193373768-558 com.bigdata.rdf.sail.webapp.BigdataRDFServlet.launderThrowable(BigdataRDFServlet.java:191): cause=java.util.concurrent.ExecutionException: java.lang.RuntimeException: Commit failed: Task{com.bigdata.rdf.task.ApiTaskForJournal,timestamp=unisolated,resource=[kb]}::{delegate=com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask{namespace=kb,timestamp=0, updateStr=[DELETE WHERE
      { <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromVendor2757/Offer5438262> ?p ?o }
      ]}}, query=SPARQL-UPDATE: updateStr=DELETE WHERE
      { <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromVendor2757/Offer5438262> ?p ?o }
      
      java.util.concurrent.ExecutionException: java.lang.RuntimeException: Commit failed: Task{com.bigdata.rdf.task.ApiTaskForJournal,timestamp=unisolated,resource=[kb]}::{delegate=com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask{namespace=kb,timestamp=0, updateStr=[DELETE WHERE
      { <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromVendor2757/Offer5438262> ?p ?o }
      ]}}
              at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
              at java.util.concurrent.FutureTask.get(FutureTask.java:111)
              at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:260)
              at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlUpdate(QueryServlet.java:359)
              at com.bigdata.rdf.sail.webapp.QueryServlet.doPost(QueryServlet.java:165)
              at com.bigdata.rdf.sail.webapp.RESTServlet.doPost(RESTServlet.java:237)
              at com.bigdata.rdf.sail.webapp.MultiTenancyServlet.doPost(MultiTenancyServlet.java:136)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
              at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:595)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:191)
              at org.eclipse.jetty.server.Dispatcher.forward(Dispatcher.java:72)
              at com.bigdata.rdf.sail.webapp.HALoadBalancerServlet.forwardToLocalService(HALoadBalancerServlet.java:938)
              at com.bigdata.rdf.sail.webapp.lbs.AbstractLBSPolicy.service(AbstractLBSPolicy.java:245)
              at com.bigdata.rdf.sail.webapp.HALoadBalancerServlet.service(HALoadBalancerServlet.java:832)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
              at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
              at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
              at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
              at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
              at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
              at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
              at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
              at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
              at org.eclipse.jetty.server.Server.handle(Server.java:497)
              at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
              at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
              at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
              at java.lang.Thread.run(Thread.java:724)
      Caused by: java.lang.RuntimeException: Commit failed: Task{com.bigdata.rdf.task.ApiTaskForJournal,timestamp=unisolated,resource=[kb]}::{delegate=com.bigdata.rdf.sail.webapp.QueryServlet$SparqlUpdateTask{namespace=kb,timestamp=0, updateStr=[DELETE WHERE
      { <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/dataFromVendor2757/Offer5438262> ?p ?o }
      ]}}
              at com.bigdata.journal.WriteExecutorService.afterTask(WriteExecutorService.java:968)
              at com.bigdata.journal.AbstractTask.doUnisolatedReadWriteTask(AbstractTask.java:2139)
              at com.bigdata.journal.AbstractTask.call2(AbstractTask.java:2030)
              at com.bigdata.journal.AbstractTask.call(AbstractTask.java:1896)
      	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
              at java.util.concurrent.FutureTask.run(FutureTask.java:166)
              at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
              at com.bigdata.concurrent.NonBlockingLockManagerWithNewDesign$LockFutureTask.run(NonBlockingLockManagerWithNewDesign.java:1984)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              ... 1 more
      Caused by: com.bigdata.journal.CommitException: Commit failed - will abort:  : java.lang.RuntimeException: Problem with entry at -314689041142382170: lastRootBlock=rootBlock{ rootBlock=1, challisField=219898, version=3, nextOffset=3565067742101510, localTime=1425392630207 [Tuesday, March 3, 2015 9:23:50 AM EST], firstCommitTime=1422285932434 [Monday, January 26, 2015 10:25:32 AM EST], lastCommitTime=1425392630155 [Tuesday, March 3, 2015 9:23:50 AM EST], commitCounter=219898, commitRecordAddr={off=NATIVE:-73269254,len=422}, commitRecordIndexAddr={off=NATIVE:-73244677,len=220}, blockSequence=1, quorumToken=124, metaBitsAddr=3564277797486954, metaStartAddr=874088, storeType=RW, uuid=f9f6a98e-cc98-4bae-affa-21f6ad2b4a70, offsetBits=42, checksum=1545545063, createTime=1422284759355 [Monday, January 26, 2015 10:05:59 AM EST], closeTime=0}
              at com.bigdata.journal.WriteExecutorService.commit(WriteExecutorService.java:2498)
              at com.bigdata.journal.WriteExecutorService.groupCommit(WriteExecutorService.java:1483)
              at com.bigdata.journal.WriteExecutorService.afterTask(WriteExecutorService.java:947)
              ... 10 more
      Caused by: java.lang.RuntimeException: Problem with entry at -314689041142382170: lastRootBlock=rootBlock{ rootBlock=1, challisField=219898, version=3, nextOffset=3565067742101510, localTime=1425392630207 [Tuesday, March 3, 2015 9:23:50 AM EST], firstCommitTime=1422285932434 [Monday, January 26, 2015 10:25:32 AM EST], lastCommitTime=1425392630155 [Tuesday, March 3, 2015 9:23:50 AM EST], commitCounter=219898, commitRecordAddr={off=NATIVE:-73269254,len=422}, commitRecordIndexAddr={off=NATIVE:-73244677,len=220}, blockSequence=1, quorumToken=124, metaBitsAddr=3564277797486954, metaStartAddr=874088, storeType=RW, uuid=f9f6a98e-cc98-4bae-affa-21f6ad2b4a70, offsetBits=42, checksum=1545545063, createTime=1422284759355 [Monday, January 26, 2015 10:05:59 AM EST], closeTime=0}
              at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3113)
              at com.bigdata.journal.WriteExecutorService.commit(WriteExecutorService.java:2417)
              ... 12 more
      Caused by: java.lang.RuntimeException: Problem with entry at -314689041142382170
              at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4967)
              at com.bigdata.rwstore.RWStore.checkDeferredFrees(RWStore.java:3539)
              at com.bigdata.journal.RWStrategy.checkDeferredFrees(RWStrategy.java:781)
              at com.bigdata.journal.AbstractJournal$CommitState.writeCommitRecord(AbstractJournal.java:3476)
              at com.bigdata.journal.AbstractJournal$CommitState.access$2800(AbstractJournal.java:3278)
              at com.bigdata.journal.AbstractJournal.commitNow(AbstractJournal.java:4088)
              at com.bigdata.journal.AbstractJournal.commit(AbstractJournal.java:3111)
              ... 13 more
      Caused by: java.lang.RuntimeException: addr=-73261059 : cause=java.lang.IllegalStateException: Bad Address: length requested greater than allocated slot
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:2190)
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:1989)
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:2033)
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:1989)
              at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4857)
              at com.bigdata.rwstore.RWStore.freeDeferrals(RWStore.java:4953)
              ... 19 more
      Caused by: java.lang.IllegalStateException: Bad Address: length requested greater than allocated slot
              at com.bigdata.rwstore.RWStore.getData(RWStore.java:2082)
              ... 24 more
      

      Note: We have not been able to replicate this problem in either the standalone or the HA1 deployment mode. It appears to be linked to the postCommit() and postHACommit() methods, which are only invoked in HA with a replication factor greater than one.
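      The long value in "Problem with entry at -314689041142382170" can be related to the `{off=...,len=...}` pairs printed in the root block. The sketch below assumes the address packs a signed 32-bit offset in the high bits and a 32-bit length in the low bits. That layout is inferred from the numbers in this log, not taken from the RWStore source, and the function names are hypothetical.

```python
from typing import Tuple


def decode_rw_addr(addr: int) -> Tuple[int, int]:
    """Split a packed long into (offset, length).

    Assumption inferred from the log: offset lives in the high 32 bits
    (signed), length in the low 32 bits.
    """
    offset = addr >> 32           # arithmetic shift keeps the sign
    length = addr & 0xFFFFFFFF    # low 32 bits
    return offset, length


def encode_rw_addr(offset: int, length: int) -> int:
    """Inverse of decode_rw_addr under the same assumed layout."""
    return (offset << 32) | length


# The entry reported by RWStore.freeDeferrals in the stack trace.
problem_entry = decode_rw_addr(-314689041142382170)
print(problem_entry)  # (-73269252, 422)
```

      Under that assumption the failing entry decodes to off=-73269252, len=422. The length matches the len=422 of commitRecordAddr in lastRootBlock, and the offset differs from that record's off=-73269254 by 2. This is an observation about the numbers only; it does not by itself establish the cause.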

        Activity

        bryanthompson bryanthompson added a comment -

        I have started BSBM UPDATE against the 2nd namespace as well. This has raised the average commit group size above 1. This is a clear indication that write sets of the updates on the different namespaces are being melded together into a single commit point.

        bryanthompson bryanthompson added a comment -

        Here is the DumpJournal output from the HA3 cluster with group commit enabled after the load of the 2nd BSBM 100M data set and with concurrent updates running.

        magic=e6b4c275
        version=1
        extent=14505607168(13833M), userExtent=14505606480(13833M), bytesAvailable=-898462614725346(-856840719M), nextOffset=898477120331826
        rootBlock{ rootBlock=0, challisField=699214, version=3, nextOffset=2116366670119808, localTime=1426246101678 [Friday, March 13, 2015 7:28:21 AM EDT], firstCommitTime=1426002333473 [Tuesday, March 10, 2015 11:45:33 AM EDT], lastCommitTime=1426246101663 [Friday, March 13, 2015 7:28:21 AM EDT], commitCounter=699214, commitRecordAddr={off=NATIVE:-60599748,len=422}, commitRecordIndexAddr={off=NATIVE:-60137500,len=220}, blockSequence=1, quorumToken=133, metaBitsAddr=2028349442490811, metaStartAddr=538192, storeType=RW, uuid=cfc29cfb-cb0a-436c-a1aa-e78723e893bb, offsetBits=42, checksum=-1958534204, createTime=1426002329172 [Tuesday, March 10, 2015 11:45:29 AM EDT], closeTime=0}
        rootBlock{ rootBlock=1, challisField=699215, version=3, nextOffset=2116366669144187, localTime=1426246101784 [Friday, March 13, 2015 7:28:21 AM EDT], firstCommitTime=1426002333473 [Tuesday, March 10, 2015 11:45:33 AM EDT], lastCommitTime=1426246101739 [Friday, March 13, 2015 7:28:21 AM EDT], commitCounter=699215, commitRecordAddr={off=NATIVE:-60599751,len=422}, commitRecordIndexAddr={off=NATIVE:-60448775,len=220}, blockSequence=1, quorumToken=133, metaBitsAddr=1878609031594427, metaStartAddr=538192, storeType=RW, uuid=cfc29cfb-cb0a-436c-a1aa-e78723e893bb, offsetBits=42, checksum=-1826675725, createTime=1426002329172 [Tuesday, March 10, 2015 11:45:29 AM EDT], closeTime=0}
        The current root block is #1
        
        -------------------------
        RWStore Allocator Summary
        -------------------------
        AllocatorSize      AllocatorCount   SlotsAllocated  %SlotsAllocated    SlotsRecycled        SlotChurn       SlotsInUse      %SlotsInUse   MeanAllocation    SlotsReserved     %SlotsUnused    BytesReserved     BytesAppData       %SlotWaste         %AppData       %StoreFile      %TotalWaste       %FileWaste 
        64                            334         14470080            12.29         12331446             6.77          2138635             4.14               20          2393088            10.63        153157632        260024548           -69.78             1.15             0.47            -1.10             -0.33 
        128                          4696         34467784            29.28           850844             1.03         33616940            65.04               96         33647616             0.09       4306894848       3256003421            24.40            14.41            13.35            10.86              3.26 
        192                           377          2880999             2.45           207283             1.08          2673716             5.17              152          2692096             0.68        516882432        407224884            21.22             1.80             1.60             1.13              0.34 
        320                            41          2918658             2.48          2705390            13.69           213268             0.41              227           283648            24.81         90767360         47998743            47.12             0.21             0.28             0.44              0.13 
        512                           280          3621434             3.08          1688774             1.87          1932660             3.74              417          1964544             1.62       1005846528        797340425            20.73             3.53             3.12             2.15              0.65 
        768                           297          5237278             4.45          3181434             2.55          2055844             3.98              647          2117888             2.93       1626537984       1328280275            18.34             5.88             5.04             3.08              0.92 
        1024                          458          6034599             5.13          2840269             1.89          3194330             6.18              919          3276288             2.50       3354918912       3012436481            10.21            13.33            10.40             3.54              1.06 
        2048                          676         15128546            12.85         10387246             3.19          4741300             9.17             1451          4837888             2.00       9907994624       7046950572            28.88            31.19            30.70            29.57              8.87 
        3072                           60          3569263             3.03          3199974             9.67           369289             0.71             2517           425216            13.15       1306263552        999417731            23.49             4.42             4.05             3.17              0.95 
        4096                           17          2263401             1.92          2190202            30.92            73199             0.14             3553           120064            39.03        491782144        162826245            66.89             0.72             1.52             3.40              1.02 
        8192                          164         27123515            23.04         26443023            39.86           680492             1.32             7272          1160704            41.37       9508487168       5274228224            44.53            23.35            29.47            43.76             13.12 
        
        -------------------------
        BLOBS
        -------------------------
        Bucket(K)   Allocations    Allocated      Deletes      Deleted      Current         Data         Mean        Churn
        16             10450907 117664886373     10323488 116174985131       127419   1489901242        11258        82.02
        32              1553589  28843154327      1533669  28437970383        19920    405183944        18565        77.99
        64                    4       131819            4       159982            0       -28163        32954         4.00
        128                   0            0            1        67780           -1       -67780            0         0.00
        256                   0            0            0            0            0            0            0         0.00
        512                   0            0            0            0            0            0            0         0.00
        1024                  0            0            0            0            0            0            0         0.00
        2048                  0            0            0            0            0            0            0         0.00
        4096                  0            0            0            0            0            0            0         0.00
        8192                  0            0            0            0            0            0            0         0.00
        16384                 0            0            0            0            0            0            0         0.00
        32768                 0            0            0            0            0            0            0         0.00
        65536                 0            0            0            0            0            0            0         0.00
        2097151               0            0            0            0            0            0            0         0.00
        
        Checking regions.....okay
        

        Note that attempting to dump the index pages causes a known concurrent modification issue where the dump journal logic is not properly isolated. See BLZG-847 (DumpJournal does not protect against concurrent updates (NSS)).
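        Several columns of the allocator summary above can be reproduced from the raw counts, which helps when reading rows such as the 64-byte allocator with its negative %SlotWaste. The formulas in this sketch are inferred from the printed numbers, not taken from the DumpJournal source, and the function name is hypothetical.

```python
def derived_columns(allocator_size, slots_allocated, slots_in_use,
                    slots_reserved, bytes_app_data):
    """Recompute derived allocator columns from raw counts.

    Formulas are assumptions inferred from the dump, not from the source code.
    """
    bytes_reserved = allocator_size * slots_reserved
    return {
        "BytesReserved": bytes_reserved,
        "SlotChurn": slots_allocated / slots_in_use,
        "%SlotsUnused": 100.0 * (slots_reserved - slots_in_use) / slots_reserved,
        "%SlotWaste": 100.0 * (bytes_reserved - bytes_app_data) / bytes_reserved,
    }


# Raw values copied from the AllocatorSize=64 row of the dump.
row_64 = derived_columns(
    allocator_size=64,
    slots_allocated=14470080,
    slots_in_use=2138635,
    slots_reserved=2393088,
    bytes_app_data=260024548,
)

# bytesAvailable in the header line equals userExtent minus nextOffset.
bytes_available = 14505606480 - 898477120331826
```

        For the 64-byte row this gives BytesReserved=153157632, SlotChurn of about 6.77, %SlotsUnused of about 10.63, and %SlotWaste of about -69.78, matching the dump. The negative waste means the reported BytesAppData exceeds BytesReserved for that allocator. The header arithmetic yields bytesAvailable=-898462614725346, also as printed.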

        bryanthompson bryanthompson added a comment -

        The UPDATE workloads are continuing to execute against the kb and kb2 namespaces. I am going to prompt a few failovers of services and see if any interesting problems emerge. Starting at commitCounter=783422.

        I've worked through several follower-failure and a few leader-failure events on the HA3 group commit cluster. I did see some liveness issues around the quorum at one point, which were fixed by restarting the service on bigdata15. I suspect that this might have been an RMI-related issue. It was certainly not anything related to the group commit mechanisms.

        At this point the HA3 cluster with group commit is up to commitCounter=788983, HALogs=193577, 1 snapshot @ commitCounter=595411. It all seems quite stable.

        @martyncutcher is going to run the branch through CI and look over the HA test suite results in CI. If that is good, then we can close out this ticket and bring the GROUP_COMMIT_1136 branch back to master.

        bryanthompson bryanthompson added a comment -

        I've reduced the stress levels for the HA group commit test suite and removed a few of the "large load" tests in that suite and queued the branch for CI to see if we can get a clean run.

        bryanthompson bryanthompson added a comment -

        CI is running clean with that reduction in stress levels. I am going to merge the branch for this ticket to master and close the ticket.

        Commit 440d90c717ecdceff695e679c5e939de51126484


          People

          • Assignee: martyncutcher
          • Reporter: bryanthompson