Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-600

Logged errors for sendHaltMessage() or sendChunk() on cluster.

    Details

      Description

      Exceptions such as the following can appear on a cluster when a limit is used. The log message is more likely for longer running queries because there are more outstanding operators trying to send a halt message and more outstanding operators trying to flow chunks across the federation. The root cause is the LIMIT causing the query to be cancelled on the query controller. When the query is cancelled, all running operators are terminated on all nodes where the query has started. This causes interrupts to be propagated to the various threads.

      The correct thing is to test for an interrupt as the root cause and then ignore the exception.

      There may be an interaction with River 2.2. It appears that River 2.2 may fail to propagate the root cause of an interrupt in some cases based on an email thread on the river-dev mailing list. I have a patch for river 2.2 for this (propagating the root cause of an interrupt) which we can use to test cases where an IOException is being thrown and the suspected root cause is an interrupt, but the interrupt is not being made visible.

      This issue may have been partly masked by [1].

      [2] may be a related issue.

      ERROR [com.bigdata.service.jini.JiniFederation.executorService161] com.bigdata.bop.engine.ChunkedRunningQuery$SendHaltM                         essageTask.run(ChunkedRunningQuery.java:1655) 2012-02-09 15:16:23,222 - Could not notify query controller: java.rmi.MarshalException: error marshalling arguments; nested exception is:
              java.io.IOException: I/O error writing to mux connection: java.nio.channels.ClosedByInterruptException
      java.rmi.MarshalException: error marshalling arguments; nested exception is:
              java.io.IOException: I/O error writing to mux connection: java.nio.channels.ClosedByInterruptException
              at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:785)
              at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethod(BasicInvocationHandler.java:659)
              at net.jini.jeri.BasicInvocationHandler.invoke(BasicInvocationHandler.java:528)
              at $Proxy8.haltOp(Unknown Source)
              at com.bigdata.bop.engine.ChunkedRunningQuery$SendHaltMessageTask.run(ChunkedRunningQuery.java:1651)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: I/O error writing to mux connection: java.nio.channels.ClosedByInterruptException
              at com.sun.jini.jeri.internal.mux.Session$MuxOutputStream.ensureOpen(Session.java:648)
              at com.sun.jini.jeri.internal.mux.Session$MuxOutputStream.write(Session.java:589)
              at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
              at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1756)
              at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1257)
              at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1211)
              at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1395)
              at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
              at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1547)
              at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:333)
              at com.sun.jini.jeri.internal.runtime.Util.marshalValue(Util.java:176)
              at net.jini.jeri.BasicInvocationHandler.marshalArguments(BasicInvocationHandler.java:1182)
              at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:778)
              ... 7 more
      Caused by: java.nio.channels.ClosedByInterruptException
              at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
              at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:374)
              at java.nio.channels.SocketChannel.write(SocketChannel.java:360)
              at com.sun.jini.jeri.internal.mux.SocketChannelConnectionIO.asyncSend(SocketChannelConnectionIO.java:156)
              at com.sun.jini.jeri.internal.mux.Mux.asyncSendData(Mux.java:653)
              at com.sun.jini.jeri.internal.mux.Session$MuxOutputStream.writeBuffer(Session.java:749)
              at com.sun.jini.jeri.internal.mux.Session$MuxOutputStream.close(Session.java:622)
              at java.io.ObjectOutputStream$BlockDataOutputStream.close(ObjectOutputStream.java:1798)
              at java.io.ObjectOutputStream.close(ObjectOutputStream.java:722)
              at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:780)
              at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethod(BasicInvocationHandler.java:659)
              at net.jini.jeri.BasicInvocationHandler.invoke(BasicInvocationHandler.java:528)
              at $Proxy8.startOp(Unknown Source)
              at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:760)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
              at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:675)
              ... 3 more
      

      [1] http://sourceforge.net/apps/trac/bigdata/ticket/478 (Cluster does not map input solution(s) across shards)
      [2] https://sourceforge.net/apps/trac/bigdata/ticket/468 (Rare interrupt of rangeCount during query on cluster)

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        Improved the try/catch logic.

        Committed Revision r6006

        Show
        bryanthompson bryanthompson added a comment - Improved the try/catch logic. Committed Revision r6006
        Hide
        bryanthompson bryanthompson added a comment -

        This exception does not appear in the BSBM mix up to a concurrency level of at least 32 query threads. Either it is restricted to longer running queries or the change set above has resolved the issue. Further testing with LIMIT on long running pipelined queries (such as LUBM Q9 or Q2) is necessary before this issue can be closed.

        Show
        bryanthompson bryanthompson added a comment - This exception does not appear in the BSBM mix up to a concurrency level of at least 32 query threads. Either it is restricted to longer running queries or the change set above has resolved the issue. Further testing with LIMIT on long running pipelined queries (such as LUBM Q9 or Q2) is necessary before this issue can be closed.
        Hide
        bryanthompson bryanthompson added a comment -

        Here is another trace processing while resolving IVs to RDF Values. This should be trapped and ignored. In fact, it was being trapped and ignored at:

                at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:237)
        

        The problem is that we are logging out the exception rather than recognizing it as an interrupt at:

                at com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1742)
        

        I have added a FIXME at that point in ClientIndexView and linked it to this issue.

        The full trace is below:

        ERROR [com.bigdata.service.jini.JiniFederation.executorService4856] com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1750) 2012-02-10 19:17:26,068 - Execution failed: task=index=BSBM200M.lex.ID2TERM, ts=1328905044940, procedure com.bigdata.btree.proc.BatchLookup : Split{ ntuples=1, fromIndex=0, toIndex=1, pmd={ partitionId=23, dataServiceUUID=0662489a-5002-40ee-846a-508b0560d7fb, leftSeparator=[128,40,0,0,0,36,0,21,223], rightSeparator=[128,56,0,0,0,32,0,58,54]}}
        java.util.concurrent.ExecutionException: java.rmi.UnmarshalException: exception unmarshalling response; nested exception is:
                java.io.IOException: request I/O interrupted
                at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
                at java.util.concurrent.FutureTask.get(FutureTask.java:83)
                at com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1742)
                at com.bigdata.service.ndx.ClientIndexView.runTasks(ClientIndexView.java:1656)
                at com.bigdata.service.ndx.ClientIndexView.submit(ClientIndexView.java:1602)
                at com.bigdata.service.ndx.ClientIndexView.submit(ClientIndexView.java:1514)
                at com.bigdata.rdf.lexicon.ResolveTermTask.call(ResolveTermTask.java:86)
                at com.bigdata.rdf.lexicon.BatchResolveTermIVsTask.call(BatchResolveTermIVsTask.java:96)
                at com.bigdata.rdf.lexicon.BatchResolveTermIVsTask.call(BatchResolveTermIVsTask.java:27)
                at com.bigdata.rdf.lexicon.LexiconRelation.getTerms(LexiconRelation.java:2140)
                at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:219)
                at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:108)
                at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:36)
                at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:237)
                at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:194)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
        Caused by: java.rmi.UnmarshalException: exception unmarshalling response; nested exception is:
                java.io.IOException: request I/O interrupted
                at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:847)
                at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethod(BasicInvocationHandler.java:659)
                at net.jini.jeri.BasicInvocationHandler.invoke(BasicInvocationHandler.java:528)
                at $Proxy4.submit(Unknown Source)
                at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.submit(AbstractDataServiceProcedureTask.java:348)
                at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.submit(AbstractDataServiceProcedureTask.java:292)
                at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.call(AbstractDataServiceProcedureTask.java:215)
                at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.call(AbstractDataServiceProcedureTask.java:45)
                ... 5 more
        Caused by: java.io.IOException: request I/O interrupted
                at com.sun.jini.jeri.internal.mux.Session$MuxInputStream.read(Session.java:833)
                at net.jini.jeri.connection.ConnectionManager$Outbound$Input.read(ConnectionManager.java:550)
                at net.jini.jeri.BasicObjectEndpoint.executeCall(BasicObjectEndpoint.java:410)
                at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:806)
                ... 12 more
        Caused by: java.lang.InterruptedException
                at java.lang.Object.wait(Native Method)
                at java.lang.Object.wait(Object.java:485)
                at com.sun.jini.jeri.internal.mux.Session$MuxInputStream.read(Session.java:829)
                ... 15 more
        
        Show
        bryanthompson bryanthompson added a comment - Here is another trace processing while resolving IVs to RDF Values. This should be trapped and ignored. In fact, it was being trapped and ignored at: at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:237) The problem is that we are logging out the exception rather than recognizing it as an interrupt at: at com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1742) I have added a FIXME at that point in ClientIndexView and linked it to this issue. The full trace is below: ERROR [com.bigdata.service.jini.JiniFederation.executorService4856] com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1750) 2012-02-10 19:17:26,068 - Execution failed: task=index=BSBM200M.lex.ID2TERM, ts=1328905044940, procedure com.bigdata.btree.proc.BatchLookup : Split{ ntuples=1, fromIndex=0, toIndex=1, pmd={ partitionId=23, dataServiceUUID=0662489a-5002-40ee-846a-508b0560d7fb, leftSeparator=[128,40,0,0,0,36,0,21,223], rightSeparator=[128,56,0,0,0,32,0,58,54]}} java.util.concurrent.ExecutionException: java.rmi.UnmarshalException: exception unmarshalling response; nested exception is: java.io.IOException: request I/O interrupted at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at com.bigdata.service.ndx.ClientIndexView.runParallel(ClientIndexView.java:1742) at com.bigdata.service.ndx.ClientIndexView.runTasks(ClientIndexView.java:1656) at com.bigdata.service.ndx.ClientIndexView.submit(ClientIndexView.java:1602) at com.bigdata.service.ndx.ClientIndexView.submit(ClientIndexView.java:1514) at com.bigdata.rdf.lexicon.ResolveTermTask.call(ResolveTermTask.java:86) at com.bigdata.rdf.lexicon.BatchResolveTermIVsTask.call(BatchResolveTermIVsTask.java:96) at com.bigdata.rdf.lexicon.BatchResolveTermIVsTask.call(BatchResolveTermIVsTask.java:27) at com.bigdata.rdf.lexicon.LexiconRelation.getTerms(LexiconRelation.java:2140) at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:219) at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:108) at com.bigdata.rdf.store.BigdataBindingSetResolverator.resolveChunk(BigdataBindingSetResolverator.java:36) at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:237) at com.bigdata.striterator.AbstractChunkedResolverator$ChunkConsumerTask.call(AbstractChunkedResolverator.java:194) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.rmi.UnmarshalException: exception unmarshalling response; nested exception is: java.io.IOException: request I/O interrupted at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:847) at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethod(BasicInvocationHandler.java:659) at net.jini.jeri.BasicInvocationHandler.invoke(BasicInvocationHandler.java:528) at $Proxy4.submit(Unknown Source) at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.submit(AbstractDataServiceProcedureTask.java:348) at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.submit(AbstractDataServiceProcedureTask.java:292) at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.call(AbstractDataServiceProcedureTask.java:215) at com.bigdata.service.ndx.AbstractDataServiceProcedureTask.call(AbstractDataServiceProcedureTask.java:45) ... 5 more Caused by: java.io.IOException: request I/O interrupted at com.sun.jini.jeri.internal.mux.Session$MuxInputStream.read(Session.java:833) at net.jini.jeri.connection.ConnectionManager$Outbound$Input.read(ConnectionManager.java:550) at net.jini.jeri.BasicObjectEndpoint.executeCall(BasicObjectEndpoint.java:410) at net.jini.jeri.BasicInvocationHandler.invokeRemoteMethodOnce(BasicInvocationHandler.java:806) ... 12 more Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at com.sun.jini.jeri.internal.mux.Session$MuxInputStream.read(Session.java:829) ... 15 more
        Hide
        bryanthompson bryanthompson added a comment -

        Closed. Not relevant for the new scale-out architecture.

        Show
        bryanthompson bryanthompson added a comment - Closed. Not relevant for the new scale-out architecture.

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: