  Blazegraph (by SYSTAP) / BLZG-613

Empty chunk in ThickChunkMessage (cluster)

    Details

      Description

      Under some circumstances, the code that maps the binding sets over the shards can produce an empty chunk, which causes the ThickChunkMessage constructor to throw the following exception:

      java.lang.IllegalArgumentException
              at com.bigdata.bop.fed.ThickChunkMessage.<init>(ThickChunkMessage.java:153)
              at com.bigdata.bop.fed.FederationChunkHandler.sendChunkMessage(FederationChunkHandler.java:501)
              at com.bigdata.bop.fed.FederationChunkHandler.handleChunk(FederationChunkHandler.java:322)
              at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.outputChunk(ChunkedRunningQuery.java:1500)
              at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.add(ChunkedRunningQuery.java:1482)
              at com.bigdata.bop.engine.ChunkedRunningQuery$HandleChunkBuffer.add(ChunkedRunningQuery.java:1338)
              at com.bigdata.relation.accesspath.UnsyncLocalOutputBuffer.handleChunk(UnsyncLocalOutputBuffer.java:59)
              at com.bigdata.relation.accesspath.UnsyncLocalOutputBuffer.handleChunk(UnsyncLocalOutputBuffer.java:14)
              at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.overflow(AbstractUnsynchronizedArrayBuffer.java:256)
              at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.add2(AbstractUnsynchronizedArrayBuffer.java:187)
              at com.bigdata.relation.accesspath.AbstractUnsynchronizedArrayBuffer.add(AbstractUnsynchronizedArrayBuffer.java:146)
      

      This can be fixed either by sending the empty chunk or by dropping it (my preference). The hard constraint is that such chunks must be counted (in messagesSent) if they are sent and must not be counted if they are dropped. Otherwise the query will not terminate correctly, because the messages sent by the operator will not be balanced by the messages consumed.

      This looks like an easy fix in FederationChunkHandler at around line 501: if the chunk is empty, log a warning/error and drop it. Only non-empty chunks should be sent and counted.
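      A minimal Java sketch of the drop-and-do-not-count rule described above. All names here (ShardChunkSender, sendChunkMessage, messagesSent, and the in-memory outbox standing in for the real transport) are hypothetical; this is not the actual FederationChunkHandler code, only an illustration of keeping the sent-message counter balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not the Blazegraph API): sends one chunk per shard,
 * dropping empty chunks so that only messages actually sent are counted.
 */
final class ShardChunkSender<E> {

    private static final Logger log =
            Logger.getLogger(ShardChunkSender.class.getName());

    /** Messages sent downstream; must be balanced by messages consumed. */
    private final AtomicLong messagesSent = new AtomicLong();

    /** Stand-in for the real transport: records what would have been sent. */
    private final List<E[]> outbox = new ArrayList<>();

    /**
     * Sends the chunk destined for one shard, or drops it if it is empty.
     *
     * @return true iff a message was sent (and therefore counted)
     */
    boolean sendChunkMessage(int shardId, E[] chunk) {
        if (chunk == null || chunk.length == 0) {
            // Dropped chunks are NOT counted, so sent/consumed stay balanced.
            log.warning("Dropping empty chunk for shard " + shardId);
            return false;
        }
        outbox.add(chunk);              // hypothetical "send"
        messagesSent.incrementAndGet(); // count only what was actually sent
        return true;
    }

    long messagesSent() {
        return messagesSent.get();
    }

    int outboxSize() {
        return outbox.size();
    }
}
```

      The design choice is that the counter is incremented in the same branch that performs the send, so it is impossible to count a message that was never sent or to send one that was never counted.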

        Activity

        bryanthompson bryanthompson added a comment -

        Added a non-blocking variant of AbstractRunningQuery#getRunState(). It uses a "barge" pattern: it reports the internal RunState of an operator in a query if the lock is not currently held, and otherwise returns null immediately. This was added to prevent the detailed STATUS page that QueryLog generates for a running query from being locked out. Such lockouts appear to be correlated with heavy heap pressure on the query controller, so some latency might still be observed; however, the lockout may also have been caused by blocking while trying to send a chunk for a "last pass" evaluation of some operator while holding the lock. EXPLAIN will always report the accurate, final RunState, since the lock should not be held once the query is done.
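        A minimal sketch of the non-blocking "barge" read, assuming the run state is guarded by a java.util.concurrent.locks.ReentrantLock. The class RunStateHolder and the methods tryGetRunState and evaluateWhileLocked are hypothetical; whether Blazegraph implements its variant with tryLock() is an assumption. ReentrantLock.tryLock() acquires the lock immediately if it is free and returns false otherwise, so the reader never waits behind an operator that is blocked while holding the lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a running query's per-operator run state. */
final class RunStateHolder {

    private final ReentrantLock lock = new ReentrantLock();
    private final Map<Integer, String> runState = new HashMap<>();

    void setRunState(int bopId, String state) {
        lock.lock();
        try {
            runState.put(bopId, state);
        } finally {
            lock.unlock();
        }
    }

    /** Simulates operator evaluation (e.g. a blocking chunk send) under the lock. */
    void evaluateWhileLocked(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    /** Blocking read: waits for the lock, so a STATUS page could stall here. */
    String getRunState(int bopId) {
        lock.lock();
        try {
            return runState.get(bopId);
        } finally {
            lock.unlock();
        }
    }

    /** Non-blocking "barge" read: returns null at once if the lock is held. */
    String tryGetRunState(int bopId) {
        if (!lock.tryLock()) {
            return null; // another thread holds the lock; do not wait
        }
        try {
            return runState.get(bopId);
        } finally {
            lock.unlock();
        }
    }
}
```

        A caller such as a status page would use tryGetRunState() and render "unavailable" on null, while an end-of-query report can use the blocking getRunState(), since no operator should hold the lock by then.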

        Added a unit test for ThickChunkMessage to verify that an empty chunk is correctly rejected.
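        The stack trace shows the IllegalArgumentException being thrown from the ThickChunkMessage constructor. A minimal sketch of such a precondition and of a test-style check for it follows; ChunkMessage and rejects() are hypothetical names, and the real constructor's signature and checks may differ.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a chunk message; not the real ThickChunkMessage. */
final class ChunkMessage<E> {

    private final E[] chunk;

    ChunkMessage(E[] chunk) {
        // Reject null and empty chunks up front, as the issue describes.
        if (chunk == null) {
            throw new IllegalArgumentException("chunk is null");
        }
        if (chunk.length == 0) {
            throw new IllegalArgumentException("chunk is empty");
        }
        this.chunk = Arrays.copyOf(chunk, chunk.length); // defensive copy
    }

    int size() {
        return chunk.length;
    }

    /** Test-style helper: true iff construction rejects the given chunk. */
    static <E> boolean rejects(E[] chunk) {
        try {
            new ChunkMessage<>(chunk);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```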

        Modified FederationChunkHandler per [1] to NOT send an empty chunk when mapping chunks over shards. (I am still not sure how these empty chunks are being produced, but there are several TODOs concerning the optimization and refactoring of the code that maps binding sets over shards, so I am going to defer further work on this until that refactoring is picked up.)

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/492 (Empty chunk in ThickChunkMessage (cluster))

        Committed revision r6042.


          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: