  Blazegraph (by SYSTAP) / BLZG-599

Cluster does not map input solution(s) across shards

    Details

      Description

      The cluster is not mapping the first access path across the data services when the first operator in the query is a sharded join.

      Query evaluation normally begins by injecting an empty solution into ChunkedRunningQuery#startQuery(msg). However, due to an oversight, the initial solution(s) are not mapped across the shards when the first operator is a sharded join. As a result, the query controller uses a global view of the index for the first access path, so all data for that access path flows through the query controller. The query still produces the correct solutions.
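When the first operator is a sharded join, the input chunk should instead be sent to each data service whose shard overlaps the key range of the first access path, so that each service reads its own shard locally. The following is a minimal sketch under stated assumptions, not Blazegraph code. It models shards as a sorted map from left separator key to data-service name. `ShardMapperSketch`, `servicesSpanning` and `mapAcrossShards` are hypothetical names. It also assumes that one key range applies to the whole chunk, which holds for the single empty starting solution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Blazegraph API. Each shard owns the key range
// [leftSeparator, nextLeftSeparator); the map value names its data service.
class ShardMapperSketch {

    private final TreeMap<String, String> shards = new TreeMap<>();

    ShardMapperSketch(Map<String, String> leftSeparatorToService) {
        shards.putAll(leftSeparatorToService);
        // The first shard must start at the empty key so every key has an owner.
        if (!shards.containsKey("")) {
            throw new IllegalArgumentException("first shard must start at \"\"");
        }
    }

    // Data services whose shards overlap the access path's key range [fromKey, toKey).
    List<String> servicesSpanning(String fromKey, String toKey) {
        if (fromKey.compareTo(toKey) >= 0) {
            throw new IllegalArgumentException("empty key range");
        }
        // The shard containing fromKey, then every later shard that starts before toKey.
        String first = shards.floorKey(fromKey);
        return new ArrayList<>(shards.subMap(first, true, toKey, false).values());
    }

    // Build one message per spanned data service, each carrying a copy of the
    // input chunk, instead of reading a global index view on the controller.
    <S> Map<String, List<S>> mapAcrossShards(List<S> chunk, String fromKey, String toKey) {
        Map<String, List<S>> messages = new LinkedHashMap<>();
        for (String service : servicesSpanning(fromKey, toKey)) {
            messages.put(service, new ArrayList<>(chunk));
        }
        return messages;
    }

    public static void main(String[] args) {
        Map<String, String> layout = new TreeMap<>();
        layout.put("", "ds0");
        layout.put("g", "ds1");
        layout.put("p", "ds2");
        ShardMapperSketch mapper = new ShardMapperSketch(layout);

        // The query starts from a single empty solution (no bindings).
        List<Map<String, String>> initialChunk = List.of(Map.of());
        System.out.println(mapper.mapAcrossShards(initialChunk, "c", "h").keySet()); // [ds0, ds1]
    }
}
```

The key range `["c", "h")` crosses the separator `"g"`. The empty starting solution is therefore copied to both `ds0` and `ds1`, not evaluated once against a global view.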

        Activity

        bryanthompson added a comment:

        QueryEngine#startEval(...) has the following code:

                // notify query start
                runningQuery.startQuery(msg);
                
                // tell query to consume the initial chunk.
                acceptChunk(msg);
        

        In fact, the problem is not with startQuery(msg), since that call just sets up the RunState of the query. (It does increment the number of available messages, which might itself be a problem if we turn one message into many messages mapped across the cluster.)
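A toy counter can illustrate the off-by-N concern in the parenthetical. This is a hypothetical sketch, not RunState. It assumes that a query is treated as finished once its count of outstanding messages drops to zero. It then shows what happens when the initial chunk is counted once but dispatched as several per-shard messages. `MessageCountSketch` and its methods are illustrative names only.

```java
// Hypothetical sketch, not Blazegraph's RunState: a query is assumed to be
// done once every message it has registered has been consumed.
class MessageCountSketch {

    private long available;

    // Register messages that have been handed to operators.
    void messagesProduced(int n) {
        available += n;
    }

    // Called as each message is consumed; returns true when the query looks done.
    boolean messageConsumed() {
        available--;
        if (available < 0) {
            throw new IllegalStateException("consumed more messages than were registered");
        }
        return available == 0;
    }

    // Registers `registered` messages, then consumes `dispatched` messages one at
    // a time. Returns how many dispatched messages were still unconsumed when the
    // counter first reported the query as done, or -1 if it never did.
    static int unconsumedWhenDone(int registered, int dispatched) {
        MessageCountSketch runState = new MessageCountSketch();
        runState.messagesProduced(registered);
        for (int consumed = 1; consumed <= dispatched; consumed++) {
            if (runState.messageConsumed()) {
                return dispatched - consumed;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Initial chunk counted once but mapped onto 3 shards: 2 still in flight.
        System.out.println(unconsumedWhenDone(1, 3));
        // Each mapped message counted: nothing left when the query looks done.
        System.out.println(unconsumedWhenDone(3, 3));
    }
}
```

Under this assumption, the fix is to count each per-shard message when it is created, not once for the original chunk.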

        The problem is that the code directly invokes acceptChunk(msg) rather than mapping the initial chunk across the predicate for the next operator (assuming that the first operator in the query plan is a sharded join).

        One way to handle this is to insert a CopyOp as the first operator in the query plan on the cluster. This ensures that the initial solution(s) are mapped, because the output of the CopyOp will be mapped. It would also sidestep a possible fence-post error in RunState#startQuery().
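A toy plan model can sketch the idea of putting a pass-through operator in front of the plan. `Op`, `howInputReaches` and `prepareForCluster` are hypothetical names, not Blazegraph API. Only two behaviors come from this issue. First, startEval hands the initial chunk straight to the first operator via acceptChunk(msg). Second, an operator's output is mapped across shards when the next operator is sharded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fix's idea, not Blazegraph's API.
class StartOpSketch {

    // Minimal operator model: a name and whether it reads a sharded access path.
    record Op(String name, boolean sharded) {}

    // Mirrors startEval(): the initial chunk is handed directly to the first
    // operator on the query controller. Outputs of earlier operators go through
    // normal routing, which maps them across shards when the target is sharded.
    static String howInputReaches(List<Op> plan, int index) {
        if (index == 0) {
            return "delivered directly on the controller";
        }
        return plan.get(index).sharded()
                ? "mapped across shards"
                : "sent without shard mapping";
    }

    // On a cluster, put a pass-through operator in front of a plan whose first
    // operator is a sharded join, so the initial solutions become that
    // operator's output and are routed like any other intermediate chunk.
    static List<Op> prepareForCluster(List<Op> plan) {
        if (plan.isEmpty() || !plan.get(0).sharded()) {
            return plan;
        }
        List<Op> withStart = new ArrayList<>();
        withStart.add(new Op("StartOp", false));
        withStart.addAll(plan);
        return withStart;
    }

    public static void main(String[] args) {
        List<Op> plan = List.of(new Op("join-1", true), new Op("join-2", true));
        // Without the fix, the sharded join receives the chunk on the controller.
        System.out.println(howInputReaches(plan, 0));

        List<Op> fixed = prepareForCluster(plan);
        int firstJoin = fixed.indexOf(plan.get(0));
        // With the pass-through operator in front, the join's input is mapped.
        System.out.println(howInputReaches(fixed, firstJoin));
    }
}
```

The sketch leaves startEval unchanged. The only change is to the plan, so the existing output-routing path does the shard mapping.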

        bryanthompson added a comment:

        There was actually some disabled code to add a StartOp to the front of a query plan on a cluster. I enabled the code and documented it with reference to this issue.

        Committed revision r6002.


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: