Details

      Description

      The HandleChunkBuffer class provides a significant throughput-oriented optimization. However, that optimization can change the order of the solutions. This means the HandleChunkBuffer class breaks the semantics for ORDER BY and SLICE, both of which should preserve order. It also breaks the semantics for ORDER_BY + DISTINCT. The problem only appears when the output chunk sizes are such that a (sufficiently) full chunk is output ahead of smaller chunks held in an internal buffer.

      This is a long-standing issue. It was identified while working on cutoff join evaluation for the RTO (BLZG-265).

      Note: Practically speaking, this issue only affects the output chunking of an operator and can only cause reordering when some small output chunks are followed by full chunks. In fact, this case will never arise with ORDER_BY: the ORDER_BY operator flushes all outputs at once and emits full-sized chunks. However, it could arise with ORDER_BY + DISTINCT, since the DISTINCT operator can filter solutions and thereby reduce input chunks that contain duplicate solutions from full chunks to small chunks. SLICE never reduces the number of solutions and so ORDER_BY + SLICE would never show this problem. However, ORDER_BY + DISTINCT + SLICE could show the problem.
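
      The reordering condition described in the note can be illustrated with a minimal sketch. This is not the bigdata HandleChunkBuffer code; the class name, method name, and the target chunk size of 4 are assumptions made purely for illustration. It models a buffer that emits full chunks immediately and holds small chunks until they add up to a full chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; NOT the bigdata HandleChunkBuffer implementation.
class ReorderingBufferDemo {

    /**
     * Emits full chunks immediately and holds small chunks in a pending
     * buffer until they add up to a full chunk. Returns the solutions in
     * the order in which they leave the buffer.
     */
    static List<Integer> emitFullChunksFirst(List<int[]> chunks, int targetSize) {
        List<Integer> out = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();
        for (int[] chunk : chunks) {
            if (chunk.length >= targetSize) {
                // A full chunk jumps ahead of whatever is waiting in 'pending'.
                for (int s : chunk) out.add(s);
            } else {
                for (int s : chunk) pending.add(s);
                if (pending.size() >= targetSize) {
                    out.addAll(pending);
                    pending.clear();
                }
            }
        }
        out.addAll(pending); // final flush of any remaining small chunks
        return out;
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(
                new int[] {1, 2},        // small chunk, buffered
                new int[] {3, 4, 5, 6},  // full chunk, emitted at once
                new int[] {7});          // small chunk, buffered
        // prints [3, 4, 5, 6, 1, 2, 7]
        System.out.println(emitFullChunksFirst(chunks, 4));
    }
}
```

      In this sketch the order is only disturbed when a small chunk arrives before a full one. A full chunk followed by small chunks comes out in arrival order.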

      The problem would also manifest as an inability to reliably preserve the implicit order of an index scan through a query. However, bigdata does not really support this. To do so reliably, a query hint would need to be developed that instructs the query generator to create an order-preserving plan. This is certainly possible, and it could reuse much of the code that allows the RTO (BLZG-265) to perform cutoff evaluation of a join.

        Activity

        bryanthompson added a comment -

        I missed some tests where the SliceOp constructor was invoked in the last commit. This should bring the build back to normal.

        See BLZG-265 (RTO)
        See BLZG-879 (Solution order not always preserved)

        Committed revision r7788.

        bryanthompson added a comment -

        The root cause of the OutOfOrderEvaluationException was the ChunkedRunningQuery.HandleChunkBuffer class. That class allowed chunks to be reordered so that full chunks could be output immediately while smaller chunks were gathered until they could be combined into a single full chunk. The class has been rewritten and now has two distinct behaviors. If reordering is allowed, then the old behavior is preserved (except that it can output chunks of up to 150% of the target chunk size). If reordering is disallowed, then it will still combine chunks as much as possible, but not if that would violate an order-preserving guarantee.
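
        The two behaviors can be sketched roughly as follows. This is a simplified model under stated assumptions, not the rewritten HandleChunkBuffer: the class and method names are hypothetical, the 150% allowance is not modelled, and the order-preserving branch simply flushes pending small chunks before emitting a full chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two modes; NOT the actual bigdata code.
class ChunkCombinerSketch {

    /** Combines small chunks into larger ones, with or without reordering. */
    static List<List<Integer>> combine(List<List<Integer>> chunks, int target,
                                       boolean reorderSolutions) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            boolean full = chunk.size() >= target;
            if (full && reorderSolutions) {
                // Throughput mode: emit the full chunk ahead of pending small chunks.
                out.add(new ArrayList<>(chunk));
                continue;
            }
            if (full) {
                // Order-preserving mode: flush pending small chunks first.
                if (!pending.isEmpty()) {
                    out.add(new ArrayList<>(pending));
                    pending.clear();
                }
                out.add(new ArrayList<>(chunk));
                continue;
            }
            // Small chunk: combine with earlier small chunks in arrival order.
            pending.addAll(chunk);
            if (pending.size() >= target) {
                out.add(new ArrayList<>(pending));
                pending.clear();
            }
        }
        if (!pending.isEmpty()) {
            out.add(new ArrayList<>(pending)); // final flush
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> in = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3),
                Arrays.asList(4, 5, 6, 7), Arrays.asList(8));
        System.out.println(combine(in, 4, true));  // [[4, 5, 6, 7], [1, 2, 3, 8]]
        System.out.println(combine(in, 4, false)); // [[1, 2, 3], [4, 5, 6, 7], [8]]
    }
}
```

        Note that the order-preserving mode in this sketch still merges the two leading small chunks, but it gives up the chance to fill them out to a full chunk once a full chunk arrives behind them.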

        The HandleChunkBuffer class breaks the semantics for ORDER BY and SLICE, both of which should preserve order. It also breaks the semantics for ORDER_BY + DISTINCT. This problem would only appear in cases where the size of the output chunks was such that a (sufficiently) full chunk would be output ahead of smaller chunks in an internal buffer.

        I have added a new PipelineOp annotation named REORDER_SOLUTIONS. This defaults to true, which is the historical throughput-oriented behavior.

        The MemorySortOp and SliceOp constructors now check for and require REORDER_SOLUTIONS := false.
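
        A constructor check of this kind might look like the sketch below. The class name, the annotation key string, and the use of a plain Map for annotations are all assumptions for illustration; the real MemorySortOp and SliceOp constructors and the real PipelineOp annotation API may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT the actual MemorySortOp/SliceOp constructor code.
class OrderPreservingOpSketch {

    // Assumed annotation key and default, chosen for illustration only.
    static final String REORDER_SOLUTIONS = "reorderSolutions";
    static final boolean DEFAULT_REORDER_SOLUTIONS = true;

    private final Map<String, Object> annotations;

    OrderPreservingOpSketch(Map<String, Object> annotations) {
        Object value = annotations.getOrDefault(
                REORDER_SOLUTIONS, DEFAULT_REORDER_SOLUTIONS);
        // An order-preserving operator must be explicitly configured with
        // REORDER_SOLUTIONS := false; the default (true) is rejected.
        if (!Boolean.FALSE.equals(value)) {
            throw new IllegalArgumentException(
                    REORDER_SOLUTIONS + " must be false for this operator");
        }
        this.annotations = new HashMap<>(annotations);
    }

    /** Returns true if an operator can be constructed with these annotations. */
    static boolean accepts(Map<String, Object> annotations) {
        try {
            new OrderPreservingOpSketch(annotations);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> ok = new HashMap<>();
        ok.put(REORDER_SOLUTIONS, false);
        System.out.println(accepts(ok));              // true
        System.out.println(accepts(new HashMap<>())); // false (default is true)
    }
}
```

        Failing fast in the constructor means a query plan that forgets the annotation is rejected when it is built rather than producing misordered results at runtime.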

        AST2BOpUtility has been modified to turn off REORDER_SOLUTIONS for the ORDER_BY_DISTINCT case (see BLZG-667 (ORDER BY + DISTINCT)).

        The AST2BOpRTO integration now runs clean when failOutOfOrderEvaluation := true.

        PipelineOp
        - added REORDER_SOLUTIONS annotation. Defaults to true (the historical throughput-oriented behavior).

        MemorySortOp
        - requires REORDER_SOLUTIONS:=false.

        SliceOp
        - requires REORDER_SOLUTIONS:=false.

        Reordered solutions are no longer observed during cutoff join evaluation. The failure to disable the REORDER_SOLUTIONS annotation is now detected if checking of cutoff query plans is enabled.

        AST2BOpUtility
        - now adds the REORDER_SOLUTIONS:=false annotation as necessary for SLICE, ORDER_BY and DISTINCT (when paired with ORDER_BY).

        See BLZG-265 (RTO)
        See BLZG-879 (Solution order not always preserved)

        Committed revision r7787.


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson
