The HandleChunkBuffer class provides a significant throughput-oriented optimization. However, that optimization can change the order of the solutions. This means the HandleChunkBuffer class breaks the semantics of ORDER_BY and SLICE, both of which should preserve order. It also breaks the semantics of ORDER_BY + DISTINCT. The problem only appears when the output chunk sizes are such that a (sufficiently) full chunk is output ahead of smaller chunks that are still held in an internal buffer.
This is a long-standing issue. It was identified while working on cutoff join evaluation for the RTO (BLZG-265).
Note: Practically speaking, this issue only affects the output chunking of an operator, and it can only cause reordering when some small output chunks are followed by full chunks. In fact, this case will never arise with ORDER_BY alone: the ORDER_BY operator flushes all of its outputs at once and emits full-sized chunks. However, it could arise with ORDER_BY + DISTINCT, since the DISTINCT operator filters solutions and can therefore reduce input chunks containing duplicate solutions from full chunks to small chunks. SLICE never reduces the number of solutions and so ORDER_BY + SLICE would never show this problem. However, ORDER_BY + DISTINCT + SLICE could show the problem.
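The ORDER_BY + DISTINCT case can be illustrated with a small, self-contained simulation. This is a simplified model written for this ticket, not the actual HandleChunkBuffer code: the class ChunkReorderSketch, its methods, and the chunk capacity of 4 are all hypothetical, and integers stand in for solutions. It assumes only the behavior described above, namely that a full chunk is passed straight through while smaller chunks are held in an internal buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the reordering; NOT the real HandleChunkBuffer.
class ChunkReorderSketch {

    // Assumed buffering policy: full chunks are emitted immediately,
    // small chunks are combined and emitted once enough have accumulated
    // (or at the final flush).
    static List<List<Integer>> combine(List<List<Integer>> chunks, int capacity) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            if (chunk.size() >= capacity) {
                // Full chunk: jumps ahead of anything still pending.
                out.add(new ArrayList<>(chunk));
            } else {
                pending.addAll(chunk);
                if (pending.size() >= capacity) {
                    out.add(new ArrayList<>(pending));
                    pending.clear();
                }
            }
        }
        if (!pending.isEmpty()) {
            out.add(new ArrayList<>(pending)); // final flush
        }
        return out;
    }

    // DISTINCT applied chunk by chunk: duplicates are dropped, so a full
    // input chunk can shrink to a small output chunk.
    static List<List<Integer>> distinct(List<List<Integer>> chunks) {
        Set<Integer> seen = new HashSet<>();
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            List<Integer> kept = new ArrayList<>();
            for (Integer v : chunk) {
                if (seen.add(v)) {
                    kept.add(v);
                }
            }
            if (!kept.isEmpty()) {
                out.add(kept);
            }
        }
        return out;
    }

    static List<Integer> flatten(List<List<Integer>> chunks) {
        List<Integer> flat = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            flat.addAll(chunk);
        }
        return flat;
    }

    // ORDER_BY emits sorted, full chunks; DISTINCT then shrinks some of them.
    static List<Integer> demo() {
        List<List<Integer>> sorted = Arrays.asList(
                Arrays.asList(1, 1, 2, 2),
                Arrays.asList(3, 4, 5, 6),
                Arrays.asList(6, 6, 7, 7));
        // After DISTINCT the chunks are [1, 2], [3, 4, 5, 6], [7].
        return flatten(combine(distinct(sorted), 4));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [3, 4, 5, 6, 1, 2, 7]
    }
}
```

In this model the small chunk [1, 2] is held back while the full chunk [3, 4, 5, 6] passes through, so the sorted order is lost. If every chunk reaching combine() is full, as with ORDER_BY alone, the order is preserved.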
The problem would also manifest as an inability to reliably preserve the implicit order of an index scan through a query. However, bigdata does not really support this. To do this reliably, a query hint would need to be developed that instructs the query generator to create an order-preserving plan. This is certainly possible, and it could reuse much of the code that allows the RTO (BLZG-265) to perform cutoff evaluation of a join.