Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-945

COUNT(DISTINCT) returns no rows rather than ZERO.

    Details

      Description

      Using an arbitrary data set, the first query reports ?count:=0 (one solution) while the 2nd query reports no solutions:

      One solution (count:=0)

      select (count(?snippet) as ?count) {
                      ?snippet ?p "abcedefg" .
      }
      

      No solutions:

      select (count(distinct ?snippet) as ?count) {
                      ?snippet ?p "abcedefg" .
      }
      

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        The COUNT() is run by the PipelinedAggregationOp. The COUNT DISTINCT is run by the MemoryGroupByOp. The problem appears to be that the COUNT DISTINCT (MemoryGroupByOp) is not invoked unless at least one solution would flow to that operator. On the other hand, the COUNT (PipelinedAggregationOp) is marked as requiring "last pass evaluation" and is hence always invoked at least once.

        Show
        bryanthompson bryanthompson added a comment - The COUNT() is run by the PipelinedAggregationOp. The COUNT DISTINCT is run by the MemoryGroupByOp. The problem appears to be that the COUNT DISTINCT (MemoryGroupByOp) is not invoked unless at least one solution would flow to that operator. On the other hand, the COUNT (PipelinedAggregationOp) is marked as requiring "last pass evaluation" and is hence always invoked at least once.
        Hide
        bryanthompson bryanthompson added a comment -

        This is going to require an in depth analysis of the query engine semantics around isLastPassRequested() and the termination criteria for a query in order to provide an assurance that a non-pipelined operator (such as the MemoryGroupByOp) is guaranteed to be evaluated even if there are no upstream solutions. This is handled for pipelined operators through the LAST_PASS annotations. However, that annotation is not compatible with "at-once" operator evaluation, which is what is used by the MemoryGroupByOp.

        The key places to look in the code are:

        AST2BopUtility#3876, which is where the MemoryGroupByOp is constructed.

                    op = new MemoryGroupByOp(leftOrEmpty(left), NV.asMap(new NV[] {//
                                    new NV(BOp.Annotations.BOP_ID, bopId),//
                                    new NV(BOp.Annotations.EVALUATION_CONTEXT,
                                            BOpEvaluationContext.CONTROLLER),//
                                    new NV(PipelineOp.Annotations.PIPELINED, false),//
                                    new NV(PipelineOp.Annotations.MAX_MEMORY, 0),//
                                    new NV(GroupByOp.Annotations.GROUP_BY_STATE,
                                            groupByState), //
                                    new NV(GroupByOp.Annotations.GROUP_BY_REWRITE,
                                            groupByRewrite), //
                            }));
        

        The PipelineOp constructor, which is where we reject LAST_PASS for at-once operators:

            protected PipelineOp(final BOp[] args,
                    final Map<String, Object> annotations) {
        
                super(args, annotations);
        
                if (getMaxParallel() < 1)
                    throw new IllegalArgumentException(Annotations.MAX_PARALLEL + "="
                            + getMaxParallel());
        
                if (isLastPassRequested()) {
        
                    if (getMaxParallel() != 1)
                        throw new IllegalArgumentException(Annotations.MAX_PARALLEL + "="
                                + getMaxParallel());
        
                    if (!isPipelinedEvaluation())
                        throw new UnsupportedOperationException(Annotations.PIPELINED
                                + "=" + isPipelinedEvaluation());
        
                }
                
            }
        

        RunState#startQuery(), which is where we mark the operators that require last pass evaluation.

            synchronized
            public void startQuery(final IChunkMessage<?> msg) throws QueryTimeoutException {
        

        AbstractRunningQuery#triggerOperatorsAwaitingLastPass(), which is where we actually execute the last pass operators.

            private void triggerOperatorsAwaitingLastPass() {
        
                if (runState.getTotalLastPassRemainingCount() == 0) {
        
                    return;
                    
                }
                
                // Consider the operators which require last pass evaluation.
                for (Integer bopId : runState.getLastPassRequested()) {
        
                    if (runState.getOperatorRunState(bopId) == RunStateEnum.StartLastPass) {
        
                        @SuppressWarnings("rawtypes")
                        final Set doneOn = runState.getDoneOn(bopId);
        
                        if (log.isInfoEnabled())
                            log.info("Triggering last pass: " + bopId);
        
                        doLastPass(bopId, doneOn);
        
                    }
        
                }
        
            }
        

        I have committed a test suite which demonstrates the problem.

        Committed revision r8020.

        Show
        bryanthompson bryanthompson added a comment - This is going to require an in depth analysis of the query engine semantics around isLastPassRequested() and the termination criteria for a query in order to provide an assurance that a non-pipelined operator (such as the MemoryGroupByOp) is guaranteed to be evaluated even if there are no upstream solutions. This is handled for pipelined operators through the LAST_PASS annotations. However, that annotation is not compatible with "at-once" operator evaluation, which is what is used by the MemoryGroupByOp. The key places to look in the code are: AST2BopUtility#3876, which is where the MemoryGroupByOp is constructed. op = new MemoryGroupByOp(leftOrEmpty(left), NV.asMap(new NV[] {// new NV(BOp.Annotations.BOP_ID, bopId),// new NV(BOp.Annotations.EVALUATION_CONTEXT, BOpEvaluationContext.CONTROLLER),// new NV(PipelineOp.Annotations.PIPELINED, false),// new NV(PipelineOp.Annotations.MAX_MEMORY, 0),// new NV(GroupByOp.Annotations.GROUP_BY_STATE, groupByState), // new NV(GroupByOp.Annotations.GROUP_BY_REWRITE, groupByRewrite), // })); The PipelineOp constructor, which is where we reject LAST_PASS for at-once operators: protected PipelineOp(final BOp[] args, final Map<String, Object> annotations) { super(args, annotations); if (getMaxParallel() < 1) throw new IllegalArgumentException(Annotations.MAX_PARALLEL + "=" + getMaxParallel()); if (isLastPassRequested()) { if (getMaxParallel() != 1) throw new IllegalArgumentException(Annotations.MAX_PARALLEL + "=" + getMaxParallel()); if (!isPipelinedEvaluation()) throw new UnsupportedOperationException(Annotations.PIPELINED + "=" + isPipelinedEvaluation()); } } RunState#startQuery(), which is where we mark the operators that require last pass evaluation. synchronized public void startQuery(final IChunkMessage<?> msg) throws QueryTimeoutException { AbstractRunningQuery#triggerOperatorsAwaitingLastPass(), which is where we actually execute the last pass operators. private void triggerOperatorsAwaitingLastPass() { if (runState.getTotalLastPassRemainingCount() == 0) { return; } // Consider the operators which require last pass evaluation. for (Integer bopId : runState.getLastPassRequested()) { if (runState.getOperatorRunState(bopId) == RunStateEnum.StartLastPass) { @SuppressWarnings("rawtypes") final Set doneOn = runState.getDoneOn(bopId); if (log.isInfoEnabled()) log.info("Triggering last pass: " + bopId); doLastPass(bopId, doneOn); } } } I have committed a test suite which demonstrates the problem. Committed revision r8020.
        Hide
        bryanthompson bryanthompson added a comment -

        A workaround is shown below:

        select (count(distinct ?s) as ?count) {
          {
            ?s ?p "abcedefg" .
          } UNION {
            # add an empty solution to trigger the GROUP BY operator.
          }
        }
        

        This uses a UNION for ensure that there is at least an empty solution that flows into the implicit MemoryGroupByOp and hence triggers the aggregation. This correctly reports

        count:=0

        Show
        bryanthompson bryanthompson added a comment - A workaround is shown below: select (count(distinct ?s) as ?count) { { ?s ?p "abcedefg" . } UNION { # add an empty solution to trigger the GROUP BY operator. } } This uses a UNION for ensure that there is at least an empty solution that flows into the implicit MemoryGroupByOp and hence triggers the aggregation. This correctly reports count:=0
        Hide
        bryanthompson bryanthompson added a comment -

        Artem is taking a look at this.

        Show
        bryanthompson bryanthompson added a comment - Artem is taking a look at this.
        Hide
        bryanthompson bryanthompson added a comment -

        - The decision to run the at-once operator should include (as an extension of what already exists) that the operator will be triggered once no up-stream operator could trigger it (this is the typical last-pass evaluation decision) IFF the operator has not run (at at-once operator MUST NOT run twice).


        - Add unit test(s) to AbstractAggregationTest when there are no source solutions and verify the correct behavior of the operators.


        - TestRTO_BSBM.test_BSBM_Q5_pc100() fails (produces two solutions) with the proposed changes. This is because the "at-once" operator is executing twice. It's contract should be to execute once-and-only-once. (The bug is that it currently executes zero or one times.)

        Check the following test suites for SPARQL evaluation and the QueryEngine:
        - com.bigdata.rdf.sparql.ast.eval.TestAll
        - BigdataSparqlTest
        - com.bigdata.bop.TestAll

        Show
        bryanthompson bryanthompson added a comment - - The decision to run the at-once operator should include (as an extension of what already exists) that the operator will be triggered once no up-stream operator could trigger it (this is the typical last-pass evaluation decision) IFF the operator has not run (at at-once operator MUST NOT run twice). - Add unit test(s) to AbstractAggregationTest when there are no source solutions and verify the correct behavior of the operators. - TestRTO_BSBM.test_BSBM_Q5_pc100() fails (produces two solutions) with the proposed changes. This is because the "at-once" operator is executing twice. It's contract should be to execute once-and-only-once. (The bug is that it currently executes zero or one times.) Check the following test suites for SPARQL evaluation and the QueryEngine: - com.bigdata.rdf.sparql.ast.eval.TestAll - BigdataSparqlTest - com.bigdata.bop.TestAll
        Hide
        bryanthompson bryanthompson added a comment -

        Continued work on this ticket. We have analyzed the RunState for the at-once required operators and begun to introduce some new inner state to track the this condition. Artem will continue to carry this forward and also write a unit test for this case.

        Show
        bryanthompson bryanthompson added a comment - Continued work on this ticket. We have analyzed the RunState for the at-once required operators and begun to introduce some new inner state to track the this condition. Artem will continue to carry this forward and also write a unit test for this case.
        Hide
        bryanthompson bryanthompson added a comment -

        We have a clean run in CI. The branch has been merged back to the master. This ticket is closed. The fix will be in our 1.4.1 release.

        Show
        bryanthompson bryanthompson added a comment - We have a clean run in CI. The branch has been merged back to the master. This ticket is closed. The fix will be in our 1.4.1 release.

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: