Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: BIGDATA_RELEASE_1_3_1
    • Fix Version/s: None
    • Component/s: Other

      Description

      Using the following 36 triples (each of eg:a to eg:f has an eg:p edge to each of eg:a to eg:f):

         eg:a eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
         eg:b eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
         eg:c eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
         eg:d eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
         eg:e eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
         eg:f eg:p eg:a, eg:b, eg:c, eg:d, eg:e, eg:f .
      

      Run either

      Prefix eg: <eg:>
      ASK
      FROM eg:g
      { BIND (1 as ?t)
        ?a eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p ?b
      }
      
      

      or

      Prefix eg: <eg:>
      SELECT *
      FROM eg:g
      { BIND (1 as ?t)
        FILTER EXISTS {
          ?a eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p ?b
        }
      }
      

      The ASK query takes less than 10 ms; the FILTER EXISTS query takes more than 5 s.
      This suggests that evaluation of the FILTER EXISTS does not stop once the first solution is found.
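      A back-of-the-envelope count helps explain the gap. Assuming the 8-step path is evaluated as a plain join of 8 triple patterns with anonymous intermediate variables and no DISTINCT (as the AST dumps in the comments below show), each of the 9 chain variables can take any of the 6 nodes, giving 6^9 = 10,077,696 bindings for only 36 distinct (?a, ?b) pairs. The Java sketch below counts these bindings with a small dynamic program; it is illustrative arithmetic, not a measurement of the engine.

```java
import java.util.Arrays;

/**
 * Counts solutions of the chain ?x0 eg:p ?x1 . ?x1 eg:p ?x2 . ... ?x7 eg:p ?x8,
 * i.e. the 8-step path expanded into 8 joined triple patterns,
 * under bag semantics (no DISTINCT). Illustrative only.
 */
final class PathCount {

    /** Counts chain bindings of length {@code steps} over an adjacency matrix. */
    static long countChainSolutions(boolean[][] adj, int steps) {
        int n = adj.length;
        // ways[v]: number of partial bindings whose last variable is bound to node v.
        // Before the first step, the start variable (?a) can be any node.
        long[] ways = new long[n];
        Arrays.fill(ways, 1L);
        for (int s = 0; s < steps; s++) {
            long[] next = new long[n];
            for (int u = 0; u < n; u++) {
                for (int v = 0; v < n; v++) {
                    if (adj[u][v]) {
                        next[v] += ways[u]; // extend every binding ending at u by the edge u -> v
                    }
                }
            }
            ways = next;
        }
        long total = 0;
        for (long w : ways) {
            total += w;
        }
        return total;
    }

    /** The example data: n nodes, each with an eg:p edge to every node (n * n triples). */
    static boolean[][] completeGraph(int n) {
        boolean[][] adj = new boolean[n][n];
        for (boolean[] row : adj) {
            Arrays.fill(row, true);
        }
        return adj;
    }

    public static void main(String[] args) {
        boolean[][] adj = completeGraph(6);              // the 36 eg:p triples
        System.out.println(countChainSolutions(adj, 1)); // 36
        System.out.println(countChainSolutions(adj, 8)); // 10077696 (= 6^9)
    }
}
```

      An ASK with a limit of one needs only the first of these bindings, while a plan that materializes all of them does on the order of ten million joins.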

      See BLZG-1049 (Query hint not recognized in FILTER) for a related issue.

        Activity

        jeremycarroll added a comment:

        The heart of the FILTER EXISTS gets transformed into:

        QueryType: ASK
            SELECT VarNode(-exists-1)[anonymous]
              JoinGroupNode {
                StatementPatternNode(VarNode(a), ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-8584db47-500b-4e7b-b7d7-44e57d1e3911)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-8584db47-500b-4e7b-b7d7-44e57d1e3911)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-7aa5c5fb-b533-485a-ae7b-a21b36f89593)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-7aa5c5fb-b533-485a-ae7b-a21b36f89593)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-0173ab51-84fe-449b-bb87-5b28133a9eef)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-0173ab51-84fe-449b-bb87-5b28133a9eef)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-413877e5-12ee-4afd-b43c-a910e96043de)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-413877e5-12ee-4afd-b43c-a910e96043de)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-c9b3181c-8b8d-4d2f-b0aa-5cc4fa6801a4)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-c9b3181c-8b8d-4d2f-b0aa-5cc4fa6801a4)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-b5b4b5ac-5846-4892-b19c-a820e30f626e)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-b5b4b5ac-5846-4892-b19c-a820e30f626e)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-550e2845-ef09-493b-9916-a8fb37150782)[anonymous]) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
                StatementPatternNode(VarNode(--pp-anon-550e2845-ef09-493b-9916-a8fb37150782)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(b)) [scope=DEFAULT_CONTEXTS]
                  AST2BOpBase.estimatedCardinality=36
                  AST2BOpBase.originalIndex=POCS
              }
        

        The heart of the ASK is transformed into:

        QueryType: ASK
        includeInferred=true
          JoinGroupNode {
            ( ConstantNode(XSDInteger(1)) AS VarNode(t) )
            StatementPatternNode(VarNode(a), ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-ab51e49b-8e81-4dbc-bf6d-3be3a44fd146)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-ab51e49b-8e81-4dbc-bf6d-3be3a44fd146)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-8d8daaac-2658-412f-bd7d-440e9c29d952)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-8d8daaac-2658-412f-bd7d-440e9c29d952)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-f927d692-04b5-4ccc-8fc5-ede89dc3aa69)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-f927d692-04b5-4ccc-8fc5-ede89dc3aa69)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-6f8db661-e348-49a9-803a-d82d1ac4811d)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-6f8db661-e348-49a9-803a-d82d1ac4811d)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-99b65026-1acd-4dbb-a932-03f2859e747f)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-99b65026-1acd-4dbb-a932-03f2859e747f)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-f6a3ebc1-3289-468f-8f84-52d506f4c2dd)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-f6a3ebc1-3289-468f-8f84-52d506f4c2dd)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(--pp-anon-9c999ea2-8586-477e-9367-658f93654d7f)[anonymous]) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
            StatementPatternNode(VarNode(--pp-anon-9c999ea2-8586-477e-9367-658f93654d7f)[anonymous], ConstantNode(TermId(112600U)[eg:p]), VarNode(b)) [scope=DEFAULT_CONTEXTS]
              AST2BOpBase.estimatedCardinality=36
              AST2BOpBase.originalIndex=POCS
          }
        slice(limit=1)
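        The two plans differ mainly at the end: the ASK plan carries slice(limit=1), while the SELECT generated for the FILTER EXISTS has no slice. The hypothetical Java sketch below evaluates the same 8-step chain depth-first with an optional solution limit, to show how much work a limit of 1 can save on this data. It models simple nested-loop evaluation over an adjacency matrix, not Blazegraph's pipelined operators.

```java
import java.util.Arrays;

/**
 * Depth-first, nested-loop evaluation of the chain ?x0 p ?x1 . ... ?x(k-1) p ?xk
 * that stops once {@code limit} solutions have been produced, mimicking slice(limit=...).
 * Hypothetical model, not Blazegraph's operators.
 */
final class ChainSearch {

    private final boolean[][] adj;
    private final long limit;   // maximum number of solutions to produce
    long solutions = 0;         // solutions produced so far
    long edgesVisited = 0;      // edge matches explored: a rough proxy for work done

    ChainSearch(boolean[][] adj, long limit) {
        this.adj = adj;
        this.limit = limit;
    }

    /** Evaluates a chain of {@code steps} patterns from every start node; returns the solution count. */
    long run(int steps) {
        for (int start = 0; start < adj.length && solutions < limit; start++) {
            extend(start, steps);
        }
        return solutions;
    }

    private void extend(int node, int remaining) {
        if (remaining == 0) {
            solutions++; // every variable in the chain is bound
            return;
        }
        for (int next = 0; next < adj.length && solutions < limit; next++) {
            if (adj[node][next]) {
                edgesVisited++;
                extend(next, remaining - 1);
            }
        }
    }

    /** The example data: n nodes, each with an edge to every node. */
    static boolean[][] completeGraph(int n) {
        boolean[][] adj = new boolean[n][n];
        for (boolean[] row : adj) {
            Arrays.fill(row, true);
        }
        return adj;
    }

    public static void main(String[] args) {
        ChainSearch ask = new ChainSearch(completeGraph(6), 1);
        ask.run(8);
        System.out.println("limit 1: " + ask.edgesVisited + " edges visited");  // 8
        ChainSearch all = new ChainSearch(completeGraph(6), Long.MAX_VALUE);
        all.run(8);
        System.out.println("no limit: " + all.edgesVisited + " edges visited"); // 12093228
    }
}
```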
        
        bryanthompson added a comment:

        The basic tradeoff is between two approaches:

        (1) Run one sub-query per source solution, in which case we can use LIMIT 1.

        (2) Run a vectored sub-plan: all source solutions are used to build a hash index, the solutions from the hash index are flooded into the sub-plan, the solutions that survive the sub-plan are hash-joined back to the source solutions, and the surviving source solutions are passed on.

        From earlier experience, we know that (2) tends to be about 100x faster than (1).
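        To make the tradeoff concrete, here is a toy Java model of both approaches. Solutions are plain Map<String, String> bindings and the sub-query or sub-plan is passed in as a function; none of these names are Blazegraph APIs. Approach (1) stops each per-solution sub-query at its first match (findAny() on a lazy stream plays the role of LIMIT 1). Approach (2) indexes the source solutions on the join variables, pushes all of them through the sub-plan in one pass, and hash-joins the survivors back.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy models of FILTER EXISTS approaches (1) and (2); solutions are variable -> value maps. */
final class ExistsStrategies {

    /** (1) One sub-query per source solution; findAny() on a lazy stream acts like LIMIT 1. */
    static List<Map<String, String>> perSolutionLimitOne(
            List<Map<String, String>> source,
            Function<Map<String, String>, Stream<Map<String, String>>> subquery) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> s : source) {
            if (subquery.apply(s).findAny().isPresent()) {
                out.add(s); // EXISTS is true for s; no further matches are computed
            }
        }
        return out;
    }

    /** (2) Vectored sub-plan: hash index on the join variables, one pass, hash join back. */
    static List<Map<String, String>> vectoredSubPlan(
            List<Map<String, String>> source,
            List<String> joinVars,
            Function<List<Map<String, String>>, List<Map<String, String>>> subPlan) {
        // Hash index: join-variable bindings -> source solutions carrying those bindings.
        Map<List<String>, List<Map<String, String>>> index = new LinkedHashMap<>();
        for (Map<String, String> s : source) {
            index.computeIfAbsent(key(s, joinVars), k -> new ArrayList<>()).add(s);
        }
        // Flood all source solutions through the sub-plan at once; it computes every match.
        Set<List<String>> survivingKeys = new HashSet<>();
        for (Map<String, String> r : subPlan.apply(source)) {
            survivingKeys.add(key(r, joinVars));
        }
        // Hash join back: pass on the source solutions whose key survived the sub-plan.
        List<Map<String, String>> out = new ArrayList<>();
        index.forEach((k, group) -> {
            if (survivingKeys.contains(k)) {
                out.addAll(group);
            }
        });
        return out;
    }

    private static List<String> key(Map<String, String> s, List<String> vars) {
        List<String> k = new ArrayList<>();
        for (String v : vars) {
            k.add(s.get(v)); // null stands for an unbound variable
        }
        return k;
    }

    // Demo data: eg:p edges x -> 1, z -> 2, z -> 3; the EXISTS body is { ?a eg:p ?o }.
    private static final Map<String, List<String>> EDGES =
            Map.of("x", List.of("1"), "z", List.of("2", "3"));

    private static final List<Map<String, String>> SOURCE =
            List.of(Map.of("a", "x"), Map.of("a", "y"), Map.of("a", "z"));

    private static Stream<Map<String, String>> matches(Map<String, String> s) {
        return EDGES.getOrDefault(s.get("a"), List.of()).stream().map(o -> {
            Map<String, String> m = new HashMap<>(s);
            m.put("o", o);
            return m;
        });
    }

    static List<String> demoPerSolution() {
        return boundA(perSolutionLimitOne(SOURCE, ExistsStrategies::matches));
    }

    static List<String> demoVectored() {
        return boundA(vectoredSubPlan(SOURCE, List.of("a"),
                in -> in.stream().flatMap(ExistsStrategies::matches).collect(Collectors.toList())));
    }

    private static List<String> boundA(List<Map<String, String>> sols) {
        return sols.stream().map(s -> s.get("a")).collect(Collectors.toList());
    }
}
```

        Both demos keep the solutions for x and z. The vectored version evaluates the sub-plan once for all inputs, which is what usually makes it faster, but it also computes every match, which is the cost this ticket runs into when a single match would suffice.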

        Some thoughts:

        (A) To reduce the work for (2), we could improve the query analysis to identify the distinct (or reduced) set of solutions based on the variables that must be in scope in the sub-plan. This could reduce the number of solutions flowing into the sub-plan, and hence the work done there. The basic analysis is: find the minimum set of projected variables required by the sub-plan (corresponding to the FILTER EXISTS), then apply a DISTINCT filter to the projection of the source solutions as the first operation in the sub-plan. Nothing additional should be needed at the hash join of the sub-plan with the solutions in the hash index, since no bindings from the FILTER EXISTS sub-plan are visible in the outer query.

        (B) Like (A), but somehow use sideways information passing to reduce the effort in the sub-plan. I have no concrete suggestions here.

        (C) Improve the query analysis to identify edge cases where either (i) bottom-up evaluation of the FILTER EXISTS would be more efficient, or (ii) issuing one sub-query per source solution for the FILTER EXISTS would be more efficient.
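        Idea (A) amounts to a projection followed by DISTINCT at the entry to the sub-plan. A minimal sketch, assuming solutions are modeled as Map<String, String> bindings; the DistinctProjection class and distinctProjection helper are hypothetical, not existing Blazegraph code.

```java
import java.util.*;

/** Sketch of idea (A): project source solutions onto the sub-plan's variables, then DISTINCT. */
final class DistinctProjection {

    /** Keeps one copy of each projection of the source solutions onto {@code vars}. */
    static List<Map<String, String>> distinctProjection(
            List<Map<String, String>> source, Collection<String> vars) {
        Set<Map<String, String>> seen = new LinkedHashSet<>(); // preserves first-seen order
        for (Map<String, String> s : source) {
            Map<String, String> p = new HashMap<>();
            for (String v : vars) {
                if (s.containsKey(v)) {
                    p.put(v, s.get(v)); // variables unbound in s stay unbound in p
                }
            }
            seen.add(p);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Four source solutions, but the EXISTS body only needs ?a, so two inputs remain.
        List<Map<String, String>> source = List.of(
                Map.of("a", "x", "t", "1"), Map.of("a", "x", "t", "2"),
                Map.of("a", "y", "t", "3"), Map.of("a", "y", "t", "4"));
        System.out.println(distinctProjection(source, List.of("a"))); // [{a=x}, {a=y}]
    }
}
```

        Because the FILTER EXISTS exports no bindings, the projected solutions that survive the sub-plan can be matched against a hash index keyed on the same variables, so the join back is unchanged.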

        Of these, I would rank (A) as the most interesting option. Let me know if you want to try this approach or discuss what would be involved in more depth.

        Bryan

        bryanthompson added a comment:

        As indicated off-list, we have existing code paths that support a non-vectored subquery per source solution for (NOT) EXISTS using the SubqueryOp.

        I have reenabled this code and verified that it passes the test suite (except for one detailed check of the physical operator plan that is generated for NOT EXISTS).

        I have also added a LIMIT ONE when using the sub-query version.

        However, I observe that the run time for the test described above is a constant 0.18 seconds regardless of which code path is used. With neither code path do I observe the 5-second run time described on this ticket; that is, the test case does not demonstrate the performance problem. Please check the committed test cases in TestNegation.java: see the test_exists_988a() and test_exists_988b() methods at the end of that file.

        The sub-query plan is currently OFF (the vectored sub-plan is hard-coded):

            private static PipelineOp addExistsSubquery(PipelineOp left,
                    final SubqueryRoot subqueryRoot, final Set<IVariable<?>> doneSet,
                    final AST2BOpContext ctx) {
        
                if (true) {
                    // Vectored sub-plan evaluation.
                    return addExistsSubqueryFast(left, subqueryRoot, doneSet, ctx);
                } else {
                    // Non-vectored sub-query evaluation.
                    return addExistsSubquerySubquery(left, subqueryRoot, doneSet, ctx);
                }
                
            }
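        The hard-coded if (true) could be replaced by an explicit mode so that the choice of plan becomes configurable, which is roughly what the "filterExists" query hint later enables. A hypothetical sketch: the enum constants reuse the hint values VectoredSubPlan and SubQueryLimitOne used on this ticket, while the FilterExistsDispatch class, the fromHint parser, and the string-returning plan method are illustrative stand-ins for addExistsSubqueryFast and addExistsSubquerySubquery.

```java
/**
 * Hypothetical sketch: replace the hard-coded if (true) with an explicit mode.
 * The constant names mirror the "filterExists" query hint values; everything else
 * (class name, fromHint, plan) is illustrative, not Blazegraph's API.
 */
final class FilterExistsDispatch {

    enum FilterExistsMode {
        /** One vectored sub-plan for all source solutions (the current default). */
        VectoredSubPlan,
        /** One non-vectored sub-query per source solution, stopping at the first match. */
        SubQueryLimitOne;

        /** A missing value selects the default; an unknown value is rejected. */
        static FilterExistsMode fromHint(String value) {
            if (value == null) {
                return VectoredSubPlan;
            }
            for (FilterExistsMode m : values()) {
                if (m.name().equalsIgnoreCase(value.trim())) {
                    return m;
                }
            }
            throw new IllegalArgumentException("Unknown filterExists mode: " + value);
        }
    }

    /** Stand-in for choosing between addExistsSubqueryFast and addExistsSubquerySubquery. */
    static String plan(FilterExistsMode mode) {
        switch (mode) {
            case SubQueryLimitOne:
                return "addExistsSubquerySubquery (sub-query per solution, LIMIT 1)";
            case VectoredSubPlan:
            default:
                return "addExistsSubqueryFast (vectored sub-plan)";
        }
    }

    public static void main(String[] args) {
        System.out.println(plan(FilterExistsMode.fromHint(null)));
        System.out.println(plan(FilterExistsMode.fromHint("SubQueryLimitOne")));
    }
}
```

        Rejecting unknown values, rather than silently falling back to the default, is a design choice in this sketch: a misspelled hint then fails loudly instead of quietly running the slow plan.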
        

        I have not tested the use of a DISTINCT solutions operator in the sub-plan, but I have marked the code for where that operator should be introduced.

                 * FIXME EXISTS: Try DISTINCT in the sub-plan and compare to correctness
                 * without for (NOT) EXISTS and to performance of the non-vectored code
                 * path for EXISTS.
        

        Committed revision r8524.

        bryanthompson added a comment:

        Bug fixes to the test suite for BLZG-1048. I spoke with Jeremy and am now able to replicate the observed performance issue. I have also verified that changing from the vectored sub-plan to the subquery LIMIT ONE approach fixes the performance problem for this query.

        Next, we can figure out whether the right choice can be made automatically, and also look at using DISTINCT over the variables in the FILTER to accelerate the vectored sub-plan code path.

        Committed revision r8525.

        The next steps are:
        1. Create a query hint and document it on the wiki. Add a test to verify correct plan generation.
        2. Fix the problem where query hints inside a FILTER are not translated: they are left in place and look like normal triple patterns, causing the filters to fail.
        3. Examine the opportunity to use DISTINCT over the variables in the FILTER in the vectored sub-plan approach.

        bryanthompson added a comment:

        Added the query hint "filterExists" to control this behavior. The default is still VectoredSubPlan. You can select the SubQueryLimitOne behavior like this:

        prefix eg: <http://www.bigdata.com/>

        SELECT ?t
        FROM eg:g
        { BIND (1 as ?t)
          FILTER EXISTS {
            ?a eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p/eg:p ?b .
            hint:SubQuery hint:filterExists "SubQueryLimitOne" . # <<< OVERRIDE THE DEFAULT BEHAVIOR.
          }
        }

        The unit test now checks that the execution time is within the target, which verifies that the query hint was interpreted correctly.
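        A wall-clock check of this kind can be sketched as a small helper. The TimedCheck class and assertCompletesWithin method are hypothetical, not the code in TestNegation.java. Timing assertions are only dependable when the two plans differ by orders of magnitude, as here (the description reports under 10 ms versus more than 5 s); choosing a budget well between the two reduces flakiness on loaded machines.

```java
import java.time.Duration;

/** Hypothetical helper for a wall-clock assertion; not the code in TestNegation.java. */
final class TimedCheck {

    /** Runs {@code task} and fails if it takes longer than {@code budget}; returns elapsed nanos. */
    static long assertCompletesWithin(Duration budget, Runnable task) {
        long start = System.nanoTime(); // monotonic clock, suitable for measuring intervals
        task.run();
        long elapsed = System.nanoTime() - start;
        if (elapsed > budget.toNanos()) {
            throw new AssertionError("took " + Duration.ofNanos(elapsed).toMillis()
                    + " ms, budget was " + budget.toMillis() + " ms");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // A budget far above the fast plan and far below the slow one.
        long nanos = assertCompletesWithin(Duration.ofSeconds(1), () -> {
            long sum = 0;
            for (int i = 0; i < 1_000; i++) {
                sum += i; // stand-in for running the query
            }
            if (sum < 0) {
                throw new IllegalStateException();
            }
        });
        System.out.println("completed in " + nanos + " ns");
    }
}
```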

        I had to modify the ASTQueryHintOptimizer to pass the QueryRoot into the IQueryHint handler method, which touched all of the IQueryHint implementations. I also had to expose the StaticAnalysis method that locates the FILTER for a join group, starting from the QueryRoot.

        I fixed the problem where the ASTQueryHintOptimizer was not interpreting query hints in a FILTER.

        I have not tried to introduce the DISTINCT SOLUTIONS into the VectoredSubPlan.

        The core AST evaluation test suite is green.

        See BLZG-1048 (FILTER (NOT) EXISTS optimization)
        See BLZG-1049 (Query hint not recognized in FILTER)

        Committed revision r8526.

        bryanthompson added a comment:

        I am going to close out this ticket pending feedback. The query hint should make it possible to control the query plan used by the FILTER (NOT) EXISTS.

        The following things were not resolved under this ticket:

        1. Automatically choosing the best plan (either VectoredSubPlan or SubQueryLimitOne).
        2. Using DISTINCT SOLUTIONS in the VectoredSubPlan to make that plan more efficient.
        jeremycarroll added a comment:

        I have reviewed the tests and added a couple more (harder ones); both show the desired performance improvement. Thanks.

        bryanthompson added a comment:

        Updated some unit tests of the expected behavior of an optimizer that were broken by BLZG-1048.

        Committed revision r8531.


          People

          • Assignee:
            bryanthompson
            Reporter:
            jeremycarroll
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved: