  1. Blazegraph (by SYSTAP)
  2. BLZG-647

Optimizer should detect identical sub-selects

    Details

    • Type: New Feature
    • Status: Open
    • Resolution: Unresolved
    • Affects Version/s: BIGDATA_RELEASE_1_1_0
    • Fix Version/s: None
    • Component/s: Query Plan Generator
    • Labels:
      None

      Description

      As documented at [1], BSBM BI Q5 has two sub-selects which are identical. This case should be recognized by an AST optimizer and automatically lifted out into a named subquery.

      The optimizer must look for the same set (without regard to order) of joins and filters with the same inputs and then lift those joins and filters into a named subquery. The example here is BSBM BI Q5.

      Select ?country ?product (count(?review) As ?nrOfReviews)
      {
            ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
            ?review bsbm:reviewFor ?product ;
                    rev:reviewer ?reviewer .
            ?reviewer bsbm:country ?country .
      }
      Group By ?country ?product
      

      and

      Select ?country ?product (count(?review) As ?nrOfReviews)
      {
            ?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4> .
            ?review bsbm:reviewFor ?product .
            ?review rev:reviewer ?reviewer .
            ?reviewer bsbm:country ?country .
      }
      Group By ?country ?product
      

      These appear to be slightly different only because the first uses an alternative syntax (the ';' predicate-object list abbreviation) for the triple patterns. However, they are precisely the same and should be lifted out entirely and replaced by a single named subquery. Again, [1] shows exactly what this should look like.
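The order-insensitive comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the Blazegraph AST optimizer: the dict-based sub-select representation and the names `canonical_key` and `find_identical` are hypothetical, and the sketch assumes that parsing has already expanded the ';' abbreviation into plain triple patterns.

```python
from collections import defaultdict

PRODUCT_TYPE = "<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4>"

# Hypothetical, simplified AST: each sub-select is a dict of already-parsed parts.
# Both syntactic forms in the example parse to the same four triple patterns.
FIRST = {
    "patterns": [
        ("?product", "a", PRODUCT_TYPE),
        ("?review", "bsbm:reviewFor", "?product"),
        ("?review", "rev:reviewer", "?reviewer"),
        ("?reviewer", "bsbm:country", "?country"),
    ],
    "filters": [],
    "projection": ["?country", "?product", "(count(?review) As ?nrOfReviews)"],
    "group_by": ["?country", "?product"],
}
# Same sub-select with the joins deliberately listed in reverse order,
# to exercise the "without regard to order" requirement.
SECOND = dict(FIRST, patterns=list(reversed(FIRST["patterns"])))


def canonical_key(subselect):
    """Build a key that ignores the order of joins and filters."""
    return (
        frozenset(subselect["patterns"]),
        frozenset(subselect["filters"]),
        # Conservative choice: projection and GROUP BY are compared in order.
        tuple(subselect["projection"]),
        tuple(subselect["group_by"]),
    )


def find_identical(subselects):
    """Return lists of indices of sub-selects that share a canonical key."""
    groups = defaultdict(list)
    for index, subselect in enumerate(subselects):
        groups[canonical_key(subselect)].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]
```

Calling `find_identical([FIRST, SECOND])` groups the two sub-selects together, which is the signal an optimizer would need before lifting them into a single named subquery.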

      In this case, both sub-selects have precisely the same solution modifiers and the same projection. However, this optimization can in fact be done in cases where that is not true. We can simply run the result of the shared WHERE clause (or a common set of joins) into a named solution set and then INCLUDE the named solution set back into the query in order to pick up any additional joins or differences in the solution modifiers.
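A minimal sketch of that more general case, again under assumptions: sub-selects are grouped by their WHERE clause alone, the shared clause becomes one named solution set, and each sub-select keeps its own projection and solution modifiers on top of an include. The dict representation, the function `lift_shared_where`, the generated names such as `%shared0`, and the second sub-select (`BY_COUNTRY`, a made-up variant that groups by country only) are all hypothetical and are not taken from BSBM or from Blazegraph.

```python
from collections import Counter

# Shared WHERE clause from the example, as already-parsed triple patterns.
WHERE = [
    ("?product", "a",
     "<http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType4>"),
    ("?review", "bsbm:reviewFor", "?product"),
    ("?review", "rev:reviewer", "?reviewer"),
    ("?reviewer", "bsbm:country", "?country"),
]
BY_COUNTRY_AND_PRODUCT = {
    "patterns": WHERE, "filters": [],
    "projection": ["?country", "?product", "(count(?review) As ?nrOfReviews)"],
    "group_by": ["?country", "?product"],
}
# Hypothetical variant: same WHERE clause, different projection and GROUP BY.
BY_COUNTRY = {
    "patterns": list(reversed(WHERE)), "filters": [],
    "projection": ["?country", "(count(?review) As ?nrOfReviews)"],
    "group_by": ["?country"],
}


def where_key(subselect):
    """Order-insensitive key over the joins and filters only."""
    return (frozenset(subselect["patterns"]), frozenset(subselect["filters"]))


def lift_shared_where(subselects):
    """Return (named_sets, rewritten) with shared WHERE clauses lifted out."""
    counts = Counter(where_key(s) for s in subselects)
    names, named_sets, rewritten = {}, {}, []
    for s in subselects:
        key = where_key(s)
        if counts[key] < 2:
            rewritten.append(s)  # WHERE clause is not shared; leave as-is
            continue
        if key not in names:
            names[key] = "%shared" + str(len(names))
            named_sets[names[key]] = {
                "patterns": list(s["patterns"]),
                "filters": list(s["filters"]),
            }
        # Projection and solution modifiers stay with the individual sub-select.
        rewritten.append({
            "include": names[key],
            "projection": s["projection"],
            "group_by": s["group_by"],
        })
    return named_sets, rewritten
```

With `lift_shared_where([BY_COUNTRY_AND_PRODUCT, BY_COUNTRY])`, the joins are evaluated once into `%shared0`, and each rewritten sub-select applies its own GROUP BY to the included solutions.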

      [1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NamedSubquery#BSBM_BI_Q5

        Activity

        bryanthompson added a comment -

        Note: You can always work around this issue using the NamedSubquery syntax as documented on the wiki [1].

        [1] https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=NamedSubquery


          People

          • Assignee: bryanthompson
          • Reporter: bryanthompson