Blazegraph (by SYSTAP) / BLZG-1119

problems with UNIONs + complex OPTIONAL groups

    Details

      Description

      I am running a complex query. Greatly simplified, the part of interest here would ideally be expressed as:

      select *
      
        WHERE
        { BIND(1 as ?A)
          { BIND(2 as ?B) }
           UNION
          { BIND(3 as ?C) }
           UNION
          { }
      OPTIONAL { BIND( 'unbound' as ?D ) }
      OPTIONAL { BIND( 'unbound' as ?B ) }
      OPTIONAL { BIND( 'unbound' as ?C ) }
      } 
      
      

      This does not work (because of issues in the complex optimizer?).
      I have been using the following workaround, which uses named subqueries so that each query has only one OPTIONAL block:

      select DISTINCT *
      WITH {
        SELECT *
        WHERE
        { BIND(1 as ?A)
          { BIND(2 as ?B) }
           UNION
          { BIND(3 as ?C) }
           UNION
          { }
        } 
      } AS %q
        
      WITH {
      SELECT *
      WHERE {
          INCLUDE %q
          # OPTIONAL { BIND( 'unbound' as ?D ) }
          OPTIONAL { BIND( 'unbound' as ?B ) }
      }} AS %_union_sub_q_1
      WITH {
      SELECT *
      WHERE {
          INCLUDE %_union_sub_q_1
          OPTIONAL { BIND( 'unbound' as ?C ) }
      }} AS %_union_sub_q_2
      WITH {
      SELECT *
      WHERE {
          INCLUDE %_union_sub_q_2
          # OPTIONAL { BIND( 'unbound' as ?B ) }
          OPTIONAL { BIND( 'unbound' as ?D ) }
      }} AS %__MainQuery
                
      WHERE {
        INCLUDE %__MainQuery
      }
      

      This gives the correct results, but note that DISTINCT is needed even though it should be unnecessary in theory.
      Also note that swapping the order of the OPTIONAL BINDs then gives incorrect results:

      select DISTINCT *
      WITH {
        SELECT *
        WHERE
        { BIND(1 as ?A)
          { BIND(2 as ?B) }
           UNION
          { BIND(3 as ?C) }
           UNION
          { }
        } 
      } AS %q
        
      WITH {
      SELECT *
      WHERE {
          INCLUDE %q
          OPTIONAL { BIND( 'unbound' as ?D ) }
          # OPTIONAL { BIND( 'unbound' as ?B ) }
      }} AS %_union_sub_q_1
      WITH {
      SELECT *
      WHERE {
          INCLUDE %_union_sub_q_1
          OPTIONAL { BIND( 'unbound' as ?C ) }
      }} AS %_union_sub_q_2
      WITH {
      SELECT *
      WHERE {
          INCLUDE %_union_sub_q_2
          OPTIONAL { BIND( 'unbound' as ?B ) }
          # OPTIONAL { BIND( 'unbound' as ?D ) }
      }} AS %__MainQuery
                
      WHERE {
        INCLUDE %__MainQuery
      }
      

      This query returns a result row with both ?B and ?C bound (?B to 2 and ?C to 3).
      Please suggest a workaround that allows me to bind the literal 'unbound' to a list of variables that may or may not be bound after the query executes (the query is actually a named subquery...). Or, better, fix the issue in the complex optimizer.
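      For reference, the expected output can be worked out from the SPARQL 1.1 algebra. The group is an Extend for ?A, then a Join with the three-way UNION, then one LeftJoin per OPTIONAL. Below is a minimal Python sketch of those operators over solution mappings (plain dicts). It assumes there are no FILTERs, and the helper names are hypothetical; this is not Blazegraph code. It shows that the query should return exactly three distinct rows whichever order the OPTIONALs appear in. It also shows that no row can carry ?B = 2 and ?C = 3 together, because a LeftJoin only adds a binding that is compatible with the existing row.

```python
from collections import Counter

# Minimal model of SPARQL 1.1 algebra over solution mappings (dicts).
# Hypothetical helper names; a sketch of the spec semantics, not Blazegraph code.

def compatible(m1, m2):
    """Mappings are compatible if they agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def union(*branches):
    return [m for branch in branches for m in branch]

def left_join(left, right):
    """OPTIONAL without FILTER: extend each left row where possible, else keep it."""
    out = []
    for l in left:
        extended = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(extended or [l])
    return out

def bind_group(var, value):
    """The group { BIND(value AS ?var) } on its own yields one single-variable row."""
    return [{var: value}]

def evaluate(optional_order):
    # BIND(1 AS ?A) { BIND(2 AS ?B) } UNION { BIND(3 AS ?C) } UNION { }
    # The empty group { } evaluates to a single empty row: [{}].
    rows = join(bind_group("A", 1),
                union(bind_group("B", 2), bind_group("C", 3), [{}]))
    # One OPTIONAL { BIND('unbound' AS ?var) } per variable, in the given order.
    for var in optional_order:
        rows = left_join(rows, bind_group(var, "unbound"))
    return rows

def as_bag(rows):
    """Order-insensitive, duplicate-preserving view of a result."""
    return Counter(frozenset(r.items()) for r in rows)

for row in evaluate(["D", "B", "C"]):
    print(dict(sorted(row.items())))
# {'A': 1, 'B': 2, 'C': 'unbound', 'D': 'unbound'}
# {'A': 1, 'B': 'unbound', 'C': 3, 'D': 'unbound'}
# {'A': 1, 'B': 'unbound', 'C': 'unbound', 'D': 'unbound'}

# OPTIONAL orders used in the original query and in the two workaround variants.
orders = (["D", "B", "C"], ["B", "C", "D"], ["D", "C", "B"])
print(all(as_bag(evaluate(o)) == as_bag(evaluate(orders[0])) for o in orders))  # True
```

      The third row comes from the empty { } branch. The expected result contains no duplicates, which is consistent with the remark above that DISTINCT should be unnecessary in theory.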

        Activity

        bryanthompson added a comment:

        This is not a problem with the ASTComplexOptionalOptimizer. It appears to be an insufficiently correlated solution set hash join.

        Workaround: use an explicit triple (in the data) to bind the variable rather than the BIND(). This will cause the query generator to use the PipelineJoin for a "simple optional" rather than the "build-hash-index => run-subgroup => hash-join-solutions" pattern.
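        The following minimal Python sketch illustrates why a "simple optional" evaluated as a pipeline join cannot mix up solutions. It makes a few assumptions. Each incoming solution is probed on its own against the triples matching one hypothetical pattern (<urn:x> <urn:p> ?var). The solution is then either extended with each compatible match or passed through unchanged. The IRIs, the in-memory triple list and the function name are all hypothetical. The sketch models the idea only, not Blazegraph's PipelineJoin operator.

```python
# Hypothetical data: a single triple whose object is the literal 'unbound',
# standing in for "an explicit triple (in the data)".
TRIPLES = [("urn:x", "urn:p", "unbound")]

def optional_pipeline_join(solutions, s, p, var, triples):
    """OPTIONAL { <s> <p> ?var } evaluated one incoming solution at a time.

    Every output row is derived from exactly one input row, so bindings from
    different input rows can never be merged together.
    """
    out = []
    for sol in solutions:
        matches = [
            {**sol, var: o}
            for (ts, tp, o) in triples
            # Compatible if ?var is unbound in sol or already equals o.
            if ts == s and tp == p and sol.get(var, o) == o
        ]
        out.extend(matches or [sol])  # no match: keep the row unchanged
    return out

rows = [{"A": 1, "B": 2}, {"A": 1, "C": 3}]
print(optional_pipeline_join(rows, "urn:x", "urn:p", "D", TRIPLES))
# [{'A': 1, 'B': 2, 'D': 'unbound'}, {'A': 1, 'C': 3, 'D': 'unbound'}]
```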

        Suspect root cause: the solution set hash join is not sufficiently correlated. Perhaps we need to inject a row identifier into the hash index and the hash join?
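        This hypothesis can be illustrated with a toy model of the "build-hash-index => run-subgroup => hash-join-solutions" pattern. The model rests on two assumptions, neither of which describes Blazegraph internals. First, the subgroup receives all incoming solutions. Second, its output is joined back to the hash index on the variables shared by the incoming solutions and the subgroup. For OPTIONAL { BIND('unbound' AS ?D) } that shared set is empty. With an empty join key, every subgroup output is a candidate for every incoming row. The compatibility check alone then lets {?A=1, ?B=2} merge with {?A=1, ?C=3, ?D='unbound'}. Injecting a hidden row identifier into both the index and the join key restores the correlation. Here that identifier is the hypothetical column __rid.

```python
def compatible(m1, m2):
    """Mappings are compatible if they agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def run_subgroup(solutions, var, value):
    """Hypothetical subgroup { BIND(value AS ?var) } fed the incoming solutions."""
    return [{**s, var: value} for s in solutions]

def optional_hash_join(left, var, value, join_vars=(), correlate=False):
    """Toy OPTIONAL via build-hash-index => run-subgroup => hash-join-solutions.

    With correlate=True a hidden row id "__rid" is added to every incoming
    solution and to the join key, so a subgroup output can only join back to
    the row it was derived from.
    """
    tagged = [{**s, "__rid": i} if correlate else dict(s) for i, s in enumerate(left)]
    key_vars = tuple(join_vars) + (("__rid",) if correlate else ())

    def key(m):
        return tuple(m.get(v) for v in key_vars)

    # 1. Build the hash index over the incoming solutions.
    index = {}
    for s in tagged:
        index.setdefault(key(s), []).append(s)

    # 2. Run the optional subgroup on the incoming solutions.
    produced = run_subgroup(tagged, var, value)

    # 3. Hash join: probe with each subgroup output, keep compatible pairs.
    out, matched = [], set()
    for r in produced:
        for s in index.get(key(r), []):
            if compatible(s, r):
                matched.add(id(s))
                out.append({**s, **r})
    # Optional semantics: incoming rows with no match pass through unchanged.
    out += [s for s in tagged if id(s) not in matched]
    # Drop the hidden row id before returning.
    return [{k: v for k, v in m.items() if k != "__rid"} for m in out]

left = [{"A": 1, "B": 2}, {"A": 1, "C": 3}]  # solutions after the UNION
for correlate in (False, True):
    result = optional_hash_join(left, "D", "unbound", correlate=correlate)
    bad = [r for r in result if r.get("B") == 2 and r.get("C") == 3]
    print(f"correlate={correlate}: {len(result)} rows, {len(bad)} with ?B=2 and ?C=3")
# correlate=False: 4 rows, 2 with ?B=2 and ?C=3
# correlate=True: 2 rows, 0 with ?B=2 and ?C=3
```

        In this model, the uncorrelated variant produces the { ?A->1, ?B->2, ?C->3, ?D->'unbound' } row twice, matching the symptom reported later in the thread. That agreement is only suggestive. The changes actually made are described in BLZG-1163.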

        To see the solutions trace, add the following to your log4j.properties file and uncomment the line as instructed. This will show the solutions flowing out of every operator during query execution.

        ## 
        # Solutions trace (tab delimited file).  Uncomment the next line to enable.
        #log4j.logger.com.bigdata.bop.engine.SolutionsLog=INFO,solutionsLog
        log4j.additivity.com.bigdata.bop.engine.SolutionsLog=false
        log4j.appender.solutionsLog=org.apache.log4j.ConsoleAppender
        #log4j.appender.solutionsLog=org.apache.log4j.FileAppender
        log4j.appender.solutionsLog.Threshold=ALL
        log4j.appender.solutionsLog.File=solutions.csv
        log4j.appender.solutionsLog.Append=true
        # I find that it is nicer to have this unbuffered since you can see what
        # is going on and to make sure that I have complete rule evaluation logs
        # on shutdown.
        log4j.appender.solutionsLog.BufferedIO=false
        log4j.appender.solutionsLog.layout=org.apache.log4j.PatternLayout
        log4j.appender.solutionsLog.layout.ConversionPattern=SOLUTION:\t%m
        
        bryanthompson added a comment:

        Reassigned to Jeremy. He will rework the case to a minimal example (based on the initial query) so we can get to a firm root cause and a plan for a fix.

        michaelschmidt added a comment:

        Here's a simplified query that fails:

        SELECT *
        WHERE
        { 
          BIND(1 as ?A)
          { BIND(2 as ?B) } UNION { BIND(3 as ?C) }
          OPTIONAL { BIND( 'unbound' as ?D ) }
        } 
        

        -> the result contains the solution { ?A->1, ?B->2, ?C->3, ?D->'unbound' }, and even contains it twice

        The problem is probably related to the OPTIONAL clause, since the following query works as expected:

        SELECT *
        WHERE
        { 
          BIND(1 as ?A)
          { BIND(2 as ?B) } UNION { BIND(3 as ?C) }
        } 
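        When reducing a case like this to a regression test, it helps to compare results as multisets, so that both wrong rows and duplicated rows show up. Below is a minimal Python sketch. It assumes the results have already been fetched as a list of dicts; how they are fetched is out of scope. The expected rows were derived by hand from SPARQL 1.1 semantics for the two queries above, and the names are hypothetical.

```python
from collections import Counter

# Expected results under SPARQL 1.1 semantics (derived by hand).
# With the OPTIONAL, ?D is bound in every row; ?B and ?C never co-occur.
EXPECTED_WITH_OPTIONAL = [
    {"A": 1, "B": 2, "D": "unbound"},
    {"A": 1, "C": 3, "D": "unbound"},
]
EXPECTED_WITHOUT_OPTIONAL = [
    {"A": 1, "B": 2},
    {"A": 1, "C": 3},
]

def as_multiset(rows):
    """Order-insensitive, duplicate-preserving view of a list of solutions."""
    return Counter(frozenset(row.items()) for row in rows)

def compare(observed, expected):
    """Return (missing, unexpected) solutions as Counters; both empty means a match."""
    obs, exp = as_multiset(observed), as_multiset(expected)
    return exp - obs, obs - exp
```

        Suppose the rows returned by the failing query contained the reported row twice. That row would then appear in `unexpected` with a count of 2.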
        
        michaelschmidt added a comment:

        Resolved in branch ticket_1071. See ticket BLZG-1163 for a description of changes made.

        michaelschmidt added a comment:

        CI after merge looks good, closing issue.


          People

          • Assignee:
            michaelschmidt
            Reporter:
            jeremycarroll
          • Votes:
            0
            Watchers:
            0
