Details

      Description

      The SPARQL parser recognizes and extracts this information and attaches it to the AST, but the query plan generator does not yet use the attached bindings. The simplest way to integrate those BINDINGS is to pump them into a named solution set and then INCLUDE that solution set within the top-level WHERE clause.

      However, bigdata accepts IBindingSet[]s pretty much everywhere. In fact, they are accepted at the IRunningQuery, the ASTOptimizers, etc. Just not ASTEvalHelper. The more general solution is therefore to simply accept multiple solutions into the query, do the static analysis of those solutions (the SolutionStats class) and then attach that analysis to the QueryRoot. The SolutionStats can then be leveraged in the ASTOptimizers.
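The static analysis over a set of exogenous solutions can be pictured roughly as follows. This is only a sketch: `SolutionStatsSketch` and its method names are hypothetical and are not the actual SolutionStats API, and solutions are modeled as plain `Map<String, String>` rather than IBindingSet. It computes which variables are bound in every solution, which are bound in at least one, and which carry the same value everywhere, which is the kind of information an optimizer could use.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the bigdata SolutionStats class.
final class SolutionStatsSketch {

    /** Variables that are bound in every solution. */
    static Set<String> alwaysBound(List<Map<String, String>> solutions) {
        if (solutions.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> result = new LinkedHashSet<>(solutions.get(0).keySet());
        for (Map<String, String> s : solutions) {
            result.retainAll(s.keySet());
        }
        return result;
    }

    /** Variables that are bound in at least one solution. */
    static Set<String> usedVars(List<Map<String, String>> solutions) {
        Set<String> result = new LinkedHashSet<>();
        for (Map<String, String> s : solutions) {
            result.addAll(s.keySet());
        }
        return result;
    }

    /** Variables bound to the same value in every solution. */
    static Map<String, String> constants(List<Map<String, String>> solutions) {
        Map<String, String> result = new LinkedHashMap<>();
        if (solutions.isEmpty()) {
            return result;
        }
        result.putAll(solutions.get(0));
        for (Map<String, String> s : solutions) {
            // Drop any candidate whose value differs (or is missing) in s.
            result.entrySet().removeIf(e -> !e.getValue().equals(s.get(e.getKey())));
        }
        return result;
    }
}
```

In this model, a variable reported by `constants` could be treated like a constant during static analysis, while a variable in `usedVars` but not in `alwaysBound` is only "maybe bound".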

      The openrdf platform allows the caller to specify a single BindingSet as input to the query. That pre-existing API needs to be reconciled with a BindingSet[] flowing into the query through the BINDINGS clause. I think that the right way to reconcile these things is to treat the caller-given BindingSet as a constraint which must be applied to every solution in the BINDINGS clause. If this results in conflicting bindings for a given source solution from the BINDINGS clause, then there are no solutions to the query for that source solution.

      Thus, another way to look at this is that the BINDINGS clause attached to the QueryRoot replaces the BindingSet or IBindingSet flowing into the query. They are basically different approaches to capturing the same information and just need to be reconciled.
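The reconciliation described above amounts to joining each solution from the BINDINGS clause against the single caller-given BindingSet and dropping the solutions that conflict. A minimal sketch, assuming solutions are modeled as `Map<String, String>`; the class `ExogenousBindingsJoin` and its methods are hypothetical names for illustration, not the ASTEvalHelper code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the bigdata implementation.
final class ExogenousBindingsJoin {

    /** Merge two solutions, or return empty if they bind a shared variable differently. */
    static Optional<Map<String, String>> merge(Map<String, String> left,
                                               Map<String, String> right) {
        Map<String, String> out = new LinkedHashMap<>(left);
        for (Map.Entry<String, String> e : right.entrySet()) {
            String prior = out.putIfAbsent(e.getKey(), e.getValue());
            if (prior != null && !prior.equals(e.getValue())) {
                return Optional.empty(); // conflicting binding: no join
            }
        }
        return Optional.of(out);
    }

    /** Join N BINDINGS solutions against one caller-given solution; yields at most N results. */
    static List<Map<String, String>> join(List<Map<String, String>> bindingsClause,
                                          Map<String, String> callerGiven) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> s : bindingsClause) {
            merge(s, callerGiven).ifPresent(out::add);
        }
        return out;
    }
}
```

For example, joining the BINDINGS solutions `{x=a, y=b}`, `{x=c}`, and `{y=d}` against a caller-given `{x=a}` keeps the first and third (extended with `x=a`) and drops the second, since it binds `x` to a different value.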

      See https://sourceforge.net/apps/trac/bigdata/ticket/412 (StaticAnalysis#getDefinitelyBound() ignores exogenous variables)
      See https://sourceforge.net/apps/trac/bigdata/ticket/449 (SPARQL 1.1 Federation)
      See https://sourceforge.net/apps/trac/bigdata/ticket/267 (Support evaluation of 3rd party operators)

        Activity

        bryanthompson added a comment -

        The BINDINGS clause is now obeyed. More work on federated query support. The remaining issue is deferring SERVICE calls where the service reference is a variable.

        The static analysis of exogenous variables issue (https://sourceforge.net/apps/trac/bigdata/ticket/412) remains open. I have not investigated optimizations there yet.


        - Renamed some methods on the IServiceOptions interface and some implementations of that interface in order to reduce confusion among internal bigdata services, internal openrdf services, and remote SPARQL services.


        - Modified the named solution set operators (JVMNamedSubqueryOp and HTreeNamedSubqueryOp) to vector all source solutions into the named subquery. These operators now verify that they are configured for "at-once" evaluation, thus ensuring that any BINDINGS clause is fully passed through into the named subquery by the operator.


        - Modified ASTEvalHelper and AST2BOpUtility to process the BINDINGS clause. If the openrdf API specifies a non-empty BindingSet and a BINDINGS clause was also given, then we do a simple JOIN of those solutions. This is always an N x 1 join (the N solutions of the BINDINGS clause against the single openrdf BindingSet) and will have at most N solutions. Solutions which do not join are dropped. They represent a conflict between the bindings given through the openrdf API and the BINDINGS clause. The remaining solutions are vectored into the query. The various tests which rely on the BINDINGS clause now pass. Note: I have not yet revisited the AST optimizations for the exogenous variables.


        - Added AT_ONCE query hint. This indicates that the corresponding operator will be marked as !pipelined. All source solutions for that operator will be buffered before the operator is evaluated and the operator will be evaluated exactly once. Added unit tests for this query hint for PipelineJoin and ServiceCallJoin.


        - Added CHUNK_SIZE query hint. This is just a well-known name for the BufferAnnotations.CHUNK_CAPACITY and duplicates the existing ChunkCapacityQueryHint, making it more convenient to override the vector size for an operator. Added unit tests to verify that CHUNK_SIZE is correctly applied to PipelineJoin and ServiceCallJoin.


        - ASTServiceNodeOptimizer has been modified at least temporarily to NOT lift out SERVICE calls into a named subquery unless the SERVICE reference is a constant which is the bigdata internal search service. I want to think about more general purpose ways of handling this. E.g., by registering a service as "runOnce". However, it may be that the most general way to handle this is to specify the service as "at-once" (which is in fact the default for a Service).

        See https://sourceforge.net/apps/trac/bigdata/ticket/449 (SPARQL 1.1 Federated Query)
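The difference between pipelined and "at-once" evaluation, and the role of the chunk size, can be sketched as follows. This is a toy model under stated assumptions: `AtOnceSketch`, `runPipelined`, `runAtOnce`, and `chunk` are hypothetical names, and the operator is modeled as a plain `Consumer` over a list of solutions rather than a bigdata operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not the bigdata query engine.
final class AtOnceSketch {

    /** Split the source solutions into chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < source.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, source.size());
            chunks.add(new ArrayList<>(source.subList(i, end)));
        }
        return chunks;
    }

    /** Pipelined: the operator runs once per incoming chunk. Returns the number of evaluations. */
    static <T> int runPipelined(List<List<T>> chunks, Consumer<List<T>> op) {
        int evaluations = 0;
        for (List<T> c : chunks) {
            op.accept(c);
            evaluations++;
        }
        return evaluations;
    }

    /** At-once: buffer every chunk first, then run the operator exactly once. */
    static <T> int runAtOnce(List<List<T>> chunks, Consumer<List<T>> op) {
        List<T> buffer = new ArrayList<>();
        for (List<T> c : chunks) {
            buffer.addAll(c);
        }
        op.accept(buffer);
        return 1;
    }
}
```

In this model, five source solutions with a chunk size of two produce three pipelined evaluations (over 2, 2, and 1 solutions), while the at-once variant sees all five solutions in a single evaluation. That single evaluation is what lets an operator such as a named subquery or a SERVICE call observe the complete set of source solutions.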

        bryanthompson added a comment -

        Committed revision r6080. (for the above comment)


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson