  1. Blazegraph (by SYSTAP)
  2. BLZG-5

Replace DataSetJoin with an "inline" access path.

    Details

    • Type: Task
    • Status: In Progress
    • Resolution: Unresolved
    • Affects Version/s: TERMS_REFACTOR_BRANCH
    • Fix Version/s: None
    • Component/s: Bigdata RDF Database
    • Labels:
      None

      Description

      The DataSetJoin is a custom operator used to join the URI IVs for a named graph or default graph access path against the source binding sets before feeding them to a join against the next access path. This is only done in cases where it is more efficient than scanning an index and filtering on the context position using an IN filter.

      DataSetJoin should be replaced by a standard (aka pipeline) join and an inline access path. The inline access path will be a simple relation whose data are the IVs.
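
The proposed shape can be illustrated with a minimal sketch. This is not Blazegraph code: the class and method names (`InlineAccessPathSketch`, `InlineAccessPath`, `rangeCount`, `pipelineJoin`) are hypothetical, IVs are modelled as plain strings, and binding sets as maps. It only shows the idea of treating a fixed set of graph IVs as a small relation that an ordinary join can consume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names are Blazegraph API. */
final class InlineAccessPathSketch {

    /** A relation whose only data are a fixed list of IVs (strings here). */
    static final class InlineAccessPath {
        private final List<String> ivs;

        InlineAccessPath(String... ivs) {
            this.ivs = Arrays.asList(ivs);
        }

        /** Exact cardinality, available without touching any index. */
        long rangeCount() {
            return ivs.size();
        }

        List<String> elements() {
            return ivs;
        }
    }

    /**
     * Joins each source binding set against the inline access path,
     * binding the given variable to each IV. A source solution that
     * already binds the variable only survives for a matching IV.
     */
    static List<Map<String, String>> pipelineJoin(
            List<Map<String, String>> source, String var, InlineAccessPath ap) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> bset : source) {
            String bound = bset.get(var);
            for (String iv : ap.elements()) {
                if (bound != null && !bound.equals(iv)) {
                    continue; // conflicting binding, no join
                }
                Map<String, String> joined = new HashMap<>(bset);
                joined.put(var, iv);
                out.add(joined);
            }
        }
        return out;
    }
}
```

In this sketch the output of `pipelineJoin` would feed the join against the next (quads) access path, with the context position already bound.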

      This change will make it possible to apply the runtime query optimizer to quads access paths.

      @see https://sourceforge.net/apps/trac/bigdata/ticket/209 (AccessPath should visit binding sets rather than elements when used for high level query.)

        Activity

        bryanthompson bryanthompson added a comment -

        Raised priority. This will be addressed shortly.

        First, Matt will hack in the data set join to convertJoinGroup().

        The inline access path needs to be an IPredicate over an inline
        IAccessPath so we can get the join ordering right.

        This provides a fast general-purpose mechanism to constrain
        subsequent joins, but it needs to be an IAccessPath so
        we can get the range count of the inline AP in order to get the
        join ordering right when using the static join optimizer.
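
To illustrate why the range count matters, here is a deliberately simplified sketch that orders predicates by ascending range count. It is an assumption for illustration only: the real static join optimizer may weigh other factors (such as shared variables), and `StaticJoinOrderSketch`, `Pred`, and `order` are hypothetical names, not Blazegraph API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of cardinality-based join ordering. */
final class StaticJoinOrderSketch {

    /** A predicate paired with the range count reported by its access path. */
    static final class Pred {
        final String name;
        final long rangeCount;

        Pred(String name, long rangeCount) {
            this.name = name;
            this.rangeCount = rangeCount;
        }
    }

    /** Returns predicate names ordered from most to least selective. */
    static List<String> order(List<Pred> preds) {
        List<Pred> sorted = new ArrayList<>(preds); // leave the input untouched
        sorted.sort(Comparator.comparingLong((Pred p) -> p.rangeCount));
        List<String> names = new ArrayList<>();
        for (Pred p : sorted) {
            names.add(p.name);
        }
        return names;
    }
}
```

Under this simplification, an inline AP holding a handful of graph IVs reports a small range count and is scheduled ahead of a large index scan, which is the behaviour the comment is after.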

        For IN, this is as simple as an IV[] and we could use the
        INHashBop, tacking it right onto the IPredicate.
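
As a rough picture of the IN case, the sketch below builds a hash set from an array of IVs and tests membership in constant expected time. It does not use or reproduce INHashBop; `InHashFilterSketch` and its methods are hypothetical, and IVs are modelled as `long` values.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical hash-based IN filter over an array of IVs. */
final class InHashFilterSketch {

    private final Set<Long> allowed = new HashSet<>();

    InHashFilterSketch(long[] ivs) {
        for (long iv : ivs) {
            allowed.add(iv); // duplicates collapse in the set
        }
    }

    /** True if the candidate IV is one of the inline values. */
    boolean accept(long iv) {
        return allowed.contains(iv);
    }

    /** Number of distinct IVs, usable as a range count. */
    int size() {
        return allowed.size();
    }
}
```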

        For the RTO, the inline AP should be based on the HTree for
        scalability as the #of samples could be large and should not
        burden the JVM heap.

        For a cluster, we need to be able to materialize chunk messages
        from a "remote" inline AP, or even demand the whole thing be
        materialized on a node.

        bryanthompson bryanthompson added a comment -

        Reduced priority. This issue is no longer on the critical path.

        bryanthompson bryanthompson added a comment -

        Note: This can probably be handled using the new mechanisms for operations against hash indices, HTree, Streams, etc.


          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated: