Add support for hash partitioned access paths on the cluster. This will make it possible to use a hash partitioned distribution for the indices, which in turn supports the use case of an offline map/reduce bulk loader.

      The order of the tuples within each index shard will still be maintained, but tuples in a hash partitioned index must be read from ALL nodes. High-level query does not rely on the index order (it already uses parallel shard scans when an access path spans a shard boundary).

      To support hash partitioned operations (including hash partitioned hash tables), we should flood the query to all nodes. That could be done over UDP, with a reach-back from the node as a fallback if it misses the UDP message. Alternatively, a fork/join pool could be used to send out the query message with maximum parallelism.
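The fan-out step could be sketched as follows. This is a minimal illustration using a thread pool in place of a fork/join pool; the `send` callback is a hypothetical transport hook (the UDP message with TCP reach-back is not modeled here), and all names are assumptions, not part of the codebase.

```python
from concurrent.futures import ThreadPoolExecutor

def flood_query(nodes, query, send):
    """Fan a query message out to every node with maximum parallelism.

    `send` is a hypothetical transport callback; in the ticket's design it
    would be a UDP datagram with a TCP reach-back fallback for nodes that
    miss the datagram. Only the parallel fan-out is modeled here.
    """
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(send, node, query) for node in nodes]
        # Collect results so any transport failure surfaces to the caller.
        return [f.result() for f in futures]
```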

      We will also have to broadcast the run state changes (startOp) to the nodes for hash partitioned access paths. Since reliable messaging is necessary in order for the query to terminate, this would have to use TCP, but a fork/join pool could be used to reduce the CPU load and latency associated with those messages.
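Unlike the query fan-out, the run-state broadcast must be reliable. A sketch of that delivery loop, again with a thread pool standing in for a fork/join pool: `send_tcp` is an assumed blocking call that returns True when the node acknowledges the message, and the retry policy is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def broadcast_run_state(nodes, start_op, send_tcp, retries=3):
    """Reliably deliver a startOp run-state change to every node.

    `send_tcp` is a hypothetical blocking call returning True on ack; a
    real implementation would use the cluster's TCP messaging layer. The
    pool bounds CPU load while hiding per-node latency.
    """
    def deliver(node):
        for _ in range(retries):
            if send_tcp(node, start_op):
                return node
        # Without reliable delivery the query could never terminate.
        raise RuntimeError("startOp not acknowledged by " + str(node))

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(deliver, nodes))
```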

      Note that fully bound access paths can be mapped to a single node and should be executed solely on that node.

      The partitioning can be done either in an RDF "aware" manner or using the blind hash code of the access path. YARS2 uses the following hash partitioning scheme:

      "For distribution across machines we use a static hash function on the first element of each quad, which works fine for most indices except the one starting with P (POCS); for that one we hash PO, and do flooding for (?, p, ?, ?) queries."
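The YARS2 placement rule above could be sketched as a routing function. Everything here is an illustrative assumption: the index names, the cluster size, and the use of Python's built-in hash with a modulo; only the rule itself (hash the first key element, except POCS hashes the PO prefix and floods when O is unbound) comes from the quote.

```python
NUM_NODES = 4  # illustrative cluster size

def route(index, pattern):
    """Map an access path on `index` to a node id, or 'flood' when the
    bound prefix does not determine a single node. Unbound positions
    are None. Index names and the modulo hash are illustrative."""
    s, p, o, c = pattern
    if index == "POCS":
        # YARS2 hashes the (P, O) prefix for the P-leading index, so a
        # (?, p, ?, ?) pattern with O unbound must be flooded.
        if p is not None and o is not None:
            return hash((p, o)) % NUM_NODES
        return "flood"
    # All other indices hash on the first element of their key order.
    first = {"SPOC": s, "OCSP": o, "CSPO": c}[index]
    return hash(first) % NUM_NODES if first is not None else "flood"
```

A fully bound pattern always routes to exactly one node under this rule, which is what lets fully bound access paths execute solely on that node.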


        bryanthompson added a comment -

        Closed. Not relevant to the new architecture.


          • Assignee: bryanthompson
          • Votes: 0
          • Watchers: 2
          • Created: