Add support for hash partitioned access paths on the cluster. This will make it possible to use a hash partitioned distribution for the indices, which in turn supports the use case of an offline map/reduce bulk loader.
The order of the tuples within each index shard will still be maintained, but tuples in a hash partitioned index must be read from ALL nodes. High-level query evaluation does not rely on the index order (it already uses parallel shard scans when an access path spans a shard boundary).
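To make the read pattern concrete, here is a minimal sketch of why a scan over a hash partitioned index must visit every shard: each key lands on exactly one shard chosen by its hash, tuples stay ordered within a shard, but nothing constrains which shard holds which key. The `partitionOf` function and shard layout are illustrative assumptions, not the actual key-to-shard mapping.

```java
import java.util.*;

public class HashPartitionDemo {

    // Hypothetical partition function (an assumption for illustration):
    // blind hash of the key, masked non-negative, modulo the shard count.
    static int partitionOf(String key, int nShards) {
        return (key.hashCode() & Integer.MAX_VALUE) % nShards;
    }

    public static void main(String[] args) {
        int nShards = 4;
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < nShards; i++) shards.add(new ArrayList<>());

        // Each key is written to exactly one shard, chosen by its hash.
        for (String k : List.of("alice", "bob", "carol", "dave", "eve")) {
            shards.get(partitionOf(k, nShards)).add(k);
        }

        // The order within each shard is maintained (sorted here to stand
        // in for the B+Tree order of an index shard)...
        for (List<String> shard : shards) Collections.sort(shard);

        // ...but a full scan must read ALL shards, since the hash gives no
        // information about where an arbitrary key range lives.
        int total = shards.stream().mapToInt(List::size).sum();
        System.out.println(total); // 5
    }
}
```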
To support hash partitioned operations (including hash partitioned hash tables), we will need to flood the query to all nodes. That could be done over UDP, with a reach-back from any node that misses the UDP message as a fallback, or we could use a fork/join pool to send out the query message with maximum parallelism.
We will also have to broadcast the run state changes (startOp) to the nodes for hash partitioned access paths. Since reliable messaging is necessary for the query to terminate, this would have to use TCP, but a fork/join pool could be used to reduce the CPU load and latency associated with those messages.
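The fork/join broadcast idea in the two paragraphs above can be sketched as a divide-and-conquer fan-out: rather than a serial loop over nodes, the node range is split in half recursively so the per-node sends proceed with maximum parallelism. The `NodeSender` interface stands in for the actual (TCP or UDP) messaging layer and is a hypothetical name.

```java
import java.util.concurrent.*;
import java.util.*;

public class QueryBroadcast {

    // Hypothetical hook standing in for the real per-node messaging
    // (e.g., a reliable TCP send of the query or a startOp message).
    interface NodeSender { void send(int nodeId, String msg); }

    // Divide-and-conquer broadcast over the node index range [from, to):
    // split in half and recurse so sends fan out in parallel.
    static class Broadcast extends RecursiveAction {
        final int from, to;
        final String msg;
        final NodeSender sender;
        Broadcast(int from, int to, String msg, NodeSender sender) {
            this.from = from; this.to = to; this.msg = msg; this.sender = sender;
        }
        @Override protected void compute() {
            if (to - from == 1) {
                sender.send(from, msg); // leaf: one node, one send
                return;
            }
            int mid = (from + to) >>> 1;
            invokeAll(new Broadcast(from, mid, msg, sender),
                      new Broadcast(mid, to, msg, sender));
        }
    }

    public static void main(String[] args) {
        // Record which of 8 nodes the broadcast reached.
        Set<Integer> reached = ConcurrentHashMap.newKeySet();
        ForkJoinPool.commonPool().invoke(
                new Broadcast(0, 8, "startOp#1", (id, m) -> reached.add(id)));
        System.out.println(reached.size()); // 8
    }
}
```

A serial loop over the nodes would serialize the per-message latency; the fork/join fan-out overlaps the sends, which is the CPU-load and latency argument made above.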
Note that a fully bound access path can be mapped to a single node and should be executed solely on that node.
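That routing rule can be sketched as follows: when every component of the triple pattern is bound, the hash of the key identifies the one shard that can hold a match, so only that node is read; any unbound position forces the flood to all nodes. The `shardFor` hash and the null-means-variable convention are illustrative assumptions.

```java
import java.util.*;

public class AccessPathRouting {

    // Hypothetical key-to-shard mapping: blind hash of the fully bound
    // triple, reduced modulo the shard count (an assumption, not the
    // actual scheme).
    static int shardFor(String s, String p, String o, int nShards) {
        int h = Objects.hash(s, p, o);
        return (h & Integer.MAX_VALUE) % nShards;
    }

    // A null component stands for a variable. A fully bound pattern maps
    // to exactly one shard; otherwise every shard must be read.
    static List<Integer> shardsToRead(String s, String p, String o, int nShards) {
        if (s != null && p != null && o != null) {
            return List.of(shardFor(s, p, o, nShards)); // single-node read
        }
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < nShards; i++) all.add(i);
        return all; // unbound position: flood to all nodes
    }

    public static void main(String[] args) {
        System.out.println(shardsToRead("s1", "p1", "o1", 4).size()); // 1
        System.out.println(shardsToRead("s1", "p1", null, 4).size()); // 4
    }
}
```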
The partitioning can be done either in an RDF "aware" manner or using the blind hash code of the access path. YARS2 uses the following hash partitioning scheme: