With the changes in , the bottleneck for LUBM Q6 and Q14 on the cluster is now RDF Value materialization. RDF Value materialization is done either by the ChunkedMaterializationOp or the BigdataBindingSetResolverator. The BigdataSolutionResolverator is still used by the rules engine, and the BigdataStatementIterator also uses a chunked resolution pattern. However, only the ChunkedMaterializationOp and the BigdataBindingSetResolverator affect SPARQL query evaluation. All resolution is delegated to LexiconRelation#getTerms(), which calls through to the BatchResolveTermsTask and the BatchResolveBlobsTask.
The ChunkedMaterializationOp is used when we need to materialize RDF Values within a query, e.g., for a FILTER. It might also be used to materialize RDF Values for a bridge to a remote SERVICE (SPARQL 1.1 Federated Query support). Like any other operator, the ChunkedMaterializationOp can have multiple concurrent operator tasks, depending on the parallelism feeding that operator.
The BigdataBindingSetResolverator is a single-threaded producer/consumer pattern. It can only process chunks as fast as the consumer drains them. This is a likely source of the bottleneck on the cluster. The main reason for using the BigdataBindingSetResolverator in a query is that we do not attempt to materialize RDF Values for solutions which are pruned by a SLICE. Doing all materialization within the query plan is very expensive when the query plan uses a SLICE, since we materialize far more data than we need.
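The cost difference around a SLICE can be illustrated with a small, self-contained sketch. This is not bigdata code: the class SliceMaterializationSketch, both method names, and the lookup function are hypothetical stand-ins for "resolve a term identifier to an RDF Value". It only shows that materializing before the slice performs one lookup per solution, while materializing after the slice performs at most LIMIT lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names are bigdata classes.
final class SliceMaterializationSketch {

    /** Resolves every id first, then applies OFFSET/LIMIT: ids.size() lookups. */
    static <V> List<V> materializeThenSlice(List<Long> ids, int offset, int limit,
            Function<Long, V> lookup) {
        List<V> all = new ArrayList<>(ids.size());
        for (Long id : ids) {
            all.add(lookup.apply(id)); // pays for solutions the slice will discard
        }
        int from = Math.min(offset, all.size());
        int to = (int) Math.min((long) from + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    /** Applies OFFSET/LIMIT first, then resolves only the survivors. */
    static <V> List<V> sliceThenMaterialize(List<Long> ids, int offset, int limit,
            Function<Long, V> lookup) {
        int from = Math.min(offset, ids.size());
        int to = (int) Math.min((long) from + limit, ids.size());
        List<V> out = new ArrayList<>(to - from);
        for (Long id : ids.subList(from, to)) {
            out.add(lookup.apply(id)); // at most `limit` lookups
        }
        return out;
    }
}
```

Both methods return the same solutions. The sketch assumes non-negative offset and limit values and ignores chunking and threading entirely.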
There is some opportunity for internal parallelism on the cluster, in terms of the ClientIndexView (MAX_PARALLEL tasks run concurrently when the BatchResolveXXXTask is mapped over the shards).
There is also some opportunity for parallelism in the BatchResolveXXXTasks themselves. They have a MAX_CHUNK property. If a received chunk size is larger than that, then they will execute multiple requests concurrently.
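The chunk-splitting behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the BatchResolveXXXTask implementation: ChunkedResolveSketch, its resolve method, the maxChunk parameter, and the resolver function are all hypothetical names, and the real tasks may split and schedule their requests differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not the bigdata BatchResolveTermsTask.
final class ChunkedResolveSketch {

    /**
     * Resolves ids with a batch resolver. A chunk larger than maxChunk is
     * split into sub-chunks that are submitted concurrently to the executor.
     * Results are returned in the same order as the input ids.
     */
    static <K, V> List<V> resolve(List<K> ids, int maxChunk,
            Function<List<K>, List<V>> resolver, ExecutorService executor)
            throws Exception {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        if (ids.size() <= maxChunk) {
            return resolver.apply(ids); // small chunk: one request, caller's thread
        }
        List<Callable<List<V>>> tasks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += maxChunk) {
            final List<K> sub = ids.subList(from, Math.min(from + maxChunk, ids.size()));
            tasks.add(() -> resolver.apply(sub));
        }
        List<V> out = new ArrayList<>(ids.size());
        // invokeAll blocks until all tasks finish and preserves task order.
        for (Future<List<V>> f : executor.invokeAll(tasks)) {
            out.addAll(f.get());
        }
        return out;
    }
}
```

In this sketch the effective parallelism is bounded by the thread pool behind the executor, which plays a role loosely analogous to a cap on concurrent requests.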
While using the ChunkedMaterializationOp could improve throughput for both the cluster and the standalone deployments (through increased parallelism to the disk), the default is to NOT do RDF Value materialization using that operator, even when no SLICE is present. This bug is documented at AST2BOpContext#materializeProjectionInQuery.