Blazegraph (by SYSTAP) / BLZG-610

Optimize RDF Value materialization performance on cluster.

    Details

      Description

      With the changes in [1], the bottleneck for LUBM Q6 and Q14 on the cluster is now RDF Value materialization. RDF Value materialization is done either by the ChunkedMaterializationOp or the BigdataBindingSetResolverator. The BigdataSolutionResolverator is still used by the rules engine and the BigdataStatementIterator also uses a chunked resolution pattern. However, only the ChunkedMaterializationOp and the BigdataBindingSetResolverator impact SPARQL query. All resolution gets delegated through to LexiconRelation#getTerms(), which calls through to the BatchResolveTermsTask and the BatchResolveBlobsTask.
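
      As a rough illustration of that delegation, here is a minimal sketch (plain Java, not the Blazegraph source; the IV, RdfValue, and BatchResolver types are stand-ins for the real classes) of a getTerms()-style entry point that splits a chunk of IVs into term and blob identifiers and resolves each group with a single batch call:

        import java.util.ArrayList;
        import java.util.Collection;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Sketch only: the chunked, batch-oriented resolution pattern described above.
        public class GetTermsSketch {

            record IV(long id, boolean isBlob) {}   // stand-in for an internal value
            record RdfValue(String lexicalForm) {}  // stand-in for a materialized RDF Value

            // Stand-in for the BatchResolveTermsTask / BatchResolveBlobsTask roles.
            interface BatchResolver {
                Map<IV, RdfValue> resolve(Collection<IV> ivs);
            }

            // Resolve one chunk of IVs: TermIds go to one batch task, BlobIVs to the other.
            static Map<IV, RdfValue> getTerms(Collection<IV> ivs,
                    BatchResolver termTask, BatchResolver blobTask) {
                List<IV> terms = new ArrayList<>();
                List<IV> blobs = new ArrayList<>();
                for (IV iv : ivs) {
                    (iv.isBlob() ? blobs : terms).add(iv);
                }
                Map<IV, RdfValue> out = new HashMap<>(termTask.resolve(terms));
                out.putAll(blobTask.resolve(blobs));
                return out;
            }
        }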

      The ChunkedMaterializationOp is used when we need to materialize RDF Values within a query, e.g., for a FILTER. It might also be used to materialize RDF Values for a bridge to a remote SERVICE (SPARQL 1.1 Federated Query support). Like any other operator, the ChunkedMaterializationOp can have multiple concurrent operator tasks based on the parallelism feeding that operator.

      The BigdataBindingSetResolverator is a single threaded producer/consumer pattern. It can only process chunks as fast as the consumer. This is a likely source of the bottleneck on the cluster. The main reason for using the BigdataBindingSetResolverator in a query is that we do not attempt to materialize RDF Values for solutions which are pruned by a SLICE. Doing all materialization within the query plan is very expensive when the query plan uses a SLICE since we materialize far more data than we need.
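
      For intuition, a minimal sketch of that single-consumer shape (plain Java, not the BigdataBindingSetResolverator itself): chunks queue up behind one consumer thread, so overall materialization throughput is capped by how fast that one thread can drain them:

        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;
        import java.util.concurrent.ArrayBlockingQueue;
        import java.util.concurrent.BlockingQueue;

        // Sketch only: single-threaded producer/consumer chunk resolution.
        public class SingleConsumerSketch {

            public static void main(String[] args) throws Exception {
                BlockingQueue<List<Long>> chunks = new ArrayBlockingQueue<>(16);

                // Single consumer: each chunk is resolved serially, one after another.
                Thread consumer = new Thread(() -> {
                    try {
                        List<Long> chunk;
                        while (!(chunk = chunks.take()).isEmpty()) {
                            // Placeholder for the batch lexicon lookup for this chunk.
                            System.out.println("resolved chunk of " + chunk.size());
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                consumer.start();

                // Producer: the query engine can emit chunks faster than one consumer drains them.
                for (int i = 0; i < 4; i++) {
                    chunks.put(new ArrayList<>(Collections.nCopies(1000, 0L)));
                }
                chunks.put(List.of()); // empty chunk used here as a poison pill
                consumer.join();
            }
        }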

      There is some opportunity for internal parallelism on the cluster, in terms of the ClientIndexView (MAX_PARALLEL tasks run concurrently when the BatchResolveXXXTask is mapped over the shards).

      There is also some opportunity for parallelism in the BatchResolveXXXTasks themselves. They have a MAX_CHUNK property. If a received chunk size is larger than that, then they will execute multiple requests concurrently.
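
      A hedged sketch of that splitting behavior (illustrative Java, not the BatchResolveXXXTask source; maxChunk and resolveOne are placeholders): a chunk larger than the limit is cut into sub-chunks, each submitted as its own concurrent request:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Future;
        import java.util.function.Function;

        // Sketch only: split an oversized chunk and resolve the pieces concurrently.
        public class MaxChunkSketch {

            static <T> List<List<T>> split(List<T> chunk, int maxChunk) {
                List<List<T>> out = new ArrayList<>();
                for (int i = 0; i < chunk.size(); i += maxChunk) {
                    out.add(chunk.subList(i, Math.min(i + maxChunk, chunk.size())));
                }
                return out;
            }

            static <T, R> List<R> resolveConcurrently(List<T> chunk, int maxChunk,
                    ExecutorService exec, Function<List<T>, R> resolveOne) throws Exception {
                List<Future<R>> futures = new ArrayList<>();
                for (List<T> sub : split(chunk, maxChunk)) {
                    futures.add(exec.submit(() -> resolveOne.apply(sub))); // one request per sub-chunk
                }
                List<R> results = new ArrayList<>();
                for (Future<R> f : futures) {
                    results.add(f.get()); // gather the per-sub-chunk results
                }
                return results;
            }
        }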

      While using the ChunkedMaterializationOp could improve throughput for both the cluster and the standalone deployments (through increased parallelism to the disk), the default is NOT to do RDF Value materialization using that operator, even when no SLICE is present. This bug is documented at AST2BOpContext#materializeProjectionInQuery.

        Activity

        bryanthompson added a comment -

        Bug fix to the IVSolutionSetEncoder. There were several edge cases with TermId.mockIV() which were not being handled. This also identified some (as yet unfixed) problems with the pre-existing IBindingSetEncoder/Decoder implementations. Those problems could have implications for HTree hash joins with computed values, since computed values are made manifest as "mock" IVs. See [1].

        This also includes some changes designed to lift control over the chunkCapacity and chunkOfChunksCapacity for RDF Value materialization into the query plan annotations. See [2].

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/475 (Optimize serialization for query messages on cluster)
        [2] https://sourceforge.net/apps/trac/bigdata/ticket/489 (Optimize RDF Value materialization performance on cluster)

        Committed revision r6040.
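
        As a hedged illustration of the second change, per-operator annotations carrying those two knobs might look like the map below; the key names and values are hypothetical stand-ins, not the real annotation constants:

          import java.util.Map;

          // Sketch only: query plan annotations for the materialization step.
          // Both keys and values are illustrative; the real constants live on
          // the operator's Annotations interface.
          public class MaterializationAnnotationsSketch {
              public static void main(String[] args) {
                  Map<String, Object> annotations = Map.of(
                          "chunkCapacity", 100,         // target solutions per chunk (illustrative)
                          "chunkOfChunksCapacity", 10); // queued chunks awaiting the consumer (illustrative)
                  System.out.println(annotations);
              }
          }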

        bryanthompson added a comment -

        RDF Value materialization appears to be somewhat slower for U8000 on the mini cluster than for U8000 on a rackspace cluster. Of course, the machines are not really comparable, and there were nearly twice as many DS nodes (10 versus 6) on the rackspace cluster.

        However, I think that there are a few things which can be done to improve matters.

        1. The "chunk size" for the blobs and terms batch resolution tasks should be unlimited on a cluster. The chunks will be partitioned by the shard on which they need to read anyway. We do not want to break them up into smaller chunks first. That will force the RDF Value resolution step to be serialized for each such smaller chunk.

        2. The IBigdataClient.Options.CLIENT_MAX_PARALLEL_TASKS_PER_REQUEST option can limit parallelism when the NSS performs batch resolution of the sharded TermIds and BlobIVs. This was set to 10 in the miniCluster test and to 100 in the rackspace cluster test. Since the resolution step is likely to be scattered widely across the ID2TERM index shards, the limit of 10 could force serialization during RDF Value resolution (see the configuration sketch after this list).

        3. There is currently a bug in AST2BOpUtility which prevents us from using the ChunkedMaterializationOp to do materialization for queries which do not use a SLICE. This bug is forcing all solutions to be materialized on the query controller before they can be resolved to RDF Values. If this bug were resolved, the RDF Value resolution step would run as ANY and would enjoy significantly more parallelism.
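
        Regarding item 2, a hedged sketch of raising that client-side limit; the string below is a stand-in for the real IBigdataClient.Options key, whose literal property name is not reproduced here:

          import java.util.Properties;

          // Sketch only: raising the per-request task parallelism from item 2 above.
          public class ClientParallelismSketch {

              // Hypothetical stand-in for IBigdataClient.Options.CLIENT_MAX_PARALLEL_TASKS_PER_REQUEST.
              static final String CLIENT_MAX_PARALLEL_TASKS_PER_REQUEST =
                      "IBigdataClient.Options.CLIENT_MAX_PARALLEL_TASKS_PER_REQUEST";

              public static void main(String[] args) {
                  Properties properties = new Properties();
                  // 10 was the mini cluster setting; 100 was used on the rackspace cluster.
                  properties.setProperty(CLIENT_MAX_PARALLEL_TASKS_PER_REQUEST, "100");
                  System.out.println(properties);
              }
          }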

        bryanthompson added a comment -

        Another observation with respect to Q6/Q14 for the mini cluster trial described above: the heavy disk reads on the ID2TERM index tend to be non-overlapping, or only partly overlapping, on the DS nodes. This suggests that we tend to read in ID2TERM order, and that parallelizing the index scan more, and having each node that produces bindings for a variable also perform the materialization step for that variable, could significantly improve the RDF Value materialization step for these queries.

        bryanthompson added a comment -

        Closed. Not relevant to the new architecture.


          People

          • Assignee:
            bryanthompson
            Reporter:
            bryanthompson
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: