Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-1989

Tinkerpop build issue (triggered by new ChunkedMaterialization code path?)

    Details

    • Type: Bug
    • Status: Done
    • Priority: Highest
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: BLAZEGRAPH_2_2_0
    • Component/s: blueprints
    • Labels:
      None

      Description

      There is an issue with the tinkerpop build in master which appears to date back two weeks. See [1]. For some reason, the pace of builds for the tinkerpop module is quite low. So this is only the 2nd build in that period of time.

      The root cause likely goes back to changes made to improve parallelism of the materialization of RDF values out of a query. The stack trace goes through the ChunkedMaterializationOperator, which is a new code path for tinkerpop.

      Caused by: java.lang.UnsupportedOperationException
      	at java.util.AbstractCollection.add(AbstractCollection.java:262)
      	at com.bigdata.rdf.lexicon.LexiconRelation.getTerms(LexiconRelation.java:2778)
      	at com.bigdata.rdf.lexicon.LexiconRelation.getTerms(LexiconRelation.java:2411)
      	at com.bigdata.bop.rdf.join.ChunkedMaterializationOp.resolveChunk(ChunkedMaterializationOp.java:396)
      	at com.bigdata.bop.rdf.join.ChunkedMaterializationOp$ChunkTask.call(ChunkedMaterializationOp.java:246)
      	at com.bigdata.bop.rdf.join.ChunkedMaterializationOp$ChunkTask.call(ChunkedMaterializationOp.java:193)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      

      [1] https://ci.blazegraph.com/view/Tinkerpop3/job/tinkerpop3/178/console

        Issue Links

          Activity

          Hide
          bryanthompson bryanthompson added a comment -

          In commit 80cfdf05377c8ab425f3f3d13fb5e9f75fc913c4 Brad Bebee merged https://github.com/SYSTAP/bigdata/pull/310 into 2.0.1 for BLZG-1591. This change included the following edit to ChunkedMaterializationOp:

          Old:

                  // batch resolve term identifiers to terms.
                  final Map<IV<?, ?>, BigdataValue> terms = lex.getTerms(ids);
          

          New:

                  // batch resolve term identifiers to terms; as a side-effect, this sets the cache
                  // on the IVs that we pass in
                  final Map<IV<?, ?>, BigdataValue> terms = lex.getTerms(idToConstMap.keySet());
          

          The exception is being thrown out of AbstractCollection.add() on the *final Collection<IV<?, ?>> ivs* passed into LexiconRelation.getTerms(). It appears that the exception is caused by an attempt to mutation (insert) an IV into an immutable collection.

          Show
          bryanthompson bryanthompson added a comment - In commit 80cfdf05377c8ab425f3f3d13fb5e9f75fc913c4 Brad Bebee merged https://github.com/SYSTAP/bigdata/pull/310 into 2.0.1 for BLZG-1591 . This change included the following edit to ChunkedMaterializationOp: Old: // batch resolve term identifiers to terms. final Map<IV<?, ?>, BigdataValue> terms = lex.getTerms(ids); New: // batch resolve term identifiers to terms; as a side-effect, this sets the cache // on the IVs that we pass in final Map<IV<?, ?>, BigdataValue> terms = lex.getTerms(idToConstMap.keySet()); The exception is being thrown out of AbstractCollection.add() on the * final Collection<IV<?, ?>> ivs * passed into LexiconRelation.getTerms(). It appears that the exception is caused by an attempt to mutation (insert) an IV into an immutable collection.
          Hide
          bryanthompson bryanthompson added a comment -

          Adding stasmalyshev since BLZG-1591 was related to the WDS.

          Show
          bryanthompson bryanthompson added a comment - Adding stasmalyshev since BLZG-1591 was related to the WDS.
          Hide
          bryanthompson bryanthompson added a comment -

          The specific commit which introduced the change in the ChunkedMaterializationOp was afbd55a8d8ffcf4a939a069c17a66c50c007c6e9 by michaelschmidt. See https://github.com/SYSTAP/bigdata/pull/310/commits/afbd55a8d8ffcf4a939a069c17a66c50c007c6e9.

          michaelschmidt The issue appears to be that getTerms() assumes that it can add additional terms into the collection of IVs in order to perform SID resolution. Your commit passes in an immutable collection (keySet()). The tinkerpop code then relies on the SID IV materialization and hits this UnsupportedOperationException.

          Show
          bryanthompson bryanthompson added a comment - The specific commit which introduced the change in the ChunkedMaterializationOp was afbd55a8d8ffcf4a939a069c17a66c50c007c6e9 by michaelschmidt . See https://github.com/SYSTAP/bigdata/pull/310/commits/afbd55a8d8ffcf4a939a069c17a66c50c007c6e9 . michaelschmidt The issue appears to be that getTerms() assumes that it can add additional terms into the collection of IVs in order to perform SID resolution. Your commit passes in an immutable collection (keySet()). The tinkerpop code then relies on the SID IV materialization and hits this UnsupportedOperationException.
          Hide
          bryanthompson bryanthompson added a comment -

          Assigning to michaelschmidt to further analyze the problem. I suspect that we need to pass in a mutable map. We could avoid the UnsupportedOperationException by adding the caller's IVs into a mutable collection, but this would not have the (apparently expected) behavior of inserting the unrequested terms from SIDs per the code block below. It might be that the caller is not actually expecting to receive that additional information - mikepersonick can you comment on this? If so, then the temporary mutable collection would solve the problem. If we do need to get the additional information out (and not just indirectly through the full materialization of the SidIV) then my question would be why the calling code was refactored to pass in an immutable collection. Note that the ChunkedMaterializationOp is now heavily used for parallelized materialization of query solutions. Historically this was handled by a single-threaded iterator pattern.

                  /*
                   * We need to materialize terms inside of SIDs so that the SIDs
                   * can be materialized properly.
                   */
                  for (IV<?,?> iv : ivs) {
                      
                      if (iv instanceof SidIV) {
          
                      	handleSid((SidIV) iv, ivs, unrequestedSidTerms);
                      	
                      }
                      
                  }
                  
                  /*
                   * Add the SID terms to the IVs to materialize.
                   */
                  for (IV<?, ?> iv : unrequestedSidTerms) {
                  	
                  	ivs.add(iv);
                  	
                  }
          
          Show
          bryanthompson bryanthompson added a comment - Assigning to michaelschmidt to further analyze the problem. I suspect that we need to pass in a mutable map. We could avoid the UnsupportedOperationException by adding the caller's IVs into a mutable collection, but this would not have the (apparently expected) behavior of inserting the unrequested terms from SIDs per the code block below. It might be that the caller is not actually expecting to receive that additional information - mikepersonick can you comment on this? If so, then the temporary mutable collection would solve the problem. If we do need to get the additional information out (and not just indirectly through the full materialization of the SidIV) then my question would be why the calling code was refactored to pass in an immutable collection. Note that the ChunkedMaterializationOp is now heavily used for parallelized materialization of query solutions. Historically this was handled by a single-threaded iterator pattern. /* * We need to materialize terms inside of SIDs so that the SIDs * can be materialized properly. */ for (IV<?,?> iv : ivs) { if (iv instanceof SidIV) { handleSid((SidIV) iv, ivs, unrequestedSidTerms); } } /* * Add the SID terms to the IVs to materialize. */ for (IV<?, ?> iv : unrequestedSidTerms) { ivs.add(iv); }
          Hide
          bryanthompson bryanthompson added a comment -

          Brad Bebee Modified https://ci.blazegraph.com/view/Tinkerpop3/job/tinkerpop3/configure to build when master for blazegraph is built. This is why the tinkerpop3 project was not building on a timely basis. mikepersonick

          Show
          bryanthompson bryanthompson added a comment - Brad Bebee Modified https://ci.blazegraph.com/view/Tinkerpop3/job/tinkerpop3/configure to build when master for blazegraph is built. This is why the tinkerpop3 project was not building on a timely basis. mikepersonick
          Hide
          michaelschmidt michaelschmidt added a comment -

          Proposed fix in PR https://github.com/SYSTAP/bigdata/pull/443, running through CI now. Fix is to operate on modifiable copy of the map.

          Show
          michaelschmidt michaelschmidt added a comment - Proposed fix in PR https://github.com/SYSTAP/bigdata/pull/443 , running through CI now. Fix is to operate on modifiable copy of the map.
          Hide
          bryanthompson bryanthompson added a comment -
          Show
          bryanthompson bryanthompson added a comment - Resolved by https://github.com/SYSTAP/bigdata/pull/443

            People

            • Assignee:
              michaelschmidt michaelschmidt
              Reporter:
              bryanthompson bryanthompson
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: