Details

      Description

      The cluster does not correctly compute the closure of an RDF data set. Only a handful of inferences are produced and the closure completes almost immediately. The data is present in the triple store and can be queried. The rule.log shows the expected range counts, but the rules never actually visit the data.

        Activity

        bryanthompson added a comment -

        Here is a sample U8000 load/closure which illustrates the failure to compute the closure correctly (throughput is low due to heavy swapping, which is a configuration issue).

        namespace=U8000, jobName=U8000, nclients=1
        axiomCount=141
        axiomAndOntologyCount=436
        Master running : class com.bigdata.service.jini.master.FileSystemScanner{acceptCount=0,fileOrDir=/nas/data/lubm/U8000,filter=com.bigdata.rdf.load.RDFFilenameFilter@2a48f675}
        Master accepted 160006 resources for processing.
        Load: tps=44445, ntriples=1069169373, nnew=1069168937, elapsed=24055502ms
        namespace U8000
        class com.bigdata.rdf.store.ScaleOutTripleStore
        indexManager  com.bigdata.service.jini.JiniFederation
        statementCount 1069169373
        termCount 263308594
        uriCount 173979231
        literalCount  89329363
        bnodeCount N/A
        
        Computing closure: now=Mon Oct 03 21:38:39 EDT 2011
        closure: ClosureStats{mutationCount=22, elapsed=10327ms, rate=2}
        Closure: tps=2, ntriples=1069169395, nnew=22, elapsed=10354ms
        namespace U8000
        class com.bigdata.rdf.store.ScaleOutTripleStore
        indexManager  com.bigdata.service.jini.JiniFederation
        statementCount 1069169395
        termCount 263308594
        uriCount 173979231
        literalCount  89329363
        bnodeCount N/A
        
        Net: tps=44396, ntriples=1069169395, nnew=1069168959, elapsed=24082129ms
        commit point: 1317692331610
        Done: Mon Oct 03 21:38:51 EDT 2011
        Shutting down: Mon Oct 03 21:38:51 EDT 2011
        
        bryanthompson added a comment -

        This problem appears to go back to an audit of the closeable iterator patterns. The ThickAsynchronousIterator class has a private transient 'open' field which is initialized to true. However, such field initializers only run when an object is constructed, not when it is deserialized, so 'open' is false after a ThickAsynchronousIterator has been deserialized. This bug has been around for a very long time; it only became a problem when hasNext() was fixed to test whether or not the iterator was open. There was a unit test for this, but it was never integrated into TestAll for that package. The test fails. Now all I have to do is fix it.

        Committed revision r5786.
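
        The same pitfall applies to any Serializable class that relies on a field initializer for transient state. Below is a minimal, self-contained sketch (a hypothetical stand-in, not the actual ThickAsynchronousIterator source) showing the failure and the usual readObject() fix; re-initializing transient state in readObject() is the standard remedy for this class of bug.

        import java.io.*;

        // Hypothetical stand-in for an iterator with a transient 'open' flag.
        // The initializer 'open = true' runs only on the construction path;
        // Java deserialization bypasses it, leaving 'open' at the default
        // value (false) unless readObject() restores it.
        public class TransientFlagDemo {

            static class NaiveIterator implements Serializable {
                private static final long serialVersionUID = 1L;
                private transient boolean open = true; // not restored on deserialization

                boolean hasNext() { return open; /* && more elements remain ... */ }
            }

            static class FixedIterator implements Serializable {
                private static final long serialVersionUID = 1L;
                private transient boolean open = true;

                boolean hasNext() { return open; }

                // Re-establish transient state after deserialization.
                private void readObject(ObjectInputStream in)
                        throws IOException, ClassNotFoundException {
                    in.defaultReadObject();
                    open = true;
                }
            }

            // Serialize and deserialize an object via an in-memory buffer.
            @SuppressWarnings("unchecked")
            static <T extends Serializable> T roundTrip(T obj) throws Exception {
                final ByteArrayOutputStream bos = new ByteArrayOutputStream();
                try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                    oos.writeObject(obj);
                }
                try (ObjectInputStream ois = new ObjectInputStream(
                        new ByteArrayInputStream(bos.toByteArray()))) {
                    return (T) ois.readObject();
                }
            }

            public static void main(String[] args) throws Exception {
                System.out.println(roundTrip(new NaiveIterator()).hasNext()); // false - appears exhausted
                System.out.println(roundTrip(new FixedIterator()).hasNext()); // true
            }
        }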

        bryanthompson added a comment -

        That change now reveals a conflict between remote APs (access paths), which had been set as the default due to a deadlock observed with sharded joins, and binding the partition identifier on the predicate. I am going to re-enable local APs as the default for scale-out, then debug the sharded AP deadlock and verify that the closure is computed correctly once sharded APs are enabled.

        There is a CI hang related to enabling sharded APs that I need to resolve. I will do that under [1]. However, the fix for computing closure is in the AccessPath constructor (AccessPath<init>):

                final int partitionId = predicate.getPartitionId();
        
                /*
                 * If the predicate is addressing a specific shard, then the default is
                 * to assume that it will not be using a remote access path. However, if
                 * a remote access path was explicitly requested and the partitionId was
                 * specified, then it will be an error (which is trapped below).
                 */
                final boolean remoteAccessPath = predicate
                        .getProperty(
                                IPredicate.Annotations.REMOTE_ACCESS_PATH,
                                partitionId == -1 ? IPredicate.Annotations.DEFAULT_REMOTE_ACCESS_PATH
                                        : false);
        

        That fix is available in Committed Revision r5790.
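
        For reference, the error trap mentioned in the code comment above ("trapped below") amounts to rejecting the inconsistent combination. A sketch of that check follows; the exact exception type and message are assumptions, not the actual AccessPath source.

                // Sketch only: a remote access path cannot be combined with a
                // predicate that is already bound to a specific index partition
                // (shard). Exception type and message are illustrative.
                if (remoteAccessPath && partitionId != -1) {
                    throw new IllegalArgumentException(
                            "Remote access path not allowed with bound partitionId="
                                    + partitionId);
                }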

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/380 (Native SPARQL evaluation on cluster)

        bryanthompson added a comment -

        It turns out that remote APs := true also produces the correct closure as long as the fix described above is applied. I am going to re-enable remote APs for CI while I debug sharded joins under [1].

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/380 (Native SPARQL evaluation on cluster)


          People

          • Assignee:
            bryanthompson
          • Reporter:
            bryanthompson