Details

      Description

      bigdata-1.1.0.jar contains some classes in the org.openrdf namespace (query, rio). If openrdf-sesame-2.5.0-onejar.jar is loaded by the classloader before the bigdata jar, then searches either generate class cast errors:

      java.util.concurrent.ExecutionException: java.lang.ClassCastException:
      org.openrdf.query.parser.sparql.ast.ASTIRI cannot be cast
      to org.openrdf.query.parser.sparql.ast.ASTRDFValue
      

      or just return no results.

        Activity

        bryanthompson bryanthompson added a comment -

        The rio classes have been there for a very long time. They are the RDF/XML parser with the extension for statement type (explicit, inferred, or axiom) and the extension for SIDs mode interchange. Perhaps that does not matter as much since they can only impact people in SIDs mode or who are relying on access to the StatementEnum metadata.

        The SPARQL parser classes obviously can impact everyone.

        Sesame makes extensive use of a registry/factory pattern for RDF and SPARQL parsers. We have never fully implemented this pattern, which requires appropriate files in the META-INF directory of the generated JAR. Thus, if we simply rename the RDF/XML parser to another package, it will no longer be discovered unless we also add the appropriate META-INF files. However, the META-INF files must also be discovered when we are running against the compiled, but not JAR'd, class files, e.g., under Eclipse.

        bryanthompson bryanthompson added a comment -

        We have come to a clear understanding of the service registry pattern based on inspection of the Sesame code base and the java.util.ServiceLoader class. This pattern is sufficient for declaring new service providers, such as the NQuadsParser, but does not provide any mechanism for replacing existing service providers for the same "key". In particular, it appears likely that the service providers are registered in something like their classpath ordering. This means that we cannot rely on this META-INF/services pattern to replace the Sesame RDF/XML parser with our own. This implies that we need to intervene in one or more locations and ensure that the Sesame RDF/XML parser is replaced by our RDF/XML parser (which supports SIDs and statement enum interchange). (The alternative is to register our parser under a new MIME type, which would be a different key. However, that leads to problems where our version of the parser is not used because the RDF/XML is not appropriately marked with the extension MIME type.)
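The ordering problem described above can be illustrated with a toy registry. This is only a sketch: `Factory`, `Registry`, and the "first provider discovered wins" rule are assumptions made for illustration, not the Sesame API or its confirmed behaviour. It shows why a same-key provider cannot be relied on to win through discovery alone, and how an explicit override step settles it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegistryOrderSketch {

    /** Hypothetical stand-in for a parser factory, keyed by MIME type. */
    public static final class Factory {
        final String mimeType;
        final String owner;

        public Factory(String mimeType, String owner) {
            this.mimeType = mimeType;
            this.owner = owner;
        }
    }

    /** Toy registry. ASSUMPTION: the first provider discovered for a key wins. */
    public static final class Registry {
        private final Map<String, Factory> byKey = new LinkedHashMap<>();

        /** Registers providers in the order given (standing in for classpath order). */
        public void discover(List<Factory> inClasspathOrder) {
            for (Factory f : inClasspathOrder) {
                byKey.putIfAbsent(f.mimeType, f);
            }
        }

        /** Explicit replacement, independent of discovery order. */
        public void override(Factory f) {
            byKey.put(f.mimeType, f);
        }

        public String ownerOf(String mimeType) {
            return byKey.get(mimeType).owner;
        }
    }

    public static void main(String[] args) {
        Factory sesame = new Factory("application/rdf+xml", "sesame");
        Factory bigdata = new Factory("application/rdf+xml", "bigdata");

        Registry registry = new Registry();
        registry.discover(Arrays.asList(sesame, bigdata));
        System.out.println(registry.ownerOf("application/rdf+xml")); // depends on order

        registry.override(bigdata);
        System.out.println(registry.ownerOf("application/rdf+xml")); // forced replacement
    }
}
```

Whichever tie-breaking rule a real registry uses, the outcome of discovery depends on ordering, which is why an explicit override is needed.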

        The SPARQL parser does not have these problems because we explicitly create an instance without use of the service registry pattern. Also, the Bigdata2ASTSPARQLParser class does not have a zero argument constructor (it requires a reference to the database) and is therefore not compatible with the service registry pattern.
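The zero-argument constructor requirement can be checked reflectively. The sketch below uses hypothetical classes (`NeedsDatabase` stands in for a parser that requires a database reference; it is not the real Bigdata2ASTSPARQLParser) to show why such a class cannot be instantiated by a loader that only calls a no-argument constructor.

```java
public class ZeroArgCheck {

    /** Hypothetical class that, like the parser described above, needs a database reference. */
    public static class NeedsDatabase {
        final Object database;

        public NeedsDatabase(Object database) {
            this.database = database;
        }
    }

    /** Hypothetical class that a service loader could instantiate. */
    public static class Standalone {
        public Standalone() {
        }
    }

    /** Returns true if the class declares a constructor taking no arguments. */
    public static boolean hasZeroArgConstructor(Class<?> c) {
        try {
            c.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasZeroArgConstructor(NeedsDatabase.class));
        System.out.println(hasZeroArgConstructor(Standalone.class));
    }
}
```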

        I have added a META-INF/services declaration for the NQuadsParser, an NQuadsParserFactory, and a test suite to verify the ability to resolve an NQuadsParserFactory and NQuadsParser using the service provider pattern. I commented out the code in the static initialization of NQuadsParser which was responsible for registering the NQuadsParser's inner parser factory class. I verified that resolution now takes place through the service provider pattern in the IDE when the META-INF directory structure is part of the classpath. The nquads support has been moved into the com.bigdata.rdf.rio.nquads package to follow the general package naming pattern for rio/openrdf.
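As a rough illustration of resolving a provider through a META-INF/services entry that lives in a plain directory rather than a JAR (the IDE case mentioned above), the following sketch writes such an entry into a temporary directory and resolves it with java.util.ServiceLoader. The `ParserFactory` interface and `NQuadsLikeFactory` class are hypothetical stand-ins, not the bigdata or Sesame types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceLoaderSketch {

    /** Hypothetical service interface (stands in for a parser factory type). */
    public interface ParserFactory {
        String formatName();
    }

    /** Hypothetical provider; ServiceLoader needs a public zero-argument constructor. */
    public static class NQuadsLikeFactory implements ParserFactory {
        public NQuadsLikeFactory() {
        }

        @Override
        public String formatName() {
            return "N-Quads";
        }
    }

    /** Writes a META-INF/services entry into a directory and resolves providers through it. */
    public static List<String> discover() {
        try {
            Path root = Files.createTempDirectory("services-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // The file is named after the service type; each line names one provider class.
            Files.write(services.resolve(ParserFactory.class.getName()),
                    (NQuadsLikeFactory.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));

            List<String> found = new ArrayList<>();
            // The directory is added to a class loader, much like a compiled-classes folder.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ServiceLoaderSketch.class.getClassLoader())) {
                for (ParserFactory f : ServiceLoader.load(ParserFactory.class, loader)) {
                    found.add(f.formatName());
                }
            }
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```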

        The bigdata RDF/XML parser has been moved to the com.bigdata.rdf.rio.rdfxml package and various package private classes have been imported to support the parser since they are no longer visible now that the package namespace has changed. The RDF/XML parser/writer classes have also been renamed to begin with "Bigdata" so there can no longer be any confusion about which parser/writer is being used. A Bigdata RDF/XML writer factory class has been added. META-INF/services entries have been added for the RDF/XML parser/writer, but as noted above, those entries are NOT sufficient to guarantee resolution of the correct service provider.

        com.bigdata.rdf.ServiceProviderHook has been added. The NQuadsParser#forceLoad() method has been refactored into ServiceProviderHook which is used to enforce certain overrides (replacements) of the default services registered by Sesame. This also ensures that RDFFormats declared by bigdata are registered with openrdf (notably the NQUADS format).
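One common way to build a hook of this kind is a class whose static initializer performs the registrations and whose forceLoad() method exists only to trigger class initialization. The sketch below shows that idiom; it is an assumption about the general shape, not the actual ServiceProviderHook source, and the counter stands in for the real override and RDFFormat registration work.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HookSketch {

    /** Counts how many times the (hypothetical) registration work has run. */
    public static final AtomicInteger registrations = new AtomicInteger();

    /** Hypothetical hook: registrations happen once, when the class is initialized. */
    public static final class Hook {
        static {
            // Placeholder for overriding default providers and registering formats.
            registrations.incrementAndGet();
        }

        /** Intentionally empty; calling it forces class initialization. */
        public static void forceLoad() {
        }

        private Hook() {
        }
    }

    public static void main(String[] args) {
        Hook.forceLoad();
        Hook.forceLoad();
        // The static initializer runs only once, however often forceLoad() is called.
        System.out.println(registrations.get());
    }
}
```

Callers that need the overrides in place can invoke forceLoad() from their own static initializers without worrying about repeated registration.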

        I modified several places in the bigdata code base where we were directly using the bigdata specific RDFXMLWriter or RDFXMLParser to use the service registry mechanism instead (since the class names were changed, the hard coded references were picking up the wrong class). When those places were in the test suite, I added asserts to verify that the bigdata extension classes were being used.

        The "compile" target in build.xml has been modified to also copy the META-INF/services directory for the bigdata-rdf package. I have verified the "compile" target causes the output META-INF directory to contain all of the desired META-INF/services files.

        The TestLocalTripleStore and TestBigdataSailWithQuads test suites are green.

        The next step is to refactor the SPARQL parser classes in order to avoid similar class path problems arising from using the same package name as openrdf. After that I will open up the SPARQL grammar for changes to support property paths and UPDATE (under different tickets).

        Committed Revision r6045.

        bryanthompson bryanthompson added a comment -

        CI was clean with the change set above.

        bryanthompson bryanthompson added a comment -

        Work on the class loader problem (this ticket) and on changing over to Sesame 2.6.3.

        https://sourceforge.net/apps/trac/bigdata/ticket/496 (Sesame 2.6.3)

        Downloaded the onejar from sourceforge and generated new JARs from an SVN checkout using:

        mvn -Dmaven.test.skip=true clean install
        

        to generate:

        ./testsuites/sparql/target/sesame-sparql-testsuite-2.5.0.jar
        ./testsuites/store/target/sesame-store-testsuite-2.5.0.jar
        

        Updated the classpath for Eclipse and updated the sesame version in build.properties.

        Removed the overridden version of QueryResultUtil from our code base since http://www.openrdf.org/issues/browse/SES-853 (QueryResultUtil fails when solutions have too many bindings.) was fixed in openrdf 2.6.0.

        http://sourceforge.net/apps/trac/bigdata/ticket/439 (Class loader problem)

        Moved all SPARQL grammar related files from org.openrdf.query.parser.sparql into com.bigdata.rdf.sail.sparql and imported those few files which were not already present (NegatedPropertySet, PropertySet, and UpdateExprBuilder) into the com.bigdata.rdf.sail.sparql package. Those files all represent features which we do not yet support (property paths and SPARQL 1.1 UPDATE). I will review their contents in depth when I work on those issues.

        Renamed the org.openrdf.query.parser.sparql.ast package to com.bigdata.rdf.sail.sparql.ast. This ensures that class path problems will not arise. Of course, it makes it more difficult to keep our version synchronized with the openrdf changes....

        Recompiled the SPARQL grammar to fix errors created by the package name change. I needed to change the package name in many of the AST files and also in the sparql.jjt grammar. Quite a PITA.

        TCK: 3 errors / 4 failures (6/8 if we run it as
        TestBigdataSailWithQuads which runs the TCK twice).

        Still broken:

        sparql11-sum-02 : This is failing because openrdf still expects an
        empty solution set rather than a single solution with a ZERO for the
        sum.  This is http://www.openrdf.org/issues/browse/SES-884
        (Aggregation with an solution set as input should produce an empty
        solution as output), which is clearly not fixed yet.  Just checked
        jira. They do not have a fix version for this yet.
        

        Newly broken: None. All pre-existing tests which were passing are still passing.

        Newly fixed:

        sparql11-substr-01
        sparql11-substr-02
        sparql11-substr-03
        sparql11-minus-05
        sparql11-minus-06
        sparql11-minus-07
        sparql11-sum-03
        sparql11-sum-04
        

        The substr tests were known to be broken in openrdf 2.5.0.

        The minus tests also reflected a problem with openrdf 2.5.0. I need to go back and look at them again to recall precisely what was failing.

        There were at least two known problems with aggregation in 2.5.0 (see the links below), only one of which has been fixed. The issue which was fixed has to do with handling of type errors during aggregation and was demonstrated by sparql11-sum-03. See http://www.openrdf.org/issues/browse/SES-862 (Incorrect error handling for SPARQL aggregation; fix in 2.6.1).

        New tests which are broken:

        SPARQL 1.1 BINDINGS 01   (Failing on the BINDINGS clause.)
        SPARQL 1.1 BINDINGS 02   (ditto)
        sparql11-subquery-04     (Port into the AST TestTCK suite and debug there)
        BSBM BI use case query 5 (This appears to be part of the Sesame TCK; fails with named solution set not found error)
        

        Committed revision r6046.


          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0
            Watchers:
            2
