Blazegraph (by SYSTAP)
BLZG-303

JDK collator results in decode errors for the SparseRowStore

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: TERMS_REFACTOR_BRANCH
    • Fix Version/s: None
    • Component/s: Other

      Description

      Fred:
      There seems to be a bug in that KeyDecoder objects cannot decode BTree keys built with the JDK collator: the JDK CollationKey contains zero bytes, which the KeyDecoder wrongly interprets as the separators between sections of the BTree key. What is the best way to fix this?
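The zero-byte behavior described above is easy to observe directly with the JDK collator. The following sketch (an illustration, not Blazegraph code; the class and method names are made up for the example) checks whether the byte form of a JDK CollationKey contains embedded zeros:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class JdkCollatorZeroBytes {

    // Returns true if the JDK collation key for the given text
    // contains at least one zero byte.
    static boolean hasZeroByte(final String text) {
        final Collator collator = Collator.getInstance(Locale.US);
        final CollationKey key = collator.getCollationKey(text);
        for (final byte b : key.toByteArray()) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        // The JDK key bytes include 0x00 separators between the
        // primary/secondary/tertiary strength levels, which is
        // exactly what confuses the KeyDecoder.
        System.out.println(hasZeroByte("bryan"));
    }
}
```

Any key with such embedded zeros will collide with a decoder that treats 0x00 as a field separator, which matches the ArrayIndexOutOfBoundsException seen below.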

      Bryan:
      > Can you add a unit test which demonstrates this failure? This can go in com.bigdata.btree.keys.TestKeyBuilder. I can then take a look at what is going on with the JDK collation and see if there is something to be done.

      Fred:
      >> Done. Svn #3202. I assume that failures will start showing up in the CI tests.

        Activity

        bryanthompson added a comment -

        Fred provided a stack trace, which I have excerpted below. This is during the triple store create, so it should be easy enough to test out. I'll start as you did by throwing an exception out of ICUSortKeyGenerator and making sure that it is not being used. I can then do a triple store create when the triple store is configured for the JDK collator option and see what I find.

        Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
        at com.bigdata.sparse.KeyDecoder.<init>(KeyDecoder.java:283)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:260)
        at com.bigdata.sparse.AbstractAtomicRowReadOrWrite.atomicRead(AbstractAtomicRowReadOrWrite.java:158)
        at com.bigdata.sparse.AtomicRowWriteRead.apply(AtomicRowWriteRead.java:161)
        at com.bigdata.sparse.AtomicRowWriteRead.apply(AtomicRowWriteRead.java:23)
        at com.bigdata.journal.IndexProcedureTask.doTask(IndexProcedureTask.java:56)
        at com.bigdata.journal.AbstractTask$InnerWriteServiceCallable.call(AbstractTask.java:2038)
        at com.bigdata.journal.AbstractTask.doUnisolatedReadWriteTask(AbstractTask.java:1799)
        at com.bigdata.journal.AbstractTask.call2(AbstractTask.java:1726)
        at com.bigdata.journal.AbstractTask.call(AbstractTask.java:1592)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at com.bigdata.concurrent.NonBlockingLockManagerWithNewDesign$LockFutureTask.run(NonBlockingLockManagerWithNewDesign.java:1970)
        ... 3 more
        Shutting down: Fri Jul 30 16:16:04 EDT 2010
        BUILD SUCCESSFUL (total time: 11 seconds)

        bryanthompson added a comment -

        Bug fix to DefaultKeyBuilderFactory, which was passing the default into properties.getProperty(key,def) and therefore never consulting System.getProperty(key,def). As a result, this code path always chose the ICU collator, even when the collator had been explicitly set as a JVM property.
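The property-lookup bug above can be sketched in miniature. This is not the actual DefaultKeyBuilderFactory code; the class and method names are invented for the illustration, but the shadowing mechanism is the one described: once a default is passed to Properties.getProperty(key,def), the fallback to System.getProperty never fires.

```java
import java.util.Properties;

public class PropertyDefaultShadowing {

    static final String KEY = "com.bigdata.btree.keys.KeyBuilder.collator";

    // Buggy lookup: Properties consumes the default, so the JVM system
    // property is never consulted when the key is absent from props.
    static String buggyLookup(final Properties props, final String def) {
        return props.getProperty(KEY, def);
    }

    // Fixed lookup: fall back to the JVM system properties before
    // applying the default.
    static String fixedLookup(final Properties props, final String def) {
        String val = props.getProperty(KEY);
        if (val == null) {
            val = System.getProperty(KEY, def);
        }
        return val;
    }

    public static void main(final String[] args) {
        System.setProperty(KEY, "JDK"); // explicit JVM property
        final Properties props = new Properties(); // collator not set here
        System.out.println(buggyLookup(props, "ICU")); // ICU - system property ignored
        System.out.println(fixedLookup(props, "ICU")); // JDK
    }
}
```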

        Added final in a bunch of places to ICUSortKeyGenerator.

        Removed logic in the GlobalRowStoreHelper which was forcing the ASCII collator when the JDK collator had been requested due to a historical problem with the JDK collator and the global row store.

        In order to use the JDK collator you must now turn on unicode clean support in the SparseRowStore class. Since this breaks binary compatibility (for both the JDK and ICU collator options), I plan to do this as part of our next release.

        Improved the inline comments in KeyDecoder regarding the ArrayIndexOutOfBoundsException.

        To verify that this fixes the JDK collator issue: (1) modify the ICUSortKeyGenerator to always throw an exception out of its constructor; (2) specify "-Dcom.bigdata.btree.keys.KeyBuilder.collator=JDK" on the command line to select the JDK collator; (3) specify "-Dcom.bigdata.sparse.Schema.schemaName.unicodeClean=true" on the command line; and (4) run any unit test which creates a triple store but does not explicitly specify the CollatorEnum option.
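Steps (2)-(4) amount to JVM flags plus a test run. A hypothetical invocation might look like the following; the -D flags are taken verbatim from the steps above, while the classpath and the JUnit 3 test-runner class are assumptions for illustration:

```shell
# Flags from steps (2) and (3); runner and classpath are illustrative only.
java -cp bigdata.jar:junit.jar \
  -Dcom.bigdata.btree.keys.KeyBuilder.collator=JDK \
  -Dcom.bigdata.sparse.Schema.schemaName.unicodeClean=true \
  junit.textui.TestRunner com.bigdata.btree.keys.TestKeyBuilder
```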

        bryanthompson added a comment -

        Before closing out this issue, extend the IndexMetadata for the SparseRowStore instances to specify whether they are unicode clean or not. Existing store instances should read and then update their SparseRowStore IndexMetadata objects to make these configuration decisions restart safe.

        See http://sourceforge.net/apps/trac/bigdata/ticket/171

        bryanthompson added a comment -

        This issue slipped the 1.0 release. I've moved it to the TERMS_REFACTOR_RELEASE.

        bryanthompson added a comment -

        Since we need to issue a dot release for [1], I am going to fold this issue into that dot release as well. The change to Unicode clean encoding of the schema name is now present in both the maintenance branch for 1.0.0 and the development branch.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/349 (TermIdEncoder imposes limit of 2B RDF Values on Journal)


          People

          • Assignee: bryanthompson
          • Reporter: bryanthompson
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
              Resolved: