Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-314

TermIdEncoder needs to use more bits now that ITermIdCodes is gone

    Details

      Description

      Now that term types are no longer being encoded into the long term identifiers, we have 2 more bits available for term identifiers.

      See deprecated ITermIdCodes and still in use TermIdEncoder.

        Issue Links

          Activity

          Hide
          bryanthompson bryanthompson added a comment -

          On review, it appears that the TermIdEncoder was stealing the unused sign bits from the partition identifier and the partition local counter. Unless/until bigdata is made 32-bits clean for those values there is no additional opportunity to be had by modifying the TermIdEncoder to reclaim the two lower bits.

          In order to make the partitionId and local counter 32-bits clean, we need to look at the following classes:


          - BTree.PartitionedCounter


          - LocalPartitionMetadata
          - PartitionLocator

          The BTree.PartitionedCounter would have to be 32-bits clean for the lower 32 bits, but it currently wraps at Integer.MAX_VALUE.

          The LocalParitionMetadata and PartitionLocator classes both assume that the sign bit is clear in their serialization logic.

          Show
          bryanthompson bryanthompson added a comment - On review, it appears that the TermIdEncoder was stealing the unused sign bits from the partition identifier and the partition local counter. Unless/until bigdata is made 32-bits clean for those values there is no additional opportunity to be had by modifying the TermIdEncoder to reclaim the two lower bits. In order to make the partitionId and local counter 32-bits clean, we need to look at the following classes: - BTree.PartitionedCounter - LocalPartitionMetadata - PartitionLocator The BTree.PartitionedCounter would have to be 32-bits clean for the lower 32 bits, but it currently wraps at Integer.MAX_VALUE. The LocalParitionMetadata and PartitionLocator classes both assume that the sign bit is clear in their serialization logic.
          Hide
          bryanthompson bryanthompson added a comment -

          In addition to the changes identified above, the sourcePartitionId field uses MINUS ONE (-1) as a special value. However, that field is deprecated and could be removed from the code base.

          Show
          bryanthompson bryanthompson added a comment - In addition to the changes identified above, the sourcePartitionId field uses MINUS ONE (-1) as a special value. However, that field is deprecated and could be removed from the code base.
          Hide
          bryanthompson bryanthompson added a comment -

          LocalPartitionMetadata's history field should be removed at the same time. I found it nearly impossible to read. There are much saner ways to track what is going on in the federation. An analysis of the {@link Event} log is much more useful. If nothing else, you could examine the index partition in the metadata index by scanning the commit points and reading its state in each commit and reporting all state changes.

          Show
          bryanthompson bryanthompson added a comment - LocalPartitionMetadata's history field should be removed at the same time. I found it nearly impossible to read. There are much saner ways to track what is going on in the federation. An analysis of the {@link Event} log is much more useful. If nothing else, you could examine the index partition in the metadata index by scanning the commit points and reading its state in each commit and reporting all state changes.
          Hide
          bryanthompson bryanthompson added a comment -

          Replying to [thompsonbry|comment:4]:
          > In addition to the changes identified above, the sourcePartitionId field uses MINUS ONE (-1) as a special value. However, that field is deprecated and could be removed from the code base.

          sourcePartitionId was used by the original MOVE implementation. It is no longer in use and can simply be removed.

          Show
          bryanthompson bryanthompson added a comment - Replying to [thompsonbry|comment:4] : > In addition to the changes identified above, the sourcePartitionId field uses MINUS ONE (-1) as a special value. However, that field is deprecated and could be removed from the code base. sourcePartitionId was used by the original MOVE implementation. It is no longer in use and can simply be removed.
          Hide
          bryanthompson bryanthompson added a comment -

          I have removed the history field and modified the serialization of the partition locator metadata to be 32-bits clean for the partition identifier. I have not yet removed the sourcePartitionId field nor have I made the partitionId and local counters 32-bits clean or restored the two sign bits to the 64-bit term identifier.

          Show
          bryanthompson bryanthompson added a comment - I have removed the history field and modified the serialization of the partition locator metadata to be 32-bits clean for the partition identifier. I have not yet removed the sourcePartitionId field nor have I made the partitionId and local counters 32-bits clean or restored the two sign bits to the 64-bit term identifier.
          Hide
          bryanthompson bryanthompson added a comment -

          This issue should be closed out when the TERMS_REFACTOR_BRANCH is complete as the TERM2ID index and the ID2TERM index will both go away, as will the issue documented here.

          See https://sourceforge.net/apps/trac/bigdata/ticket/109

          Show
          bryanthompson bryanthompson added a comment - This issue should be closed out when the TERMS_REFACTOR_BRANCH is complete as the TERM2ID index and the ID2TERM index will both go away, as will the issue documented here. See https://sourceforge.net/apps/trac/bigdata/ticket/109
          Hide
          bryanthompson bryanthompson added a comment -

          Per above, this issue will be resolved in a different manner.

          Show
          bryanthompson bryanthompson added a comment - Per above, this issue will be resolved in a different manner.
          Hide
          bryanthompson bryanthompson added a comment -

          This issue is being reopened since the TERM2ID and ID2TERM indices are being reintroduced in the TERMS_REFACTOR_BRANCH while the TERMS index is being restricted to use with larger RDF Values.

          See https://sourceforge.net/apps/trac/bigdata/ticket/342 (TERMS index
          causes too much IO scatter).

          Show
          bryanthompson bryanthompson added a comment - This issue is being reopened since the TERM2ID and ID2TERM indices are being reintroduced in the TERMS_REFACTOR_BRANCH while the TERMS index is being restricted to use with larger RDF Values. See https://sourceforge.net/apps/trac/bigdata/ticket/342 (TERMS index causes too much IO scatter).
          Hide
          bryanthompson bryanthompson added a comment -

          The issue is resolved in both the 1.0.0 maintenance branch and in the current development branch. The changes include:

             - done. Modify TermIdEncoder to use all bits for pid and ctr,
               rejecting only 0L, which is still reserved for null.  Modify the
               unit tests to verify that TermIdEncoder is now 32-bit clean for
               both the pid and ctr values and test encode/decode against random
               non-zero 64bit integers.
          
               Files changed: TermIdEncoder, TestTermIdEncoder
          
             - done. Modify BTree.PartitionedCounter to (a) permit the partition
               local counter to wrap around, but not to come all the way back to
               zero again; and (b) to use a 32-bit clean method to form the
               int64 counter from the partitionId and the local counter value.
          
             - done. Modify BTree.Counter to allow the counter to wrap through
               the negative long integer values until it would reach ZERO (0L)
               again and write unit tests for same.
          
               Files changed: BTree, TestIndexCounter, 
          
             - done. Review the data structures which persist the partition
               identifier to see if they are using packed (long) integers for
               the pid.  That will limit us to Integer.MAX_VALUE shards, which
               is not all that bad. [it looks like things are 32-bit clean for
               pid.]
          

          Committed revision r4863.

          Show
          bryanthompson bryanthompson added a comment - The issue is resolved in both the 1.0.0 maintenance branch and in the current development branch. The changes include: - done. Modify TermIdEncoder to use all bits for pid and ctr, rejecting only 0L, which is still reserved for null. Modify the unit tests to verify that TermIdEncoder is now 32-bit clean for both the pid and ctr values and test encode/decode against random non-zero 64bit integers. Files changed: TermIdEncoder, TestTermIdEncoder - done. Modify BTree.PartitionedCounter to (a) permit the partition local counter to wrap around, but not to come all the way back to zero again; and (b) to use a 32-bit clean method to form the int64 counter from the partitionId and the local counter value. - done. Modify BTree.Counter to allow the counter to wrap through the negative long integer values until it would reach ZERO (0L) again and write unit tests for same. Files changed: BTree, TestIndexCounter, - done. Review the data structures which persist the partition identifier to see if they are using packed (long) integers for the pid. That will limit us to Integer.MAX_VALUE shards, which is not all that bad. [it looks like things are 32-bit clean for pid.] Committed revision r4863.

            People

            • Assignee:
              bryanthompson bryanthompson
              Reporter:
              mikepersonick mikepersonick
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: