  1. Blazegraph (by SYSTAP)
  2. BLZG-338

Adler32 checksums require multiple copies between the C and Java heaps

    Details

    • Type: Bug
    • Status: Open
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Other
    • Labels:
      None

      Description

      Both HA and distributed query evaluation rely on Adler32 checksums. These checksums are used to verify correct transfer of data over the wire and on read back from disk.

      The Java Adler32 class is a thin JNI wrapper for the zlib library, which is written in C. However, the Adler32 API accepts Java byte[] objects. These byte[]s are then copied (or pinned) for access from the C heap where the running checksum is maintained.

      Both HA and distributed query evaluation make extensive use of Java "direct" ByteBuffer objects. For a direct ByteBuffer, the data are stored on the C heap, while the Java object provides a thin JNI wrapper. This approach is central to the efficiency of Java NIO operations, including direct (zero copy) transfer between channels which is critical for high throughput services.

      Since we are computing the Adler32 checksum of a direct ByteBuffer, the data are actually being transferred in chunks from the C heap into a Java heap byte[]. From Java, the data are then transferred back to the C heap through the Adler32 API. Finally, the checksum is computed by the zlib code.

      This triple copy is a ridiculous waste of resources. A JNI method should be defined to compute the Adler32 checksum directly using the existing zlib code on the data already on the C heap.
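
      The copy sequence described above can be sketched roughly as follows. This is an illustration only, not the actual Blazegraph code: the class name ChunkedAdler32, the checksum() helper and the 4096 byte chunk size are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

/** Hypothetical helper showing the chunked copy pattern; not Blazegraph code. */
final class ChunkedAdler32 {

    /** Size of the temporary heap array; 4096 is an arbitrary illustrative choice. */
    private static final int CHUNK_SIZE = 4096;

    /**
     * Computes the Adler32 checksum of the remaining bytes of the buffer
     * without changing the caller's position or limit.
     */
    static long checksum(final ByteBuffer buf) {
        final Adler32 adler = new Adler32();
        // duplicate() shares the content but has its own position and limit.
        final ByteBuffer view = buf.duplicate();
        final byte[] chunk = new byte[CHUNK_SIZE];
        while (view.hasRemaining()) {
            final int n = Math.min(view.remaining(), chunk.length);
            // Transfer from the buffer's storage into a Java heap byte[].
            view.get(chunk, 0, n);
            // Hand the heap byte[] to the Adler32 API, which updates the running checksum.
            adler.update(chunk, 0, n);
        }
        return adler.getValue();
    }
}
```

      Calling ChunkedAdler32.checksum(buf) on a direct buffer goes through the intermediate byte[] for every chunk, which is the overhead this ticket is about.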

      The BytesUtil.java class already contains JNI methods for the rare cases where JNI evaluation of a method can offer substantial benefit. Those methods are implemented in BytesUtil.c. The JNI methods are used if and only if the corresponding shared library is discovered when the JVM loads the BytesUtil class. If the shared library is not found, then a 100% Java version of the method is executed instead. This provides a low cost way to leverage JNI code without sacrificing portability.
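
      A minimal sketch of that load-or-fall-back pattern, applied to the checksum, might look like the following. Everything named here is hypothetical: the class DirectAdler32, the library name "directadler32" and the native method adler32Native() do not exist in BytesUtil today. The C side is assumed to obtain the buffer address with the JNI function GetDirectBufferAddress and pass it to the zlib adler32() function.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a JNI checksum with a pure Java fallback. */
final class DirectAdler32 {

    /** Largest prime below 2^16; the modulus used by the Adler-32 algorithm. */
    private static final int MOD_ADLER = 65521;

    /** True iff the (hypothetical) shared library was found at class load time. */
    private static final boolean NATIVE_LOADED;

    static {
        boolean loaded = false;
        try {
            System.loadLibrary("directadler32"); // hypothetical library name
            loaded = true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library not available: the pure Java version is used instead.
        }
        NATIVE_LOADED = loaded;
    }

    /** Hypothetical native entry point; only called when the library was loaded. */
    private static native int adler32Native(ByteBuffer buf, int off, int len);

    /** Checksum of the remaining bytes; the buffer's position and limit are unchanged. */
    static long checksum(final ByteBuffer buf) {
        if (NATIVE_LOADED && buf.isDirect()) {
            // Mask to an unsigned 32-bit value, matching Adler32.getValue().
            return adler32Native(buf, buf.position(), buf.remaining()) & 0xFFFFFFFFL;
        }
        return checksumJava(buf);
    }

    /** 100% Java version: reads bytes with absolute get(), no intermediate byte[]. */
    static long checksumJava(final ByteBuffer buf) {
        int a = 1;
        int b = 0;
        for (int i = buf.position(); i < buf.limit(); i++) {
            a = (a + (buf.get(i) & 0xFF)) % MOD_ADLER;
            b = (b + a) % MOD_ADLER;
        }
        return ((long) b << 16) | a;
    }
}
```

      The per-byte modulo in the fallback keeps the sketch short; it is not tuned for speed, and a real fallback would likely defer the modulo or simply delegate to java.util.zip.Adler32.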

      A JNI method should be added to compute the Adler32 checksum of a direct ByteBuffer (one whose backing storage was allocated using ByteBuffer.allocateDirect() on the C heap).
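
      For comparison, Java 8 and later provide Adler32.update(ByteBuffer), which accepts a buffer directly. The sketch below assumes a Java 8+ runtime; whether a given JDK avoids the intermediate copy for direct buffers is an implementation detail that should be verified before relying on it. The class name BufferAdler32 is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

/** Hypothetical helper; assumes Java 8 or later for Adler32.update(ByteBuffer). */
final class BufferAdler32 {

    /** Checksum of the remaining bytes; the caller's position and limit are unchanged. */
    static long checksum(final ByteBuffer buf) {
        final Adler32 adler = new Adler32();
        // update(ByteBuffer) advances the position of the buffer it is given,
        // so pass a duplicate to leave the caller's buffer untouched.
        adler.update(buf.duplicate());
        return adler.getValue();
    }
}
```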

        Activity

        bryanthompson added a comment -

        It is interesting to note that the Inflater / Deflater APIs compute the Adler32 checksum as a side effect. See https://sourceforge.net/apps/trac/bigdata/ticket/43.


          People

          • Assignee:
            Unassigned
            Reporter:
            bryanthompson
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated: