Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-446

Scaleable ORDER BY operator using native heap

    Details

      Description

      Implement a scalable ORDER BY operator using the native process heap to buffer the solutions.

      The easiest first approximation for this would buffer the solution sets on the memory store using a "Bucket" object to maintain the list of addresses for solutions having the same hash code, compute an order preserving hash code from the keys for the ORDER BY clause, and produce two correlated arrays:


      - hash[] // the order preserving hash code
      - Bucket[] // the collision bucket for that hash code

      The arrays can then be sorted based on the hash code.

      The solutions are delivered one "bucket" worth at a time. The data in the buckets are unordered, but the buckets themselves are ordered. Therefore, each bucket needs to undergo a secondary sort when it is delivered to the downstream operator in the pipeline.

      A secondary refinement would replace the use of the Bucket object with the HTree to store the data.

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        I've asked Martyn to take a first look at this. This could be approached in increments, beginning with a more scalable sort using an indirection vector against solutions buffered in the memory manager. We will also need some good unit and performance tests. The latter should look at data volumes sufficient to challenge the Java heap.

        Show
        bryanthompson bryanthompson added a comment - I've asked Martyn to take a first look at this. This could be approached in increments, beginning with a more scalable sort using an indirection vector against solutions buffered in the memory manager. We will also need some good unit and performance tests. The latter should look at data volumes sufficient to challenge the Java heap.
        Hide
        bryanthompson bryanthompson added a comment -

        We have decided to do an initial prototype based on the existing canonical huffman codec, and in particular the CanonicalFast64CodeWordDecoder which handles codes with maximum code word bit length of 64 bits. This is appropriate since we will perform the initial sort based on 32-bit or 64-bit codes (sequences of code words) and hence the 64-bit code word length limit is perfectly acceptable. The CanonicalHuffmanRabaCoder already has tested logic for encode/decode. While this is suitable for smaller solution sets, the main purpose of a WORD oriented sort is to have efficiency on larger solution sets. The logic in the CanonicalHuffmanRabaCoder would have to be refactored to support an approach organized around solution sets stored on data blocks and suitable for external memory, but a prototype could be implemented as is. Regardless, once the int32 or int64 keys have been generated, the mapping between the code keys and the corresponding solution records is simply formatted into an int[] or a long[] with stride 2 and sorted based on the even index values (the codes). After the sort, array entries which were assigned the same code must then by sorted by their full key (the ORDER BY value expression list) as they are materialized as the output of the ORDER BY operator.

        Show
        bryanthompson bryanthompson added a comment - We have decided to do an initial prototype based on the existing canonical huffman codec, and in particular the CanonicalFast64CodeWordDecoder which handles codes with maximum code word bit length of 64 bits. This is appropriate since we will perform the initial sort based on 32-bit or 64-bit codes (sequences of code words) and hence the 64-bit code word length limit is perfectly acceptable. The CanonicalHuffmanRabaCoder already has tested logic for encode/decode. While this is suitable for smaller solution sets, the main purpose of a WORD oriented sort is to have efficiency on larger solution sets. The logic in the CanonicalHuffmanRabaCoder would have to be refactored to support an approach organized around solution sets stored on data blocks and suitable for external memory, but a prototype could be implemented as is. Regardless, once the int32 or int64 keys have been generated, the mapping between the code keys and the corresponding solution records is simply formatted into an int[] or a long[] with stride 2 and sorted based on the even index values (the codes). After the sort, array entries which were assigned the same code must then by sorted by their full key (the ORDER BY value expression list) as they are materialized as the output of the ORDER BY operator.
        Hide
        bryanthompson bryanthompson added a comment -

        ORDER BY semantics for SPARQL extend the "<" operator as described in [1]. Note that the specification does NOT define a full ordering. Also note that the ORDER BY semantics do provide for comparison among things which would cause type errors in the "<" operator, but only by partitioning the ordering into regions which are not mutually comparable. The "<" operator is currently modeled by CompareBOp. The Sesame ValueComparator provides the extensions for SPARQL ORDER BY semantics.

        We SHOULD provide optimizations for CompareBOp which work with one materialized value and one inline IV.

        In order to support ORDER BY without requiring that all IVs are materialized, we need to provide an IV aware version of the ValueComparator. There is a bug in MemoryOrderByOp which relates to this. Because it is invoking ValueComparator directly, it will only succeed if we always materialize the IVs.

        [1] http://www.w3.org/TR/sparql11-query/#modOrderBy

        Show
        bryanthompson bryanthompson added a comment - ORDER BY semantics for SPARQL extend the "<" operator as described in [1] . Note that the specification does NOT define a full ordering. Also note that the ORDER BY semantics do provide for comparison among things which would cause type errors in the "<" operator, but only by partitioning the ordering into regions which are not mutually comparable. The "<" operator is currently modeled by CompareBOp. The Sesame ValueComparator provides the extensions for SPARQL ORDER BY semantics. We SHOULD provide optimizations for CompareBOp which work with one materialized value and one inline IV. In order to support ORDER BY without requiring that all IVs are materialized, we need to provide an IV aware version of the ValueComparator. There is a bug in MemoryOrderByOp which relates to this. Because it is invoking ValueComparator directly, it will only succeed if we always materialize the IVs. [1] http://www.w3.org/TR/sparql11-query/#modOrderBy
        Hide
        bryanthompson bryanthompson added a comment -

        I've written a test suite for the MemorySortOp and generalize the code to handle the incremental materialization of solutions and computed value expressions. AST2BOpUtility has been updated to reflect the changed annotations for the MemorySortOp, which now defines both a SORT_ORDER and a VALUE_COMPARATOR annotation.

        There are some known issues with comparisons of inline and non-inline IVs which MikeP will be addressing shortly. Basically, we need an INeedsMaterialization option for operators which always require the materialization of non-inline IVs. ORDER_BY and aggregation both have this characteristic since they must run over all solutions at once and can not re-try solutions which have failed with a NotMaterializedException. As an initial step in that direction, I have defined an IVComparator and a test suite for that class and also added a compareLiteral(IV,Literal) to IVUtility. While this covers some cases, it does not cover all as demonstrated by TestIVComparator.

        There is also an apparent bug in the QueryEngine dealing with its ability to guarantee single threaded evaluation for operators such as sort, or perhaps it is just a bug in the test suite. Regardless, I plan to look into this today and also look further into atomic decision points for the release of native memory back to the direct buffer pools in the QueryEngine.
        Committed revision r5025.

        Show
        bryanthompson bryanthompson added a comment - I've written a test suite for the MemorySortOp and generalize the code to handle the incremental materialization of solutions and computed value expressions. AST2BOpUtility has been updated to reflect the changed annotations for the MemorySortOp, which now defines both a SORT_ORDER and a VALUE_COMPARATOR annotation. There are some known issues with comparisons of inline and non-inline IVs which MikeP will be addressing shortly. Basically, we need an INeedsMaterialization option for operators which always require the materialization of non-inline IVs. ORDER_BY and aggregation both have this characteristic since they must run over all solutions at once and can not re-try solutions which have failed with a NotMaterializedException. As an initial step in that direction, I have defined an IVComparator and a test suite for that class and also added a compareLiteral(IV,Literal) to IVUtility. While this covers some cases, it does not cover all as demonstrated by TestIVComparator. There is also an apparent bug in the QueryEngine dealing with its ability to guarantee single threaded evaluation for operators such as sort, or perhaps it is just a bug in the test suite. Regardless, I plan to look into this today and also look further into atomic decision points for the release of native memory back to the direct buffer pools in the QueryEngine. Committed revision r5025.
        Hide
        bryanthompson bryanthompson added a comment -

        The bug mentioned above is resolved in committed revision r5027. See [1] for details.

        [1] https://sourceforge.net/apps/trac/bigdata/ticket/363#comment:3

        Show
        bryanthompson bryanthompson added a comment - The bug mentioned above is resolved in committed revision r5027. See [1] for details. [1] https://sourceforge.net/apps/trac/bigdata/ticket/363#comment:3

          People

          • Assignee:
            martyncutcher martyncutcher
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: