Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-319

distributed merge sort operator (scale-out)

    Details

    • Type: New Feature
    • Status: Closed - Won't Fix
    • Resolution: Incomplete
    • Affects Version/s: QUADS_QUERY_BRANCH
    • Fix Version/s: None
    • Component/s: Query Engine
    • Labels:
      None

      Description

      Unselective queries using ORDER BY and DISTINCT queries with very high cardinality require a distributed sort. When the data volume is small, the sort can be conducted in memory. When the data value is large (and this is the only time we would handle DISTINCT using a SORT operator) an external merge sort is required.

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        A nice way to approach sorting variable length data is to use an order preserving hash or other mechanism to produce a word length (4 bytes or 8 bytes) key which reflects the order of the variable length records. The word length keys are then sorted by an efficient fixed-length sort algorithm. A second pass breaks ties by considering the full records, but at that point the data are already nearly fully ordered.

        Show
        bryanthompson bryanthompson added a comment - A nice way to approach sorting variable length data is to use an order preserving hash or other mechanism to produce a word length (4 bytes or 8 bytes) key which reflects the order of the variable length records. The word length keys are then sorted by an efficient fixed-length sort algorithm. A second pass breaks ties by considering the full records, but at that point the data are already nearly fully ordered.
        Hide
        bryanthompson bryanthompson added a comment -

        A first approximation for a cluster would be a distributed in-memory sort. Hash partition the solutions across the cluster, sort them on each node, and then using a merge iterator pattern to bring back the solutions in a total order.

        Show
        bryanthompson bryanthompson added a comment - A first approximation for a cluster would be a distributed in-memory sort. Hash partition the solutions across the cluster, sort them on each node, and then using a merge iterator pattern to bring back the solutions in a total order.
        Hide
        bryanthompson bryanthompson added a comment -

        Closed. Not relevant for the new scale-out architecture.

        Show
        bryanthompson bryanthompson added a comment - Closed. Not relevant for the new scale-out architecture.

          People

          • Assignee:
            Unassigned
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: