Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-405

BOpUtility#preOrderTraversal() takes time exponential in the depth of the operator tree.

    Details

    • Type: Bug
    • Status: Done
    • Resolution: Done
    • Affects Version/s: QUADS_QUERY_BRANCH
    • Fix Version/s: None
    • Component/s: B+Tree

      Description

      BOpUtility#preOrderTraversal() is used for several purposes during SPARQL query evaluation. This method can take time which appears to be exponential in the depth of the operator tree, as demonstrated by:

      com.bigdata.rdf.bop.TestBOpUtility
      

      Very nearly the same logic is used by the B+Tree iterators, but the B+Tree never has depths as high as those associated with a FILTER with a lot of components. Once the fix is identified, we should also look at the B+Tree to see if the fix can be applied there as well.

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        Martyn's made a change to the Appenderator which gives good throughput to a greater depth (greater than the query which triggered this issue), but the nested iterator pattern is continuing to display an exponential cost. We are looking at caching the iterator with one-step lookahead to further reduce the runtime cost.

        Show
        bryanthompson bryanthompson added a comment - Martyn's made a change to the Appenderator which gives good throughput to a greater depth (greater than the query which triggered this issue), but the nested iterator pattern is continuing to display an exponential cost. We are looking at caching the iterator with one-step lookahead to further reduce the runtime cost.
        Hide
        martyncutcher martyncutcher added a comment -

        The Appenderator essentially queues two Iterators behind a single Iterator interface. It did not check on hasNext whether it was processing the first or second Iterator, resulting in an unnecessary repeat call to the second Iterator's hasNext when it returns false.

        However, this only effected the degree of the exponential deterioration for the preOrderTraversal, the Expanderator also exhibits exponential performance degradation which can be addressed using a prefetch strategy to reduce repeat calls into the Striterator stack.

        I am raising a further ticket to cover a tail optimisation protocol to support dynamic collapsing of the Striterator stack.

        Show
        martyncutcher martyncutcher added a comment - The Appenderator essentially queues two Iterators behind a single Iterator interface. It did not check on hasNext whether it was processing the first or second Iterator, resulting in an unnecessary repeat call to the second Iterator's hasNext when it returns false. However, this only effected the degree of the exponential deterioration for the preOrderTraversal, the Expanderator also exhibits exponential performance degradation which can be addressed using a prefetch strategy to reduce repeat calls into the Striterator stack. I am raising a further ticket to cover a tail optimisation protocol to support dynamic collapsing of the Striterator stack.
        Hide
        bryanthompson bryanthompson added a comment -

        I have reduced the priority on this issue since the changes which have already been made allow queries to run efficiently. The issue is still open pending further optimizations and/or possible side-effects of these changes.

        Show
        bryanthompson bryanthompson added a comment - I have reduced the priority on this issue since the changes which have already been made allow queries to run efficiently. The issue is still open pending further optimizations and/or possible side-effects of these changes.
        Hide
        bryanthompson bryanthompson added a comment -

        Martyn notes that the LeafTupleIterator side effects the object, such that

        t1 = iter.next();

        followed by

        t2 = iter.next()

        side effects t1.

        This was causing the BTree+ tests to fail wherever they used rangeIterators to assert expected object arrays.

        If in TestIterator test_leaf_entryIterator01 I add :

                 Iterator riter = root.rangeIterator(null,null,flags);
                 Object o1 = riter.next();
                 System.out.println("Read O1: " + o1);
                 Object o2 = riter.next();
                 System.out.println("Read O2: " + o2);
                 System.out.println("Now O1: " + o1);
        

        It reports the following:

        {{

        { Read O1: com.bigdata.btree.Tuple@2001391565\{ nvisited=1, flags=[KEYS,VALS], key=[0, 0, 3|-128,], val=[\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 3|-84,], obj=3, sourceIndex=0}

        Read O2: com.bigdata.btree.Tuple@2001391565{ nvisited=2, flags=[KEYS,VALS], key=[0, 0, 5|-128,], val=[\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 5|-84,], obj=5, sourceIndex=0} Now O1: com.bigdata.btree.Tuple@2001391565{ nvisited=2, flags=[KEYS,VALS], key=[0, 0, 5|-128,], val=[\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 5|-84,], obj=5, sourceIndex=0}
        }}

        So the LeafTupleIterator appears to be returning the same object and side-effecting it. Is this intended?

        Show
        bryanthompson bryanthompson added a comment - Martyn notes that the LeafTupleIterator side effects the object, such that t1 = iter.next(); followed by t2 = iter.next() side effects t1. This was causing the BTree+ tests to fail wherever they used rangeIterators to assert expected object arrays. If in TestIterator test_leaf_entryIterator01 I add : Iterator riter = root.rangeIterator(null,null,flags); Object o1 = riter.next(); System.out.println("Read O1: " + o1); Object o2 = riter.next(); System.out.println("Read O2: " + o2); System.out.println("Now O1: " + o1); It reports the following: {{ { Read O1: com.bigdata.btree.Tuple@2001391565\{ nvisited=1, flags=[KEYS,VALS], key=[0, 0, 3|-128,], val=[\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 3|-84,], obj=3, sourceIndex=0} Read O2: com.bigdata.btree.Tuple@2001391565{ nvisited=2, flags= [KEYS,VALS] , key= [0, 0, 5|-128,] , val= [\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 5|-84,] , obj=5, sourceIndex=0} Now O1: com.bigdata.btree.Tuple@2001391565{ nvisited=2, flags= [KEYS,VALS] , key= [0, 0, 5|-128,] , val= [\-19, 0, 5, 115, 114, 0, 29, 99, 111, 109, 46, 98, 105, 103, 100, 97, 116, 97, 46, 98, 116, 114, 101, 101, 46, 83, 105, 109, 112, 108, 101, 69, 110, 116, 114, 121, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 73, 0, 2, 105, 100, 120, 112, 0, 0, 0, 5|-84,] , obj=5, sourceIndex=0} }} So the LeafTupleIterator appears to be returning the same object and side-effecting it. Is this intended?
        Hide
        bryanthompson bryanthompson added a comment -

        Basically, yes, the Tuple instance is reused for each Tuple the B+Tree visits. This was done in an attempt to reduce the heap churn caused by copying the B+Tree key and value for each tuple into the Tuple object. That churn can be pretty large when the B+Tree keys and values are large.

        However, this reuse of the Tuple instance and its internal buffers does interact with lookahead in the iterator pattern per Martyn's note above.

        Show
        bryanthompson bryanthompson added a comment - Basically, yes, the Tuple instance is reused for each Tuple the B+Tree visits. This was done in an attempt to reduce the heap churn caused by copying the B+Tree key and value for each tuple into the Tuple object. That churn can be pretty large when the B+Tree keys and values are large. However, this reuse of the Tuple instance and its internal buffers does interact with lookahead in the iterator pattern per Martyn's note above.
        Hide
        martyncutcher martyncutcher added a comment -

        Okay, I am creating a prefetch pattern that will protect against side-effecting return values
        - although patterns such as Sorting will always be impossible with side-effecting returns.

        The prefetch pattern will be integrated with the new tail optimisation pattern.

        Show
        martyncutcher martyncutcher added a comment - Okay, I am creating a prefetch pattern that will protect against side-effecting return values - although patterns such as Sorting will always be impossible with side-effecting returns. The prefetch pattern will be integrated with the new tail optimisation pattern.
        Hide
        martyncutcher martyncutcher added a comment -

        The prefetch and tail optimisation patterns reduce the cost of such deep striterations by a further order of magnitude (at least).

        Show
        martyncutcher martyncutcher added a comment - The prefetch and tail optimisation patterns reduce the cost of such deep striterations by a further order of magnitude (at least).

          People

          • Assignee:
            martyncutcher martyncutcher
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: