Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-34

Bug in JiniCoreServicesConfiguration.getServiceRegistrars() (infinite wait)

    Details

      Description

      A bug was reported via the developers mailing list where the JiniCoreServicesConfiguration.getServiceRegistrars() method could go into a non-timed Object.wait(long timeout). The documentation for Object.wait(long timeout) indicates that a timeout of ZERO is treated as NO timeout rather than an immediate timeout. Since remaining is in nanoseconds, the timeout after conversion to milliseconds, can be ZERO (0).

      Problem code @ line 434.

                  while (registrars.length < maxCount) {
      
                      remaining = nanos - (System.nanoTime() - begin);
      
                      if (remaining <= 0) {
      

      Proposed fix:

                  while (registrars.length < maxCount) {
      
                      remaining = nanos - (System.nanoTime() - begin);
      
                      if (TimeUnit.NANOSECONDS.toMillis(remaining) <= 0) {
      

      The proposed fix addresses the problem by ensuring that the remaining time is at least one millisecond (after conversion from nanoseconds to milliseconds). Hence Object.wait(timeout) never goes into an untimed wait.

        Activity

        bryanthompson bryanthompson created issue -
        Hide
        bryanthompson bryanthompson added a comment -

        I briefly examined the callers for Object.wait(long timeout). This survey suggests that we have several related bugs, including ones where remaining is calculated by successively subtracting away elapsed rather than being directly computed from total - (now - begin).

        AbstractZNodeConditionWatcher.awaitCondition(final boolean testConditionOnEntry, final long timeout, final TimeUnit unit)

                    while (millis > 0 && !conditionsatisfied && !isCancelled()) {
        
                        this.wait(millis);
        
                        millis -= (System.currentTimeMillis() - begin);
        
                        if (log.isInfoEnabled())
                            log.info("woke up: conditionSatisifed="
                                    + conditionsatisfied + ", remaining=" + millis
                                    + "ms");
                        
                    }
        

        AbstractZooQueue.awaitCondition(final long timeout, final TimeUnit unit)

                                while (!isDone(n)) {
        
                                    this.wait(remaining);
        
                                    remaining -= System.currentTimeMillis() - begin;
        
                                    n = zookeeper.getChildren(zpath, this).size();
        
                                    if(INFO)
                                        log.info("Queue size: " + n + ", remaining="
                                                + remaining + "ms");
                                    
                                }
        

        All such callers should be reviewed for correctness as part of this ticket.

        Show
        bryanthompson bryanthompson added a comment - I briefly examined the callers for Object.wait(long timeout). This survey suggests that we have several related bugs, including ones where remaining is calculated by successively subtracting away elapsed rather than being directly computed from total - (now - begin). AbstractZNodeConditionWatcher.awaitCondition(final boolean testConditionOnEntry, final long timeout, final TimeUnit unit) while (millis > 0 && !conditionsatisfied && !isCancelled()) { this .wait(millis); millis -= ( System .currentTimeMillis() - begin); if (log.isInfoEnabled()) log.info( "woke up: conditionSatisifed=" + conditionsatisfied + ", remaining=" + millis + "ms" ); } AbstractZooQueue.awaitCondition(final long timeout, final TimeUnit unit) while (!isDone(n)) { this .wait(remaining); remaining -= System .currentTimeMillis() - begin; n = zookeeper.getChildren(zpath, this ).size(); if (INFO) log.info( "Queue size: " + n + ", remaining=" + remaining + "ms" ); } All such callers should be reviewed for correctness as part of this ticket.
        Hide
        bryanthompson bryanthompson added a comment -

        I have fixed three problems:

        • JiniCoreServicesConfiguration: timeout could become ZERO, which means an infinite timeout for Object.wait().
        • AbstractZNodeConditionWatcher: remaining was not computed correctly.
        • AbstractZooQueue: remaining was not computed correctly.

        Note: This might fix the stochastic CI failures for com.bigdata.zookeeper.TestHierarchicalZNodeWatcher.test_noticeCreate.

        Commit 97a92e654065cc793ceafc904d260463d557964c to github branch BLZG-34.

        See https://github.com/SYSTAP/bigdata/pull/67

        Merged to master.

        Show
        bryanthompson bryanthompson added a comment - I have fixed three problems: JiniCoreServicesConfiguration: timeout could become ZERO, which means an infinite timeout for Object.wait(). AbstractZNodeConditionWatcher: remaining was not computed correctly. AbstractZooQueue: remaining was not computed correctly. Note: This might fix the stochastic CI failures for com.bigdata.zookeeper.TestHierarchicalZNodeWatcher.test_noticeCreate. Commit 97a92e654065cc793ceafc904d260463d557964c to github branch BLZG-34 . See https://github.com/SYSTAP/bigdata/pull/67 Merged to master.
        bryanthompson bryanthompson made changes -
        Field Original Value New Value
        Status To Do [ 10001 ] Done [ 10000 ]
        Resolution Fixed [ 1 ]
        bryanthompson bryanthompson made changes -
        Original Estimate 2 minutes [ 120 ]
        Remaining Estimate 2 minutes [ 120 ]
        Component/s Bigdata Federation [ 10001 ]
        bryanthompson bryanthompson made changes -
        Original Estimate 2 minutes [ 120 ] 2 hours [ 7200 ]
        Remaining Estimate 2 minutes [ 120 ] 0 minutes [ 0 ]
        Hide
        bryanthompson bryanthompson added a comment -

        Merged from master to ticket_168 to get changes into CI.

        Show
        bryanthompson bryanthompson added a comment - Merged from master to ticket_168 to get changes into CI.
        beebs Brad Bebee made changes -
        Workflow BLZG: Simple Issue Tracking Workflow [ 10033 ] Trac Import v2 [ 11839 ]
        bryanthompson bryanthompson made changes -
        Fix Version/s BLAZEGRAPH_RELEASE_1_5_2 [ 10164 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v2 [ 11839 ] Trac Import v3 [ 13334 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v3 [ 13334 ] Trac Import v4 [ 14663 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v4 [ 14663 ] Trac Import v5 [ 16032 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v5 [ 16032 ] Trac Import v6 [ 17500 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v6 [ 17500 ] Trac Import v7 [ 18895 ]
        beebs Brad Bebee made changes -
        Workflow Trac Import v7 [ 18895 ] Trac Import v8 [ 20511 ]

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2 hours
              2h
              Remaining:
              0m
              Logged:
              Time Not Required
              Not Specified