Uploaded image for project: 'Blazegraph (by SYSTAP)'
  1. Blazegraph (by SYSTAP)
  2. BLZG-34

Bug in JiniCoreServicesConfiguration.getServiceRegistrars() (infinite wait)

    Details

      Description

      A bug was reported via the developers mailing list where the JiniCoreServicesConfiguration.getServiceRegistrars() method could go into a non-timed Object.wait(long timeout). The documentation for Object.wait(long timeout) indicates that a timeout of ZERO is treated as NO timeout rather than an immediate timeout. Since remaining is in nanoseconds, the timeout after conversion to milliseconds, can be ZERO (0).

      Problem code @ line 434.

                  while (registrars.length < maxCount) {
      
                      remaining = nanos - (System.nanoTime() - begin);
      
                      if (remaining <= 0) {
      

      Proposed fix:

                  while (registrars.length < maxCount) {
      
                      remaining = nanos - (System.nanoTime() - begin);
      
                      if (TimeUnit.NANOSECONDS.toMillis(remaining) <= 0) {
      

      The proposed fix addresses the problem by ensuring that the remaining time is at least one millisecond (after conversion from nanoseconds to milliseconds). Hence Object.wait(timeout) never goes into an untimed wait.

        Activity

        Hide
        bryanthompson bryanthompson added a comment -

        Merged from master to ticket_168 to get changes into CI.

        Show
        bryanthompson bryanthompson added a comment - Merged from master to ticket_168 to get changes into CI.
        Hide
        bryanthompson bryanthompson added a comment -

        I have fixed three problems:

        • JiniCoreServicesConfiguration: timeout could become ZERO, which means an infinite timeout for Object.wait().
        • AbstractZNodeConditionWatcher: remaining was not computed correctly.
        • AbstractZooQueue: remaining was not computed correctly.

        Note: This might fix the stochastic CI failures for com.bigdata.zookeeper.TestHierarchicalZNodeWatcher.test_noticeCreate.

        Commit 97a92e654065cc793ceafc904d260463d557964c to github branch BLZG-34.

        See https://github.com/SYSTAP/bigdata/pull/67

        Merged to master.

        Show
        bryanthompson bryanthompson added a comment - I have fixed three problems: JiniCoreServicesConfiguration: timeout could become ZERO, which means an infinite timeout for Object.wait(). AbstractZNodeConditionWatcher: remaining was not computed correctly. AbstractZooQueue: remaining was not computed correctly. Note: This might fix the stochastic CI failures for com.bigdata.zookeeper.TestHierarchicalZNodeWatcher.test_noticeCreate. Commit 97a92e654065cc793ceafc904d260463d557964c to github branch BLZG-34 . See https://github.com/SYSTAP/bigdata/pull/67 Merged to master.
        Hide
        bryanthompson bryanthompson added a comment -

        I briefly examined the callers for Object.wait(long timeout). This survey suggests that we have several related bugs, including ones where remaining is calculated by successively subtracting away elapsed rather than being directly computed from total - (now - begin).

        AbstractZNodeConditionWatcher.awaitCondition(final boolean testConditionOnEntry, final long timeout, final TimeUnit unit)

                    while (millis > 0 && !conditionsatisfied && !isCancelled()) {
        
                        this.wait(millis);
        
                        millis -= (System.currentTimeMillis() - begin);
        
                        if (log.isInfoEnabled())
                            log.info("woke up: conditionSatisifed="
                                    + conditionsatisfied + ", remaining=" + millis
                                    + "ms");
                        
                    }
        

        AbstractZooQueue.awaitCondition(final long timeout, final TimeUnit unit)

                                while (!isDone(n)) {
        
                                    this.wait(remaining);
        
                                    remaining -= System.currentTimeMillis() - begin;
        
                                    n = zookeeper.getChildren(zpath, this).size();
        
                                    if(INFO)
                                        log.info("Queue size: " + n + ", remaining="
                                                + remaining + "ms");
                                    
                                }
        

        All such callers should be reviewed for correctness as part of this ticket.

        Show
        bryanthompson bryanthompson added a comment - I briefly examined the callers for Object.wait(long timeout). This survey suggests that we have several related bugs, including ones where remaining is calculated by successively subtracting away elapsed rather than being directly computed from total - (now - begin). AbstractZNodeConditionWatcher.awaitCondition(final boolean testConditionOnEntry, final long timeout, final TimeUnit unit) while (millis > 0 && !conditionsatisfied && !isCancelled()) { this .wait(millis); millis -= ( System .currentTimeMillis() - begin); if (log.isInfoEnabled()) log.info( "woke up: conditionSatisifed=" + conditionsatisfied + ", remaining=" + millis + "ms" ); } AbstractZooQueue.awaitCondition(final long timeout, final TimeUnit unit) while (!isDone(n)) { this .wait(remaining); remaining -= System .currentTimeMillis() - begin; n = zookeeper.getChildren(zpath, this ).size(); if (INFO) log.info( "Queue size: " + n + ", remaining=" + remaining + "ms" ); } All such callers should be reviewed for correctness as part of this ticket.

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 2 hours
              2h
              Remaining:
              0m
              Logged:
              Time Not Required
              Not Specified