Details

    • Type: Task
    • Status: In Progress
    • Resolution: Unresolved
    • Affects Version/s: QUADS_QUERY_BRANCH
    • Fix Version/s: None
    • Component/s: CI, Project Management
    • Labels:
      None

      Description

      Modify the test setup to remove the use of File#deleteOnExit(). This was a kludge for tests which were not being torn down correctly, and many of the unit tests rely on this pattern. For example, in a CI run on CentOS, 924 out of 1082 open files as reported by lsof -p <PID> look like this:

       /tmp/bigdata-DiskWORM-4381837471694413776.jnl (deleted)
      

      The scope of the change should not be that large:

      File.deleteOnExit() (24 hits)
      
      I also see:
      
      Options.DELETE_ON_CLOSE (2 hits - it is not in use anywhere)
      
      and
      
      Options.DELETE_ON_EXIT (43 hits)
      

      The DELETE_ON_CLOSE and DELETE_ON_EXIT options in com.bigdata.journal.Options should both be removed.
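
      The kind of explicit tear down that would replace File#deleteOnExit() can be sketched as follows. This is only an illustration under stated assumptions: TempStoreFixture is a hypothetical name, and a plain RandomAccessFile stands in for the Journal, whose actual close/destroy API is not shown in this issue.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical test fixture (not part of bigdata): it owns a temporary
 * backing file and removes it in close() instead of relying on
 * File#deleteOnExit().
 */
public final class TempStoreFixture implements AutoCloseable {

    private final File file;
    private final RandomAccessFile raf; // stand-in for the real backing store

    public TempStoreFixture(String prefix) throws IOException {
        // Deliberately no deleteOnExit(): the fixture manages the file's lifetime.
        this.file = File.createTempFile(prefix, ".jnl");
        this.raf = new RandomAccessFile(file, "rw");
    }

    public File getFile() {
        return file;
    }

    public RandomAccessFile getStore() {
        return raf;
    }

    @Override
    public void close() throws IOException {
        try {
            // Release the file descriptor first so that it is not leaked.
            raf.close();
        } finally {
            // Then remove the file itself.
            if (file.exists() && !file.delete()) {
                throw new IOException("Could not delete " + file);
            }
        }
    }
}
```

      A test would create the fixture in setUp() (or in a try-with-resources block) and call close() in tearDown(), so that both the descriptor and the file are released when each test finishes rather than when the JVM exits.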

        Activity

        beebs Brad Bebee created issue -
        bryanthompson bryanthompson added a comment -

        There is a subtle interaction in some unit tests where CREATE_TEMP_FILE is specified and the test then explicitly sets the FILE property so the temporary file can be reopened. That is, the semantics of CREATE_TEMP_FILE do NOT imply that the file should be deleted when it is closed (at least, not with the existing test suite).

        bryanthompson bryanthompson added a comment -

        These cases can be identified by searching for patterns such as:

                // Turn this off now since we want to re-open the same store.
                properties.setProperty(com.bigdata.journal.Options.CREATE_TEMP_FILE, "false");
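
        A search like this could be automated with a small scanner. The sketch below is an assumption-laden illustration rather than project tooling: ReopenPatternScan and its scan method are hypothetical names, and the match is a plain substring test on each source line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists source lines that set a given option via setProperty. */
public final class ReopenPatternScan {

    /** Returns one "path:lineNumber" entry per matching line under root. */
    public static List<String> scan(Path root, String optionName) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path p : sources) {
            // ISO-8859-1 can decode any byte sequence, so an unexpected
            // encoding in an old source file cannot abort the scan.
            List<String> lines = Files.readAllLines(p, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                if (line.contains("setProperty") && line.contains(optionName)) {
                    hits.add(p + ":" + (i + 1)); // line numbers are 1-based
                }
            }
        }
        return hits;
    }
}
```

        Calling scan with the test source root and "Options.CREATE_TEMP_FILE" would give a list of candidate tests to review by hand; it will also report places that set the option to "true", so the hits still need to be checked.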
        
        bryanthompson bryanthompson added a comment -

        Simply removing Options#DELETE_ON_EXIT and Options#DELETE_ON_CLOSE also causes problems with the IndexSegment build stress tests. Those tests were written a long time ago and they leverage code which returns a BTree object with an implicit backing Journal. The tests need to be refactored to correctly manage the Journal and its tear down rather than relying on this implicit "deleteOnClose" or "deleteOnExit" mechanism.

        bryanthompson bryanthompson added a comment -

        This is a bit more complicated than I had hoped. I've backed out my changes rather than committing them. We will have to approach this on a test case by test case basis, examining how the tests rely on the tear down of their backing stores. The index segment build stress tests and some of the temporary embedded federation tests are the main culprits.

        I've dropped the priority for now as I do not expect to resolve this for our next release.

        bryanthompson bryanthompson added a comment -

        Here is a stack trace which shows that leaking file handles for deleteOnExit causes problems. This trace is from the TERMS_REFACTOR_BRANCH. Without increasing the limit on the number of open files, we run into a problem where we can no longer run tests.

        This could also be handled by shelling out a distinct JVM for each block of unit tests.

            [junit] ERROR: 954866      com.bigdata.service.TestReceiveFile$2.requestService5 com.bigdata.service.ResourceService$RequestTask.sendFile(ResourceService.java:900): Sending bigdata/src/test/com/bigdata/service/testSendFile.seg, length=11682035, uuid=8d7f505e-33ae-4148-8da3-f76b97d8ac57
            [junit] java.io.FileNotFoundException: bigdata/src/test/com/bigdata/service/testSendFile.seg (Too many open files)
            [junit]     at java.io.FileInputStream.open(Native Method)
            [junit]     at java.io.FileInputStream.<init>(FileInputStream.java:106)
            [junit]     at com.bigdata.service.ResourceService$RequestTask.sendFile(ResourceService.java:896)
            [junit]     at com.bigdata.service.ResourceService$RequestTask.run(ResourceService.java:739)
            [junit]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
            [junit]     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
            [junit]     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
            [junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
            [junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
            [junit]     at java.lang.Thread.run(Thread.java:619)
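
        One way to watch for this kind of leak from inside the test JVM, rather than with lsof, is to read the process's open descriptor count. The sketch below assumes a Unix-like platform and a JDK that exposes com.sun.management.UnixOperatingSystemMXBean (a JDK-specific interface, not part of the standard java.lang.management API); OpenFileCount is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: reports how many file descriptors this JVM has open. */
public final class OpenFileCount {

    /**
     * Returns the number of open file descriptors for this process,
     * or -1 if the platform MXBean does not expose that figure.
     */
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1L; // not available (for example, on non-Unix platforms)
    }
}
```

        Logging this value in tearDown() would show which test classes leave the count higher than they found it. The figure can move for unrelated reasons (class loading, sockets, other threads), so it is better treated as a trend than as an exact assertion.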
        
        bryanthompson bryanthompson added a comment -

        I've modified /etc/security/limits.conf to raise the hard and soft limits on the number of open files for CI to 4096. I've also raised them in the current session:

        ulimit -Hn 4096
        ulimit -Sn 4096
        

        I am now retrying a CI run on the CentOS machine to see if this works around the problem.

        bryanthompson bryanthompson added a comment -

        Well, that did not do the trick....

        bryanthompson bryanthompson made changes -
        Component/s CI [ 10153 ]
        Component/s Project Management [ 10152 ]
        Component/s Other [ 10012 ]

          People

          • Assignee:
            bryanthompson bryanthompson
            Reporter:
            bryanthompson bryanthompson