Blazegraph (by SYSTAP) / BLZG-335

extractResources.sh, archiveResources.sh are not functioning as deployed

    Details

      Description

      These scripts extract and archive performance counters, configuration files and other information which is critical when analyzing and debugging a cluster.

      These scripts currently depend on /src/resources/analysis/queries and also on build.xml.

      They use several files in the queries directory that specify which performance counters to extract. build.xml is used to run the extract process, which is a Java program.

        Activity

        bryanthompson added a comment -

        These scripts should also collect a variety of system information. Offhand, I can recommend at least "ulimit -a" (for example, this is sufficient to identify problems caused by the file limit). It would be great if people could suggest other things that should be reported here as well (for example, a listing of the open files).
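A minimal sketch of what such a collection step might look like. The function name `collect_sysinfo`, the output file names, and the use of `uname` and `lsof` are assumptions for illustration only; none of them come from extractResources.sh or archiveResources.sh.

```shell
#!/bin/sh
# Hypothetical helper, not part of the existing scripts.
# Writes basic system information into the directory given as $1.
collect_sysinfo() {
    outdir="$1"
    mkdir -p "$outdir" || return 1

    # Resource limits of the current shell, including the open-file limit.
    ulimit -a > "$outdir/ulimit.txt" 2>&1

    # Kernel and host identification (an assumed extra, not requested in the ticket).
    uname -a > "$outdir/uname.txt" 2>&1

    # Listing of open files, only if lsof happens to be installed.
    if command -v lsof >/dev/null 2>&1; then
        lsof > "$outdir/lsof.txt" 2>/dev/null
    fi
    return 0
}
```

It could be called as `collect_sysinfo /tmp/sysinfo` on each node before the results are archived. Note that the limits reported are those of the invoking shell, which may differ from the limits of an already running service.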

        bryanthompson added a comment -

        Another good piece of information is the output of

            sysctl vm.swappiness

        If this value is left at its Linux default (60), that will severely limit the memory the OS makes available before it starts swapping.
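As a rough sketch, a collection script could flag this setting automatically. The function name `swappiness_report` and its return codes are hypothetical; the threshold of 60 is simply the Linux default mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: classify a vm.swappiness value passed as $1.
# Returns 0 if the value is below 60, 1 if it is 60 or higher,
# and 2 if the value is missing or not a non-negative integer.
swappiness_report() {
    value="$1"
    case "$value" in
        ''|*[!0-9]*)
            echo "vm.swappiness: unknown"
            return 2
            ;;
    esac
    if [ "$value" -ge 60 ]; then
        echo "vm.swappiness=$value (at or above the Linux default of 60)"
        return 1
    fi
    echo "vm.swappiness=$value"
    return 0
}

# Usage on Linux, where "sysctl -n" prints only the value:
# swappiness_report "$(sysctl -n vm.swappiness 2>/dev/null)"
```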

        Also, if the cluster is not using a network time protocol, it is quite common for the clocks of the various nodes to differ enough to make the performance counters difficult to interpret. In my experience, a drift of minutes among the machines is common, different timezones are common, and a difference of hours, days or years is not at all uncommon.
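One possible way to surface clock differences is to record each node's UTC epoch time and compare it with the collecting host's clock. This is only a sketch: the function name `clock_skew` is hypothetical, and the commented usage assumes that ssh access to each node is available and that `date +%s` is supported (it is on GNU and BSD `date`).

```shell
#!/bin/sh
# Hypothetical helper: absolute difference in seconds between two
# epoch timestamps given as $1 and $2.
clock_skew() {
    a="$1"
    b="$2"
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Usage sketch ("$host" is a placeholder for a cluster node):
# remote=$(ssh "$host" date -u +%s)
# here=$(date -u +%s)
# echo "$host skew: $(clock_skew "$here" "$remote") seconds"
```

Because the remote timestamp is fetched over ssh, connection latency is included in the result, so this can only reveal skew of seconds or more, not sub-second drift. Comparing epoch seconds also sidesteps the timezone differences, which would need to be reported separately (for example with `date +%Z`).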

        Another common source of problems is DNS for the cluster. Is there any way to readily detect a bad configuration among the nodes of a cluster?
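One partial check, offered as a sketch rather than an answer to the question above, is to have every node resolve the same hostname and compare the results. The function name `dns_consistent` and the input format are hypothetical; gathering the input with `getent hosts` over ssh is an assumption about the environment.

```shell
#!/bin/sh
# Hypothetical helper: reads lines of the form
#   reporting_node resolved_address
# on stdin and succeeds only if every node reported the same address.
# Empty input is treated as a failure.
dns_consistent() {
    awk '
        { seen[$2] = 1 }
        END {
            n = 0
            for (addr in seen) n++
            exit (n == 1 ? 0 : 1)
        }
    '
}

# Usage sketch ("$nodes" and "$target" are placeholders):
# for node in $nodes; do
#     addr=$(ssh "$node" getent hosts "$target" | awk '{ print $1; exit }')
#     echo "$node $addr"
# done | dns_consistent || echo "nodes disagree on the address of $target"
```

This only detects nodes that disagree with each other. It would not catch a configuration that is wrong in the same way on every node.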

        bryanthompson added a comment -

        Closed. Not relevant to the new architecture.


          People

          • Assignee: Unassigned
          • Reporter: bryanthompson
          • Votes: 0
          • Watchers: 4

            Dates

            • Created:
              Updated:
              Resolved: