Added a port of the SGE script to check the status of Slurm queues
Also added a script that runs apstat and reports nodes up, down, available, and in use on a Cray XT or XE system.
khoward authored and bernardl committed Jan 5, 2011
1 parent caca268 commit 17cd0ad
Showing 4 changed files with 82 additions and 0 deletions.
17 changes: 17 additions & 0 deletions hpc/cray_nodestat/README
@@ -0,0 +1,17 @@
Gmetric script pulled from http://ganglia.info/gmetric

If you are the author of the script, and have an updated version, please fork the repo and
submit a pull request. For additional information about the repository, please see the
README in the repository top-level.

Author: Kris Howard and Fabio Verzelloni

Description:

Reports the number of nodes total, up, down, available, and in use, as reported by apstat.

Language: Shell

Category: Statistics :: Cluster

Dependencies: awk, apstat
27 changes: 27 additions & 0 deletions hpc/cray_nodestat/cray_nodestat.sh
@@ -0,0 +1,27 @@
#!/bin/bash

# Collects node statistics from a Cray XT or XE system and sends them to Ganglia.
# Reports total, up, down, available, and in-use node counts (units: nodes).

/usr/bin/apstat -n | tail -n 1 | awk '{system("/usr/bin/gmetric -n node_total -v " $2 " -t uint16 -u nodes")}
{system("/usr/bin/gmetric -n node_avail -v " $6 " -t uint16 -u nodes")}
{system("/usr/bin/gmetric -n node_up -v " $3 " -t uint16 -u nodes")}
{system("/usr/bin/gmetric -n node_down -v " $7 " -t uint16 -u nodes")}
{system("/usr/bin/gmetric -n node_use -v " $4 " -t uint16 -u nodes")}'

#########################################################################
# Previous Iteration
########################################################################

#NODE_TOTAL=$(/usr/bin/apstat -n | tail -1 | awk '{print $2}')
#NODE_AVAIL=$(/usr/bin/apstat -n | tail -1 | awk '{print $6}')
#NODE_UP=$(/usr/bin/apstat -n | tail -1 | awk '{print $3}')
#NODE_DOWN=$(/usr/bin/apstat -n | tail -1 | awk '{print $7}')
#NODE_USE=$(/usr/bin/apstat -n | tail -1 | awk '{print $4}')

#$APPS/system/ganglia-3.1.7/bin/gmetric -n node_total -v $NODE_TOTAL -t string -u nodes
#$APPS/system/ganglia-3.1.7/bin/gmetric -n node_avail -v $NODE_AVAIL -t float -u nodes
#$APPS/system/ganglia-3.1.7/bin/gmetric -n node_up -v $NODE_UP -t float -u nodes
#$APPS/system/ganglia-3.1.7/bin/gmetric -n node_down -v $NODE_DOWN -t float -u nodes
#$APPS/system/ganglia-3.1.7/bin/gmetric -n node_use -v $NODE_USE -t float -u nodes
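
The column positions used by the script can be checked without a Cray system or a running gmond. The sketch below is illustrative only: `parse_node_summary` is a hypothetical helper, and the sample text is an assumed layout for the last lines of `apstat -n` output, not a capture from a real machine. It prints name=value pairs instead of calling gmetric.

```shell
#!/bin/bash

# Hypothetical helper: read apstat -n style output on stdin and print
# name=value pairs taken from the last line. Field numbers match the
# script above ($2 total, $3 up, $4 in use, $6 available, $7 down).
parse_node_summary() {
    tail -n 1 | awk '{
        printf "node_total=%s\n", $2
        printf "node_up=%s\n",    $3
        printf "node_use=%s\n",   $4
        printf "node_avail=%s\n", $6
        printf "node_down=%s\n",  $7
    }'
}

# Assumed sample layout, for illustration only.
sample='Compute node summary
    arch config     up    use   held  avail   down
      XT     20     11      7      0      4      9'

printf '%s\n' "$sample" | parse_node_summary
```

If the summary line on a given system has a different column order, only the field numbers in the awk program need to change.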

18 changes: 18 additions & 0 deletions hpc/slurm_jobs/README
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
Gmetric script pulled from http://ganglia.info/gmetric

If you are the author of the script, and have an updated version, please fork the repo and
submit a pull request. For additional information about the repository, please see the
README in the repository top-level.

Author: Jesse Becker
Edited By: Kris Howard

Description:

Reports number of jobs running, queued, and in error states.

Language: Shell

Category: Statistics :: Cluster

Dependencies: awk, squeue
20 changes: 20 additions & 0 deletions hpc/slurm_jobs/slurm_jobs.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
#!/bin/bash

squeue | awk '
BEGIN { pending=running=error=0; }
($5 ~ /^PD/)   { pending++; }   # column 5 is the job state (ST); PD = pending
($5 ~ /[rRt]/) { running++; }   # R = running; pattern inherited from the SGE script
($5 ~ /E/ )    { error++; }     # error pattern inherited from the SGE script
END {
    cmd="/usr/bin/gmetric --name slurmq_pending --value "pending" --type uint16";
    system(cmd);
    cmd="/usr/bin/gmetric --name slurmq_running --value "running" --type uint16";
    system(cmd);
    cmd="/usr/bin/gmetric --name slurmq_error --value "error" --type uint16";
    system(cmd);
    #print "Pending="pending" Running="running" Errors="error;
}'


exit
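
The state matching can be tried against canned input before pointing the script at a live queue. This is a minimal sketch under stated assumptions: `count_states` is a hypothetical helper, the sample listing is made up and assumes the default squeue column order with the state in column 5, and the totals are printed rather than sent through gmetric. The patterns are the same ones used in the script above.

```shell
#!/bin/bash

# Hypothetical helper: count job states from squeue-style text on stdin.
# Uses the same column ($5) and patterns as the script above.
count_states() {
    awk '
    BEGIN { pending = running = error = 0 }
    ($5 ~ /^PD/)   { pending++ }
    ($5 ~ /[rRt]/) { running++ }
    ($5 ~ /E/)     { error++ }
    END { printf "pending=%d running=%d error=%d\n", pending, running, error }
    '
}

# Made-up sample listing, for illustration only.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 batch job_a alice PD 0:00 2 (Resources)
102 batch job_b bob R 1:23 4 nid00012
103 batch job_c carol R 0:45 1 nid00020
104 batch job_d dave PD 0:00 1 (Priority)'

printf '%s\n' "$sample" | count_states
```

The header row is not counted because its fifth field, `ST`, matches none of the three patterns. Job names containing spaces would shift the columns, so a site that allows them may prefer to request an explicit output format from squeue.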
