Thursday 14 June 2018

Overlaying SLURM job timings on Grafana plots

As you may have noticed, I'm quite fond of Grafana and use it at home and work. One of the dashboards I have at work is the general state of our lustre filesystems, showing IO and metadata traffic, collected by a custom python script (I'm working on converting this to a real collectd python plugin) which stores the data in an influxDB.

I've since written a small python script that talks to our SLURM accounting DB, so that given a jobID, we can get the start/end times and overlay those using the annotations API. One minor niggle in that the API expects epoch milliseconds, and seems to be tied to the TZ of the browser that generated the API key.

however...
~$ annotate_job 2924399
Found the following job:
  User: bskjerven (pawsey0001)
  Cluster: magnus, Partition: workq, QOS: normal
  Nodes: 768, CPUs: 36864
  Start: 2018-06-11 17:23:22, End: 2018-06-11 19:54:44
Got something back - Annotate? (y/n) y
200 - Annotation added

and lo - 


Feeling Pumped!

Having just had a day without power, and then going round the site to check everything came back online correctly (including services such a...