618. Modifying ECCE to work with slurm

UPDATE: ECCE was stopping monitoring the job after 10-20 seconds, although the job itself continued fine. This turned out to be because $q needs to be lowercase (i.e. 'slurm', not 'Slurm') in eccejobmonitor.



Sun Grid Engine has been removed from Debian Jessie (it's still in Wheezy and Sid). This gave me a good excuse to explore setting up SLURM on my Debian cluster, which I did: http://verahill.blogspot.com.au/2015/07/617-slurm-on-debian-jessie-and.html

My setup is very simple: each node has its own working directory, which it exports via NFS to the main node. Also, I never run jobs across several nodes. Because of that, each node has its own queue. That's not how Beowulf clusters were supposed to work, but it's the best solution for me (ROCKS, for example, does the opposite: it exports the user directory from the main node, but that makes reading and writing slow where it counts, i.e. on the work nodes).

I've currently got this slurm.conf:

ControlMachine=beryllium
ControlAddr=192.168.1.1
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=2
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting
ClusterName=rupert
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=beryllium NodeAddr=192.168.1.1
NodeName=neon NodeAddr=192.168.1.120 state=unknown
NodeName=tantalum NodeAddr=192.168.1.150 state=unknown
NodeName=magnesium NodeAddr=192.168.1.200 state=unknown
NodeName=carbon NodeAddr=192.168.1.190 state=unknown
NodeName=oxygen NodeAddr=192.168.1.180 state=unknown
PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up
PartitionName=mpi4 Nodes=tantalum maxtime=infinite state=up
PartitionName=mpi12 Nodes=carbon maxtime=infinite state=up
PartitionName=mpi8 Nodes=neon maxtime=infinite state=up
PartitionName=mpix8 Nodes=oxygen maxtime=infinite state=up
PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up
PartitionName=mpi1 Nodes=beryllium maxtime=infinite state=up

and sinfo returns:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
All*         up   infinite      6   idle beryllium,carbon,magnesium,neon,oxygen,tantalum
mpi4         up   infinite      1   idle tantalum
mpi12        up   infinite      1   idle carbon
mpi8         up   infinite      1   idle neon
mpix8        up   infinite      1   idle oxygen
mpix12       up   infinite      1   idle magnesium
mpi1         up   infinite      1   idle beryllium
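The one-queue-per-node layout is easy to see in the PartitionName lines of slurm.conf. Here's a small Python sketch (my own, not part of SLURM or ECCE) that pulls those lines out and lists the partitions that map to exactly one node. It assumes the simple whitespace-separated Key=value style used in the slurm.conf above, and it does not expand hostlist ranges like node[1-4].

```python
def parse_partitions(conf_text):
    """Map each PartitionName in slurm.conf text to its list of nodes.

    Only handles simple whitespace-separated Key=value tokens;
    hostlist expressions such as node[1-4] are NOT expanded.
    """
    partitions = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("PartitionName="):
            continue
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")
            fields[key.lower()] = value  # slurm.conf keys are case-insensitive
        partitions[fields["partitionname"]] = fields.get("nodes", "").split(",")
    return partitions


conf = """\
PartitionName=All Nodes=neon,beryllium,tantalum,oxygen,magnesium,carbon default=yes maxtime=infinite state=up
PartitionName=mpi8 Nodes=neon maxtime=infinite state=up
PartitionName=mpix12 Nodes=magnesium maxtime=infinite state=up
"""

parts = parse_partitions(conf)
# Partitions backed by a single node, i.e. per-node queues.
single_node = {name: nodes[0] for name, nodes in parts.items() if len(nodes) == 1}
print(single_node)  # {'mpi8': 'neon', 'mpix12': 'magnesium'}
```

The 'All' partition spans every node, so it is left out of single_node.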


The first step was to figure out what files to edit:

grep -rs "qsub"

apps/siteconfig/QueueManagers:SGE|submitCommand: qsub ##script##

grep -rs "SGE"

apps/scripts/eccejobmonitor: &MsgSendUp("SGE job id '$id' in state '$state'");
[..]
apps/siteconfig/Queues:magnesium|queueMgrName: SGE
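If you'd rather do the same search from a script, here's a rough Python equivalent of grep -rs. Like grep -s it quietly skips files it can't read; unlike grep it doesn't skip binary files, it just decodes them leniently. The ECCE path in the usage comment is a placeholder for wherever your ECCE tree lives.

```python
import os


def grep_tree(root, needle):
    """Return sorted 'relative/path:line' hits for `needle` under `root`,
    roughly like running `grep -rs needle` inside `root`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for line in fh:
                        if needle in line:
                            rel = os.path.relpath(path, root)
                            hits.append(f"{rel}:{line.rstrip()}")
            except OSError:
                continue  # like grep -s: ignore unreadable files silently
    return sorted(hits)


# Usage (the path is a placeholder):
# for hit in grep_tree("/path/to/ecce", "SGE"):
#     print(hit)
```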


Here are the files I edited:

apps/siteconfig/QueueManagers -- add Slurm to the list of queue managers (around line 12):

QueueManagers: LoadLeveler \
Maui \
EASY \
PBS \
LSF \
Moab \
SGE \
Shell \
Slurm


and add a Slurm section after the end of the Shell section (around line 187):

Shell|jobIdParseExpression: \ [0-9]+

###############################################################################
# SLURM
# Simple Linux Utility for Resource Management
#
#
Slurm|submitCommand: sbatch ##script##
Slurm|cancelCommand: scancel ##id##
Slurm|queryJobCommand: squeue
Slurm|queryMachineCommand: sinfo -p ##queue##
Slurm|queryQueueCommand: squeue -a
Slurm|queryDiskUsageCommand: df -k
Slurm|jobIdParseExpression: .*
Slurm|jobIdParseLeadingText: job
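I haven't dug into how ECCE actually uses ##script##, ##id##, jobIdParseLeadingText and jobIdParseExpression internally, so treat the following Python sketch as my reading of those keys, not ECCE's code (fill_command and parse_job_id are hypothetical names). sbatch reports a successful submission as "Submitted batch job <id>", so the idea is: substitute the ##...## placeholders into the command, then find the leading text "job" in sbatch's output and apply the expression to whatever follows it.

```python
import re


def fill_command(template, **values):
    """Replace ##name## placeholders (e.g. ##script##, ##id##) with values."""
    return re.sub(r"##(\w+)##", lambda m: str(values[m.group(1)]), template)


def parse_job_id(submit_output, leading_text, id_expression):
    """Hypothetical job-ID extraction: look after `leading_text` and keep
    the first match of `id_expression`, with surrounding whitespace removed."""
    idx = submit_output.find(leading_text)
    if idx < 0:
        return None  # e.g. sbatch printed an error instead
    rest = submit_output[idx + len(leading_text):]
    match = re.search(id_expression, rest)
    return match.group(0).strip() if match else None


print(fill_command("sbatch ##script##", script="ecce.run"))  # sbatch ecce.run
print(parse_job_id("Submitted batch job 30\n", "job", r".*"))  # 30
```

With .* the whole rest of the line is kept (in this sketch, "." does not cross the newline); a tighter expression like [0-9]+ would only keep the digits.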

apps/scripts/eccejobmonitor -- add an elsif branch for slurm after the Globus branch (around line 2126):

    LogMsg "Globus status from eccejobstore: $state";
  }
  elsif ($q eq 'slurm')  # $q is lowercase here: 'slurm', not 'Slurm'
  {
    $cmd = "squeue 2>&1";
    if (open(QUERY, "$cmd |"))
    {
      $gotState = 0;
      while (<QUERY>)
      {
        LogMsg "JobCheck: Slurm squeue line: $_";
        if (/^\s*$id/)
        {
          # default squeue columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST
          # so the job state (ST) is field 4, counting from 0
          my $state = (split)[4];

          &MsgSendUp("Slurm job id '$id' in state '$state'");

          if (grep {$state eq $_} qw{R t})
          {
            $status = $JOB_STATE_RUNNING;
          }
          elsif (grep {$state eq $_} qw{PD})
          {
            $status = $JOB_STATE_PENDING;
          }
          $gotState = 1;
          last;
        }
      }
      # the job is no longer listed by squeue, so it has finished
      if ($gotState == 0)
      {
        if ($gJobCheckState != $JOB_STATE_NONE)
        {
          $status = $JOB_STATE_DONE;
        }
      }
      close QUERY;
    }
  }
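Here's roughly the same logic as a small Python sketch, which makes it easier to see what the Perl branch does with squeue's default columns (JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)): find the job's line, read the ST column, map R to running and PD to pending, and treat a job that has dropped out of squeue as done. The function name and return strings are my own; in ECCE the states are the $JOB_STATE_* constants. One deliberate difference: this compares the whole JOBID field, whereas /^\s*$id/ in the Perl would also match job 30 when looking for job 3.

```python
RUNNING_CODES = {"R"}   # the Perl branch additionally accepts 't'
PENDING_CODES = {"PD"}


def job_status(squeue_output, job_id, previously_seen):
    """Return 'running', 'pending', 'done', or None (no change) for job_id."""
    for line in squeue_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != job_id:
            continue  # header line or a different job
        state = fields[4]  # JOBID PARTITION NAME USER ST ... -> ST is index 4
        if state in RUNNING_CODES:
            return "running"
        if state in PENDING_CODES:
            return "pending"
        return None  # listed, but in some other state (e.g. CG)
    # Not listed any more: finished, provided we had been tracking it.
    return "done" if previously_seen else None


SAMPLE = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   30      mpi8 In_monom       me PD       0:00      1 (Resources)
   29      mpi8 b_monome       me  R      16:12      1 neon
   31    mpix12 tl_dimer       me  R      34:28      1 magnesium
"""

print(job_status(SAMPLE, "29", previously_seen=True))  # running
print(job_status(SAMPLE, "30", previously_seen=True))  # pending
print(job_status(SAMPLE, "3", previously_seen=True))   # done
```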


Next, set up a new machine (or queue) using ecce -admin. When you set up the queue you won't be able to select Slurm, so select e.g. PBS instead. Then edit the apps/siteconfig/CONFIG.machinename file, e.g.:

NWChem: /opt/nwchem/Nwchem/bin/LINUX64/nwchem
Gaussian-03: /opt/gaussian/g09d/g09/g09
perlPath: /usr/bin/
qmgrPath: /usr/bin/
xappsPath: /usr/bin/

Slurm {
#!/bin/csh
#SBATCH -p mpi8
#SBATCH --time=$walltime
#SBATCH --output=slurm.out
#SBATCH --job-name=$submitFile
}

NWChemEnvironment {
PYTHONPATH /opt/nwchem/Nwchem/contrib/python
}

NWChemFilesToRemove{ core *.aoints.* *.gridpts.* }

NWChemCommand {
setenv PATH "/bin:/usr/bin:/sbin:/usr/sbin"
setenv LD_LIBRARY_PATH "/usr/lib/openmpi/lib:/opt/openblas/lib:/opt/acml/acml5.3.1/gfortran64_fma4_int64/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib:/opt/intel/mkl/lib/intel64"
hostname
mpirun -n $totalprocs /opt/nwchem/Nwchem/bin/LINUX64/nwchem $infile > $outfile
}

Gaussian-03FilesToRemove{ core *.rwf }

Gaussian-03Command{
set path = ( /opt/nbo6/bin $path )
setenv GAUSS_SCRDIR /home/me/scratch
setenv GAUSS_EXEDIR /opt/gaussian/g09d/g09/bsd:/opt/gaussian/g09d/g09/local:/opt/gaussian/g09d/g09/extras:/opt/gaussian/g09d/g09
/opt/gaussian/g09d/g09/g09 < $infile > $outfile
echo 0
}

Wrapup{
dmesg|tail
find ~/scratch/* -name "*" -user me|xargs -I {} rm {} -rf
}
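To see what the Slurm block turns into, here's a Python sketch that fills in $walltime and $submitFile with made-up values (ECCE supplies the real ones when it writes the job script) and then lists the #SBATCH options roughly the way sbatch reads them. sbatch stops looking for #SBATCH lines at the first line that is neither blank nor a comment, which is why the directives have to sit right under #!/bin/csh, before any commands. SLURM_BLOCK and sbatch_directives are hypothetical names.

```python
from string import Template

# The Slurm block from the CONFIG file. ECCE fills in $walltime and
# $submitFile; the values used below are made up for illustration.
SLURM_BLOCK = Template(
    "#!/bin/csh\n"
    "#SBATCH -p mpi8\n"
    "#SBATCH --time=$walltime\n"
    "#SBATCH --output=slurm.out\n"
    "#SBATCH --job-name=$submitFile\n"
)


def sbatch_directives(script):
    """Return the #SBATCH options sbatch would see: scanning stops at the
    first line that is neither blank nor a comment."""
    options = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("#"):
            break  # first real command: any later #SBATCH lines are ignored
        if stripped.startswith("#SBATCH"):
            options.append(stripped[len("#SBATCH"):].strip())
    return options


header = SLURM_BLOCK.substitute(walltime="24:00:00", submitFile="water_opt")
script = header + "hostname\n#SBATCH --nodes=2\n"
print(sbatch_directives(script))
# ['-p mpi8', '--time=24:00:00', '--output=slurm.out', '--job-name=water_opt']
```

The trailing --nodes=2 comes after hostname, so it is not picked up.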

Next, edit apps/siteconfig/Queues -- in my case the machine I created is called neon-slurm:

neon-slurm|queueMgrName: Slurm
neon-slurm|queueMgrVersion: 2.0~
neon-slurm|prefFile: neon-slurm.Q
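Both Queues and QueueManagers use the same name|key: value layout, and the queueMgrName entry is what ties neon-slurm to the Slurm commands defined in QueueManagers. A rough Python sketch of that lookup (my own parser, not ECCE's; it skips comments and doesn't handle backslash-continued values like the QueueManagers list, and the lookup here is case-sensitive):

```python
def parse_siteconfig(text):
    """Parse 'name|key: value' lines into {name: {key: value}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "|" not in line:
            continue  # skip blanks, comments and lines without name|key
        name, rest = line.split("|", 1)
        key, _, value = rest.partition(":")
        table.setdefault(name.strip(), {})[key.strip()] = value.strip()
    return table


queues = parse_siteconfig("""\
neon-slurm|queueMgrName: Slurm
neon-slurm|queueMgrVersion: 2.0~
neon-slurm|prefFile: neon-slurm.Q
""")
managers = parse_siteconfig("""\
Slurm|submitCommand: sbatch ##script##
Slurm|cancelCommand: scancel ##id##
""")

manager = queues["neon-slurm"]["queueMgrName"]
print(manager, "->", managers[manager]["submitCommand"])
# Slurm -> sbatch ##script##
```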

And that's all. You should now be able to submit jobs via SLURM. There's obviously a lot more that can be done and configured with SLURM, but this was enough to get me up and running, so I'm now 'future-proofed' in case SGE never comes back to Debian stable.

And here's what it looks like when my ecce-submitted jobs are running:

squeue

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   30      mpi8 In_monom       me PD       0:00      1 (Resources)
   29      mpi8 b_monome       me  R      16:12      1 neon
   31    mpix12 tl_dimer       me  R      34:28      1 magnesium

