Using the MOAB Workload Manager

This article describes some useful MOAB commands and how to submit different types of jobs to the MOAB scheduler at the FSU HPC. Don't forget to also read the FAQ on running jobs on the cluster.

To use a graphical interface, there is a web portal that can be accessed through your web browser by logging into http://LOGIN-NODE.hpc.fsu.edu (where LOGIN-NODE is one of the node host names listed under the Connecting link on the left). There is an online demo at Cluster Resources that walks you through the Moab Access Portal (MAP).

MOAB Command Reference

The following is a brief list of frequently used MOAB commands. For a full reference, see the MOAB user guide.

showq - lists the jobs in the current queue.

msub -q <queuename> - submits a job to the MOAB batch queue. If no queue name is specified, the default queue is used. A job submission script is required. Use msub --help to see the full list of options. msub returns the job ID.

checkjob [-v] <jobid> - lets you check on a job in the event of problems with it. This command will show you the reason why your job did not start if it has been deferred or blocked.

showstats - shows job submission history statistics.

canceljob <jobid> - cancels a job. As a last resort, you can also use mjobctl -C <jobid> or /opt/torque/bin/qdel <jobid> if canceljob gives an error.

showstart <jobid> - gives an estimate of when your job will start to run. Since many people do not indicate the predicted run time of their jobs, this estimate can be completely wrong.

showbf - shows the available system resources.

See the Cluster Resources Command Overview page for more information about each individual command.
Examples:

$ showbf -c genacc_q          (show the available resources in the general access queue)
$ showq -r -w class=scs_q     (show the jobs running in the scs owner queue)
$ showq -i -w class=met_q     (show idle jobs in the meteorology owner queue)
$ showq -r -w qos=coaps_high  (show jobs running with QOS coaps_high)

Example 1: Basic MOAB Script

This example shows how to run a simple job in MOAB and some of the useful variables that PBS makes available to users. Below is an example MOAB script, saved as moab.ex1.

#!/bin/bash
#MOAB -N moab_ex1
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo MOAB: qsub is running on $PBS_O_HOST
echo MOAB: originating queue is $PBS_O_QUEUE
echo MOAB: executing queue is $PBS_QUEUE
echo MOAB: working directory is $PBS_O_WORKDIR
echo MOAB: execution mode is $PBS_ENVIRONMENT
echo MOAB: job identifier is $PBS_JOBID
echo MOAB: job name is $PBS_JOBNAME
echo MOAB: node file is $PBS_NODEFILE
echo MOAB: current home directory is $PBS_O_HOME
echo MOAB: PATH = $PBS_O_PATH
echo ----------------------------------------------
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo ' '
echo ' '

The directive #MOAB -N moab_ex1 names the job for MOAB and consequently names the standard output and standard error files moab_ex1.o<jobid> and moab_ex1.e<jobid>. To join the standard output and the standard error into the single file moab_ex1.o<jobid>, add the directive #MOAB -j oe. If no name is specified, the default is for the batch system to create two files, one for standard output and one for standard error, named STDIN.o<jobid> and STDIN.e<jobid>. The remaining variables are self-explanatory.
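The naming rule above can be sketched as a small shell helper. This function is purely illustrative (the batch system itself creates these files; there is no such helper in MOAB):

```shell
# Illustrative sketch of how the batch system names output files.
# The job name comes from "#MOAB -N"; when no name is given, STDIN is used.
output_files() {
    local name="${1:-STDIN}"   # job name, defaulting to STDIN
    local jobid="$2"           # numeric job ID returned by msub
    echo "${name}.o${jobid} ${name}.e${jobid}"
}

output_files moab_ex1 1157   # -> moab_ex1.o1157 moab_ex1.e1157
output_files "" 1158         # -> STDIN.o1158 STDIN.e1158
```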
Below is the output from running this script from a user's home directory:

[jmcdon@scs ~]$ msub moab.ex1
1157
$ cat moab_ex1.o1157
------------------------------------------------------
Job is running on node hpc-3-1.local
------------------------------------------------------
MOAB: qsub is running on admin.hpc.fsu.edu
MOAB: originating queue is default
MOAB: executing queue is default
MOAB: working directory is /home/jmcdon
MOAB: execution mode is PBS_BATCH
MOAB: job identifier is 1157.admin.hpc.fsu.edu
MOAB: job name is moab_ex1
MOAB: node file is /opt/torque/aux//1157.admin.hpc.fsu.edu
MOAB: current home directory is /home/jmcdon
MOAB: PATH = /sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
----------------------------------------------
------------------------------------------------------
Job is running on node hpc-3-1.local
------------------------------------------------------

MOAB parameters

There are several MOAB parameters that you can use in the preamble of your script or as arguments to msub. If you give them in your script, the line must start with "#MOAB", for example "#MOAB -N name". A few useful options are:

Option                  Meaning
-N name                 Sets the name of your job (and its output files) to "name".
-l nodes=n              Requests n nodes for this job.
-j oe                   Writes the standard output and the standard error to the same log file.
-m abe                  Has MOAB mail you a notification if the job aborts (a), begins (b), or ends (e). You can use any combination of these letters, for example "-m e" if you are only interested in when a job finishes.
-l walltime=hh:mm:ss    Tells the scheduler that your job will run for hh hours, mm minutes and ss seconds. Remember that some queues have a time limit, for example 4 hours for the backfill queue, and that MOAB will reject your job if you request more time. The default walltime is 14 days for most queues, while the maximum is 90 days. The default and the maximum walltime for the backfill queue are both 4 hours.
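Putting several of the options from the table together, a job preamble might look like the sketch below. The file name, node count, walltime, and executable are illustrative choices, not values required by MOAB:

```shell
# Write a job script whose preamble combines options from the table above:
# a name, a node count, joined output, mail on job end, and a walltime.
cat > myjob.sh <<'EOF'
#!/bin/bash
#MOAB -N myjob
#MOAB -l nodes=2
#MOAB -j oe
#MOAB -m e
#MOAB -l walltime=02:30:00

cd $PBS_O_WORKDIR
./myprogram
EOF

grep -c '^#MOAB' myjob.sh   # -> 5
```

On the cluster this script would then be submitted with msub myjob.sh.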
See the FAQ on how to increase the walltime of a running job, up to the maximum walltime.

-q queue                Specifies the queue to run in.
-l qos=QOS              Specifies the quality of service QOS for this job. See the section on the different queues for all quality of service levels that you can use.

Example 2: Submitting a serial job

MOAB does not support the notion of a job array as some other batch engines do: only one serial job can be submitted per submit file. To run a serial job under MOAB, compile your code as needed and write a script that can be used to execute it. If your program is in the directory your submission script is in, that directory is available as the environment variable $PBS_O_WORKDIR. The executable in this example is called mytest, and the script is saved as moab.ex3:

#!/bin/bash
#MOAB -j oe
#MOAB -l walltime=120:00

cd $PBS_O_WORKDIR
./mytest

This script is not required to be an executable shell script, because MOAB ignores the shell command directive. However, having an executable script is useful for debugging if MOAB jobs fail to start. To start your job run:

$ msub moab.ex3
1159

The command msub returns the job ID of the submitted job. You can check the status of your job using the checkjob command:

$ checkjob 1159
.....

Running an interactive job

Allocate a single core:                   msub -I
Allocate four cores on one node:          msub -l nodes=1:ppn=4 -I
Allocate five cores on multiple nodes:    msub -l nodes=5 -I
Allocate three cores on three nodes:      msub -l nodes=3:ppn=1 -I

In most cases, you do not want to specify the topology (the exact number of processors per node). If you let MOAB choose the topology for you, your job might start sooner. If there are not enough nodes available to run an interactive job immediately, or if the topology you picked is not available, you will have to wait.

Running an MPI job

On the HPC cluster, all Message Passing Interface libraries are configured by default to use the InfiniBand network to pass data.
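The node/core requests above all follow the pattern -l nodes=<n>[:ppn=<p>]. A small helper (hypothetical, for illustration only) makes the composition explicit:

```shell
# Compose the resource request used in the interactive examples above.
# With two arguments it emits nodes=<n>:ppn=<p>; with one, just nodes=<n>.
nodes_opt() {
    local nodes="$1" ppn="$2"
    if [ -n "$ppn" ]; then
        echo "-l nodes=${nodes}:ppn=${ppn}"
    else
        echo "-l nodes=${nodes}"
    fi
}

echo "msub $(nodes_opt 1 4) -I"   # four cores on one node
echo "msub $(nodes_opt 5) -I"     # five cores, topology chosen by MOAB
```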
No special user knowledge or action is required to use the InfiniBand network. In the next few sections we show how to run parallel jobs using the three different MPI implementations that are installed on the cluster. The files for these examples can be found in the directory /panfs/storage.local/system/tutorial/example2. The code that we use in these examples, trap.c, is a simple trapezoid integration program. You are free to choose whichever MPI library and compiler best suits your needs. However, once you compile a program with a certain library and compiler, it is best to run the program with those same tools. In the following examples you can switch compilers by substituting intel for gnu and vice versa.

Example 3: Submitting a parallel job with mvapich2

To compile the trapezoid program using the Intel compiler, we first have to make sure it is in our path by executing:

$ source /usr/local/profile.d/iccvars.sh
$ source /usr/local/profile.d/mpichv2-intel.sh

We can check that we have the right compiler by running which mpicc and mpicc -v. To compile our program we run:

$ mpicc -o trap-mpichv2 trap.c -lm

Note: the above command may produce the message "warning: feupdateenv is not implemented and will always fail." This is normal and no cause for concern unless you know for certain that your compilation requires the use of this function.

To submit the program to the batch system, you must create a startup script with the appropriate topology. The example below requests 8 nodes, and thus a total of 8 processes. The host file that mvapich2 uses must list the number of processors to use per host as host:N, where N is the number of processes. The mpdboot processes are started one per node, and mpirun starts jobs at N per node. The argument to mpdboot must match the number of nodes, and the argument to mpirun must match the number of nodes times the number of processes per node.
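The host:N format can be derived from $PBS_NODEFILE, which lists each host once per allocated processor. The sketch below uses a sample node file in place of the real $PBS_NODEFILE (the host names are illustrative):

```shell
# Build an mvapich2-style host file (host:N) from a PBS node file.
# A PBS node file lists each host once per allocated processor; the
# sample below stands in for $PBS_NODEFILE.
printf 'hpc-3-1\nhpc-3-1\nhpc-3-2\nhpc-3-2\n' > nodefile.sample

# Count occurrences of each host and rewrite as host:N.
sort nodefile.sample | uniq -c | awk '{print $2 ":" $1}' > hosts.mpd

cat hosts.mpd
# hpc-3-1:2
# hpc-3-2:2
```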
#!/bin/bash

#MOAB -l nodes=8
#MOAB -j oe
#MOAB -l walltime=60:00

source /usr/local/profile.d/mpichv2-intel.sh

mpirun $PBS_O_WORKDIR/trap-mpichv2

This script first sets up the right mpichv2 environment for the Intel compiler, and mpirun then starts the executable trap-mpichv2 on 8 nodes. In this example we have set the walltime to 1 hour (60 minutes). Although it is not imperative to set the walltime property, it does make it easier for the scheduler to schedule your job. Be sure not to under-estimate the walltime; over-estimate a bit instead. If the file is saved as trap-mpichv2.sh, the job can be submitted with:

$ msub trap-mpichv2.sh

The above script makes the MOAB job submission very flexible. The number of processes that you request is satisfied not by a fixed topology but by whichever nodes have free CPUs. For example, a 128-process job may end up running on 64 nodes with 2 free CPUs each, on 128 nodes with 1 free CPU each, or on some combination of these. This example script can be found on the cluster in /panfs/storage.local/system/tutorial/example2/trap-mpichv2.sh.

Example 4: Submitting OpenMPI jobs

OpenMPI is more flexible than mvapich2 with the nodes file and can directly use the PBS_NODEFILE given by the job scheduler. The setup of the GNU version of OpenMPI is described here. Before we compile our example program, we have to make sure that we use the right compiler:

$ source /usr/local/profile.d/openmpi-gnu.sh

We can then compile our trap.c file with:

$ mpicc -o trap-openmpi trap.c

to obtain the OpenMPI executable trap-openmpi. This can now be run with the MOAB script below, saved as trap-openmpi.sh:

#!/bin/bash

#MOAB -l nodes=4
#MOAB -j oe
#MOAB -l walltime=120:00
#MOAB -N TRAP-OPENMPI

source /usr/local/profile.d/openmpi-gnu.sh

mpirun $PBS_O_WORKDIR/trap-openmpi
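Note that the OpenMPI script passes no process count to mpirun: OpenMPI reads $PBS_NODEFILE itself, and the process count equals the number of lines in that file. The sketch below shows the counting with a sample node file standing in for $PBS_NODEFILE:

```shell
# OpenMPI derives the process count from $PBS_NODEFILE, which has one
# line per allocated processor. The sample file below is illustrative.
printf 'hpc-3-1\nhpc-3-1\nhpc-3-2\nhpc-3-2\n' > nodefile.sample

np=$(($(wc -l < nodefile.sample)))   # arithmetic strips any padding
echo "$np"   # 4, matching a "#MOAB -l nodes=4" request
```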
Submit the job to MOAB using:

$ msub trap-openmpi.sh

Once the job has completed, you should receive an output file TRAP-OPENMPI.o<jobid> which looks like:

Number of procs 4
With n = 1024 trapezoids, our estimate of the integral from 0.000000 to 1.000000 = 1.000000

This example was done using the gcc version of OpenMPI. There is also an Intel version of OpenMPI available; its setup is described here.

Example 5: Submitting a parallel job with mvapich1

The mvapich implementation of MPICH version 1 is installed on the HPC cluster. This version seamlessly integrates the communication layers for the InfiniBand fabric for the user, and it is one of the easiest implementations to use. However, it suffers in both robustness and ease of cleanup, so the use of MPICH version 2 or OpenMPI is strongly encouraged over this version. To compile our test program with the GNU compiler, we run the following commands:

$ source /usr/local/profile.d/mpichv1-gnu.sh
$ mpicc -o trap-mpichv1 trap.c -lm

Using the following script, we can submit our job:

#!/bin/bash

#MOAB -l nodes=8
#MOAB -j oe
#MOAB -N TRAP-MPICHV1
#MOAB -l walltime=120:00

source /usr/local/profile.d/mpichv1-gnu.sh

/usr/mpi/gcc/mvapich/bin/mpirun $PBS_O_WORKDIR/trap-mpichv1

When saved in the file mpichv1.sh, we can submit the program using msub mpichv1.sh.

Job Dependencies

You can indicate a dependency between two or more jobs by using the -W x=depend:<dependency-type>:<jobid> flag of the msub command. For example, to tell MOAB that job2.sh may only run after job1.sh successfully finishes:

$ msub job1.sh
12345
$ msub -W x=depend:afterok:12345 job2.sh
12346

This queues job1.sh with a job ID of 12345, queues job2.sh with job ID 12346, and tells MOAB that job2.sh has to wait until job1.sh has finished successfully.

Dependency types

Dependency    Format                            Description
after         after:<jobid>[:<jobid>]...        Job may start at any time after the specified jobs have started execution.
afterany      afterany:<jobid>[:<jobid>]...
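Because msub prints the job ID, a dependency chain can be scripted by capturing that output. The helper below composes the -W flag from a dependency type and one or more job IDs; the msub invocation itself is only echoed here, since it works only on the cluster:

```shell
# Compose the msub dependency flag from a type and one or more job IDs.
depend_flag() {
    local type="$1"; shift
    local ids
    ids=$(IFS=:; echo "$*")   # join the remaining arguments with ":"
    echo "-W x=depend:${type}:${ids}"
}

# On the cluster you would capture the real ID: first=$(msub job1.sh)
first=12345
echo "msub $(depend_flag afterok "$first") job2.sh"
# msub -W x=depend:afterok:12345 job2.sh

depend_flag afterany 12345 12346
# -W x=depend:afterany:12345:12346
```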
              Job may start at any time after all specified jobs have completed, regardless of completion status.
afterok       afterok:<jobid>[:<jobid>]...        Job may start at any time after all specified jobs have successfully completed.
afternotok    afternotok:<jobid>[:<jobid>]...     Job may start at any time after any specified jobs have completed unsuccessfully.
before        before:<jobid>[:<jobid>]...         Job may start at any time before the specified jobs have started execution.
beforeany     beforeany:<jobid>[:<jobid>]...      Job may start at any time before all specified jobs have completed, regardless of completion status.
beforeok      beforeok:<jobid>[:<jobid>]...       Job may start at any time before all specified jobs have successfully completed.
beforenotok   beforenotok:<jobid>[:<jobid>]...    Job may start at any time before any specified jobs have completed unsuccessfully.
on            on:<count>                          Job may start after <count> dependencies on other jobs have been satisfied.
synccount     synccount:<count>                   Job must start at the same time as <count> other jobs that reference this job using the syncwith keyword.
syncwith      syncwith:<jobid>                    Job must start at the same time as <jobid>.

MOAB Queues

There are a number of job queues available on the cluster, and depending on your user type (owner-based or general access) you can submit your job to one of them. For a current list of queues, see this list.

Time limitations

Jobs submitted to the backfill queue have a wallclock time restriction: their runtime cannot exceed 4 hours. When that time limit is exceeded, your job will be cancelled. If you try to reserve a walltime longer than 4 hours, the scheduler will refuse your job.

Examples

Run a job in the backfill queue:

$ msub -q backfill trap-mpichv2.sh

Run a job with high priority:

$ msub -l qos=scs_high trap-mpichv2.sh

Change the quality of service of a job that is waiting:

$ checkjob 23831
...
Creds:  user:paulvdm  group:paulvdm  class:scs_q  qos:scs_high
...
$ setqos med 23831
$ checkjob 23831
...
Creds:  user:paulvdm  group:paulvdm  class:scs_q  qos:med
...

For more information, see the slides from the tech series lectures.
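When scripting around checkjob, the current qos can be pulled out of the Creds line with standard tools. The sample string below is taken from the output above and stands in for real checkjob output:

```shell
# Extract the qos value from a checkjob "Creds:" line.
# The sample string stands in for live checkjob output.
creds='Creds:  user:paulvdm  group:paulvdm  class:scs_q  qos:scs_high'
qos=$(echo "$creds" | sed -n 's/.*qos:\([^ ]*\).*/\1/p')
echo "$qos"   # scs_high
```

On the cluster you would pipe the real output instead, e.g. checkjob 23831 | grep Creds.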