Job Submission

Overview:
Sun Grid Engine (SGE) is the resource management system of the cluster.

Job Limits:
Five queues are available to users, three main queues and two debug queues:

    1. daturamon queue – parallel computing queue
      The daturamon.q consists of 4800GB main memory, 2400 CPU cores (Intel(R) Xeon(R) CPU X5650 @ 2.67GHz), 200 compute nodes with 12 cores and 24GB RAM per compute node, and is equipped with an Infiniband QDR communication network. The daturamon.q can be used for batch and interactive jobs.
    2. corrosive queue – parallel computing/debugging/data analysis queue
      The corrosive.q consists of 240GB main memory, 120 CPU cores (Intel(R) Xeon(R) CPU X5650 @ 2.67GHz), 10 compute nodes with 12 cores and 24GB RAM per compute node, and is equipped with an Infiniband QDR communication network. The corrosive.q can be used for batch and interactive jobs.
    3. nxserver queue – remote desktop queue
      The nxserver.q consists of 1584GB main memory, 792 CPU cores (Intel(R) Xeon(R) Woodcrest 5160 @ 3.00GHz), 198 compute nodes with 4 cores and 8GB RAM per compute node, and is equipped with an Infiniband DDR communication network. The nxserver.q can be used for batch and interactive jobs.
    4. gpu queue – GPU/Visualization queue
      The gpu.q consists of 12GB main memory, 4 CPU cores (Intel(R) Xeon(R) E5504 @ 2.00GHz), 1 compute node, 4 Tesla C2050 cards, 1792 CUDA cores and 2687MB global memory. The gpu.q can be used for batch and interactive jobs.
    5. gpu-debug queue – GPU/Visualization queue
      The gpu-debug.q consists of 130GB main memory, 16 CPU cores (Intel Xeon(R) CPU E5-2670 0 @ 2.60GHz), 1 compute node, and 1 NVIDIA Corporation GTX TITAN card with 2688 CUDA cores, CUDA capability 3.5 and 6143MB global memory. The gpu-debug.q can be used for batch and interactive jobs, but should not be used for production runs.
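
The totals quoted for the three CPU queues follow directly from the per-node figures. As a quick cross-check, the short sketch below multiplies them out; the function name queue_totals is purely illustrative and not part of SGE.

```shell
# queue_totals NODES CORES_PER_NODE RAM_GB_PER_NODE
# Prints the aggregate core count and main memory (in GB) of a queue
# whose compute nodes are all identical.
queue_totals() {
    nodes=$1
    cores_per_node=$2
    ram_gb_per_node=$3
    echo "cores=$((nodes * cores_per_node)) mem_gb=$((nodes * ram_gb_per_node))"
}

queue_totals 200 12 24   # daturamon.q
queue_totals 10 12 24    # corrosive.q
queue_totals 198 4 8     # nxserver.q
```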

How to run batch jobs:
A submission script is a standard Linux shell script that begins with a few lines containing directives for SGE. Each directive line starts with #$.

Important SGE directives:

#$ -N JobName
    Name of the simulation.

#$ -l h_rt=6:00:00
    Walltime. Maximum per queue:
      daturamon.q: 24:00:00
      corrosive.q: 72:00:00
      nxserver.q:  INFINITY
      gpu.q:       72:00:00
      gpu-debug.q: 72:00:00

#$ -pe daturamon 12
#$ -pe nx 4
#$ -pe gpu 4
#$ -pe corrosive 12
    Parallel environment, followed by the number of cores. Parallel environment per queue:
      daturamon.q: daturamon
      nxserver.q:  nx
      gpu.q:       gpu
      gpu-debug.q: gpu
      corrosive.q: corrosive

#$ -q daturamon.q
#$ -q nxserver.q
#$ -q gpu.q
#$ -q gpu-debug.q
#$ -q corrosive.q
    Queue name.

#$ -M username@aei.mpg.de
    Email will be sent to this address.

#$ -m bae
    Emails will be sent at the following events: b (begin), a (abort) and e (end).

#$ -o /home/username/OUT_$JOB_NAME.$JOB_ID
    Standard output logfile.

#$ -e /home/username/ERR_$JOB_NAME.$JOB_ID
    Error output logfile.
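
A request that exceeds the walltime limit of its queue cannot run there, so it can help to check the h_rt value before submitting. The following is a minimal sketch of such a check using the limits listed above; the helper names hms_to_seconds, max_walltime and walltime_ok are illustrative and not part of SGE.

```shell
# hms_to_seconds H:MM:SS -> number of seconds
# awk is used so that fields with leading zeros (e.g. "08") are read as decimal.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# max_walltime QUEUE -> maximum h_rt for that queue, as listed above
max_walltime() {
    case "$1" in
        daturamon.q)                   echo "24:00:00" ;;
        corrosive.q|gpu.q|gpu-debug.q) echo "72:00:00" ;;
        nxserver.q)                    echo "INFINITY" ;;
        *)                             return 1 ;;
    esac
}

# walltime_ok QUEUE H:MM:SS
# Exit status 0 if the request fits the queue limit, 1 if it is too long,
# 2 if the queue name is unknown.
walltime_ok() {
    limit=$(max_walltime "$1") || return 2
    [ "$limit" = "INFINITY" ] && return 0
    [ "$(hms_to_seconds "$2")" -le "$(hms_to_seconds "$limit")" ]
}

walltime_ok daturamon.q 24:00:00 && echo "24:00:00 fits daturamon.q"
walltime_ok daturamon.q 30:00:00 || echo "30:00:00 is too long for daturamon.q"
```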

The following is an example SGE submission script. It runs on 2124 cores of the daturamon.q for 24 hours, using OpenMPI with 6 MPI tasks per node and 2 OpenMP threads per task.

#!/bin/bash
#$ -S /bin/bash
#$ -N Jobname
#$ -l h_rt=24:00:00
#$ -pe daturamon 2124
#$ -q daturamon.q
#$ -M username@aei.mpg.de
#$ -m bae
#$ -o /home/username/OUT_$JOB_NAME.$JOB_ID
#$ -e /home/username/ERR_$JOB_NAME.$JOB_ID
#$ -R y
# ---------------------------
module purge
module add mpi/openmpi/1.7-gnu

echo "Got $NSLOTS slots."
export OMP_NUM_THREADS=2

mkdir /lustre/datura/username/$JOB_ID

mpirun --mca btl openib,self --mca plm_rsh_num_concurrent 1024 -npernode 6 \
/home/username/executable parameterfile
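
The numbers in this example are related: 2124 slots on 12-core nodes correspond to 177 nodes, and 6 MPI tasks with 2 OpenMP threads each fill the 12 cores of a node. The sketch below reproduces that arithmetic, assuming the parallel environment hands out whole 12-core nodes; the function name job_layout is illustrative only.

```shell
# job_layout SLOTS CORES_PER_NODE RANKS_PER_NODE OMP_THREADS
# Prints the number of nodes, the total number of MPI ranks, and the number
# of threads running on each node for a hybrid MPI/OpenMP job.
# Assumes SLOTS is a multiple of CORES_PER_NODE (whole nodes are allocated).
job_layout() {
    slots=$1
    cores_per_node=$2
    ranks_per_node=$3
    omp_threads=$4
    nodes=$((slots / cores_per_node))
    echo "nodes=$nodes ranks=$((nodes * ranks_per_node)) threads_per_node=$((ranks_per_node * omp_threads))"
}

# Values from the example script: -pe daturamon 2124, -npernode 6, OMP_NUM_THREADS=2
job_layout 2124 12 6 2
```

If threads_per_node comes out larger than the number of cores per node, the node is oversubscribed and either -npernode or OMP_NUM_THREADS should be reduced.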

Once an SGE script has been created, it must be submitted to the resource management system so that it becomes eligible to run. The command for submitting a script to SGE is qsub. Its syntax is:

qsub submission_script.sh
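
For scripted workflows it is often useful to keep the job ID that qsub assigns. A minimal sketch, assuming the installed SGE version supports the -terse option of qsub (which prints only the job ID); the wrapper name submit_job is illustrative.

```shell
# submit_job SCRIPT
# Submits SCRIPT and prints only the job ID assigned by SGE.
submit_job() {
    # -terse makes qsub print the bare job ID instead of the full message
    job_id=$(qsub -terse "$1") || return 1
    echo "$job_id"
}

# Example usage (commented out so that nothing is submitted by accident):
# id=$(submit_job submission_script.sh)
# qstat -j "$id"    # show details of the job
# qdel "$id"        # remove the job from the queue
```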

How to run interactive jobs:
Interactive SGE jobs are similar to batch jobs in that they are submitted to the resource management system. The difference is that an interactive job does not need an SGE script: all SGE directives can be specified on the command line.

Interactive Jobs on the daturamon queue:
qlogin -pe daturamon 12 -q daturamon.q

Interactive Jobs on the nxserver queue:
qlogin -pe nx 4 -q nxserver.q

Interactive Jobs on the gpu queue:
qlogin -pe gpu 4 -q gpu.q
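
The three commands above follow one pattern: the parallel environment of the queue, a slot count, and the queue name. The sketch below builds the command line for any of the five queues. The entries for corrosive.q and gpu-debug.q are assumptions derived by analogy from the parallel environments and core counts listed earlier, and the function name qlogin_cmd is illustrative.

```shell
# qlogin_cmd QUEUE
# Prints the qlogin command line for an interactive job on QUEUE.
qlogin_cmd() {
    case "$1" in
        daturamon.q)       pe=daturamon; slots=12 ;;
        corrosive.q)       pe=corrosive; slots=12 ;;  # assumed by analogy
        nxserver.q)        pe=nx;        slots=4 ;;
        gpu.q)             pe=gpu;       slots=4 ;;
        gpu-debug.q)       pe=gpu;       slots=4 ;;   # assumed by analogy
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
    echo "qlogin -pe $pe $slots -q $1"
}

qlogin_cmd daturamon.q
qlogin_cmd nxserver.q
```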