GPU/Compute Server

Overview

We maintain a system, cuda-hw2, with 40 logical CPU cores (36 currently usable; see Parallel CPU codes) and several general-purpose GPUs (GPGPUs). Current hardware:

Qty	GPU	Memory (GB)
2	Nvidia Tesla K40c	12
4	Nvidia Tesla K40m	12

The primary purpose of the GPUs is coursework related to high-performance computing and parallel programming. We also make them available for research purposes. Jobs submitted for courses take precedence.

We use the SLURM workload manager/job queueing system. The SLURM quick start documentation is a good starting point; from there you can access all other SLURM documentation.

Use the correct SLURM partition based on your use-case (see below, under Submitting Jobs).

On our systems, the bash shell is configured with a global alias for srun that automatically selects the correct partition.

Users of shells other than bash, or those submitting jobs with sbatch, must make sure to specify the correct partition in their job submissions.
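For sbatch, the partition can be set inside the batch script itself. The following is a minimal sketch, not a site-provided template: the file name myjob.sbatch and the program ./myjob are placeholders, and cpeg655 stands in for whichever partition applies to you.

```shell
# Write a minimal batch script to myjob.sbatch (hypothetical file name).
cat > myjob.sbatch <<'EOF'
#!/bin/bash
# Example partition only: replace cpeg655 with your own partition.
#SBATCH --partition=cpeg655
# Request a single process.
#SBATCH --ntasks=1
# ./myjob is a placeholder for your program.
srun ./myjob
EOF

# Submit it on a host where SLURM is available:
#   sbatch myjob.sbatch
```

Because the partition is recorded in the script, the submission does not depend on the bash alias being present.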

Academic Users

Readying Jobs

You will log in to a virtual machine tailored to your course’s needs. The following table enumerates the VMs available at the time of this writing (others may be added):

Course	Hostname	SLURM partition
CISC360	cisc360.acad.cis.udel.edu	cisc360
CISC372	cisc372.cis.udel.edu	cisc372
CPEG 455/655	cpeg655.ece.udel.edu	cpeg655
CPEG 652	cpeg652.ece.udel.edu	cpeg652

Submitting Jobs

Use the SLURM partition from the above table corresponding to your course. Again, this is the default behavior of srun if you are using the bash shell.

cpeg655:~$ alias srun
alias srun='srun -p cpeg655'
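If the alias is not in effect (for example in another shell, or in a non-interactive script, where bash does not expand aliases by default), the partition has to be passed explicitly. The sketch below derives it from the hostname. It assumes the partition name equals the short hostname, which holds for every row of the course table but is not otherwise guaranteed; partition_for_host is a hypothetical helper name.

```shell
# Assumption: the partition name equals the short hostname, as in every row
# of the course table. Check the table if your course is not listed there.
partition_for_host() {
    # Strip everything from the first dot onward:
    # cpeg655.ece.udel.edu -> cpeg655
    printf '%s\n' "${1%%.*}"
}

partition_for_host cpeg655.ece.udel.edu   # prints: cpeg655

# Possible use from a login VM (requires SLURM; ./myjob is a placeholder):
#   srun -p "$(partition_for_host "$(hostname)")" ./myjob
```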

Research Users

Readying Jobs

Submitting Jobs

Use the research partition for all jobs. This allows the jobs to be preempted by academic jobs, which will often be operating under strict deadlines.

We are looking into various options on how to deal with overcommitted resources, such as job suspension and requeuing. For now: your long-running jobs may be killed to make room for new academic jobs. This will only happen if there aren’t enough resources to run the academic jobs when they are submitted.

Software

System packages of MPI and CUDA are being used as of January 2019. You may use these without altering your environment.

Other high-performance computing software (such as compilers and MPI implementations) may already be available, or can be installed upon request.

Use the module command to inspect what is currently available (module avail), and load a package with module load.

Examples

Requesting GPUs

GPUs are exposed by SLURM as a Generic Resource (GRES). Example run (using 2 GPUs):

srun --gres=gpu:2 ./myjob

See the SLURM documentation on Generic Resources (GRES) for more.
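To check from inside a job how many GPUs were actually granted, one option is to inspect CUDA_VISIBLE_DEVICES. This is a sketch under the assumption that SLURM exports that variable for GPU jobs on this system, which it commonly does but which is not confirmed here; count_gpus is a hypothetical helper name.

```shell
# Count the GPUs visible to this process, based on CUDA_VISIBLE_DEVICES.
# Assumption: SLURM exports this variable for jobs started with --gres=gpu:N.
count_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        # Put one device ID per line, then count the non-empty lines.
        printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    fi
}

echo "GPUs visible: $(count_gpus)"
```

Placed at the top of a job script started with srun --gres=gpu:2, this would be expected to report 2 if the assumption holds; outside a GPU job it reports 0.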

Parallel CPU codes

cuda-hw2 has 2x Intel Xeon E5-2630 CPUs clocked at 2.20GHz. Each CPU contains 10 cores, each of which is dual-threaded. This gives 2 x 10 x 2 = 40 logical cores for CPU-level parallelism.

In Fall 2020, we discovered that a few of the cores went bad. There are now 36 logical cores of computation available. SLURM is now configured to ignore the bad cores, but note that you may now only request up to 36 cores of execution.
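The core arithmetic above can be turned into a quick sanity check before submitting. This is only a sketch: it assumes a request of -n tasks with -c CPUs each is counted as n x c logical cores against the 36-core limit, and fits_on_node is a hypothetical helper name.

```shell
sockets=2
cores_per_socket=10
threads_per_core=2
logical_cores=$((sockets * cores_per_socket * threads_per_core))
usable_cores=36   # after the Fall 2020 core failures

# fits_on_node TASKS CPUS_PER_TASK
# Succeeds if TASKS * CPUS_PER_TASK does not exceed usable_cores.
# Assumption: each requested CPU counts as one logical core.
fits_on_node() {
    [ $(( $1 * $2 )) -le "$usable_cores" ]
}

echo "logical cores: $logical_cores"               # prints: logical cores: 40
fits_on_node 2 2  && echo "2 x 2 fits"             # prints: 2 x 2 fits
fits_on_node 10 4 || echo "10 x 4 is too large"    # prints: 10 x 4 is too large
```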

Multiprocessing (e.g. MPI)

Here we are requesting 4 processes with the -n flag:

% srun -n 4 mpi_program
Hello world from processor cuda-hw2, rank 2 out of 4 processors
Hello world from processor cuda-hw2, rank 0 out of 4 processors
Hello world from processor cuda-hw2, rank 1 out of 4 processors
Hello world from processor cuda-hw2, rank 3 out of 4 processors

Multithreading (e.g. OpenMP)

Here we are requesting 4 threads with the -c flag:

% srun -c 4 omp_program
  Hello from thread 1
  Hello from thread 3
  Hello from thread 0
  Hello from thread 2
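If a program does not pick up the requested thread count on its own, a job script can set OMP_NUM_THREADS from SLURM_CPUS_PER_TASK, which SLURM sets when -c is given. Whether this is needed on cuda-hw2 is not established here, so treat the following as a defensive sketch; set_omp_threads is a hypothetical helper name.

```shell
# Inside a job started with "srun -c N", SLURM sets SLURM_CPUS_PER_TASK=N.
# Fall back to 1 thread when the variable is absent (e.g. outside SLURM).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OpenMP threads: $OMP_NUM_THREADS"
```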

Hybrid applications (e.g. both OpenMP & MPI)

Here we are requesting 2 processes (-n 2), each with 2 threads (-c 2):

% mpicc -fopenmp ompi.c -o ompi
% srun -n 2 -c 2 ./ompi
Hello from thread 1 out of 2 from process 2 out of 2 on cuda-hw2
Hello from thread 2 out of 2 from process 2 out of 2 on cuda-hw2
Hello from thread 1 out of 2 from process 1 out of 2 on cuda-hw2
Hello from thread 2 out of 2 from process 1 out of 2 on cuda-hw2
