We maintain a system with 40 logical CPU cores and several General Purpose GPUs (GPGPUs). Current hardware:

| Count | Model             | Memory (GB) |
| 2     | Nvidia Tesla K40c | 12          |
| 4     | Nvidia Tesla K40m | 12          |
The primary purpose for the GPUs is for coursework related to high performance computing/parallel programming. We are also making them available for research purposes. Jobs submitted for courses will take precedence.
Log in to
For research purposes, jobs are to be submitted from here.
As of January 2019, the system packages of OpenMPI and CUDA are in use. You may use these without altering your environment.
Additional high-performance computing software (compilers, MPI implementations, etc.) is installed elsewhere. Use the module command to inspect what is available, and load packages with module load. For example, to get the PGI compiler, do:
module load pgi
We use the SLURM workload manager/job queueing system. The previous link points to the SLURM quick-start documentation; from there you can access all other documentation on SLURM.
Use the research partition for all jobs. This allows them to be preempted by academic jobs, which will often be operating under strict deadlines.
We are looking into options for dealing with overcommitted resources, such as job suspension and requeueing. For now: your long-running jobs may be killed to make room for new academic jobs. This will only happen if there are not enough resources to run the academic jobs when they are submitted.
GPUs are exposed by SLURM as a Generic Resource (GRES). Example run (using 2 GPUs):
srun -p research --gres=gpu:2 ./myjob
See the SLURM documentation on Generic Resources for more.
cuda0 has 2x Intel Xeon E5-2630 CPUs clocked at 2.20 GHz. Each CPU contains 10 cores, each of which is dual-threaded, giving 2 × 10 × 2 = 40 logical cores for CPU-level parallelism.
«mpi» is the name of the compiled program. Here we are requesting 4 processes with the -n option:

catan2% srun -p research -n 4 mpi
Hello world from processor cuda-hw2, rank 2 out of 4 processors
Hello world from processor cuda-hw2, rank 0 out of 4 processors
Hello world from processor cuda-hw2, rank 1 out of 4 processors
Hello world from processor cuda-hw2, rank 3 out of 4 processors
«omp» is the name of the compiled program. Here we are requesting 4 threads with the -c option:

catan2% srun -p research -c 4 omp
Hello from thread 1
Hello from thread 3
Hello from thread 0
Hello from thread 2
Hybrid applications are finally working! They must (for now, at least) be compiled with either the system-level OpenMPI or the OpenMPI 3.0.0 from the openmpi module. Here's what happens when asking for 2 processes (-n 2), each with 2 threads (-c 2), with the toy hybrid program:
cuda-hw2:~/parallel_test$ module load openmpi
cuda-hw2:~/parallel_test$ module list
Currently Loaded Modulefiles:
  1) openmpi/gcc
cuda-hw2:~/parallel_test$ mpicc -fopenmp ompi.c -o ompi
cuda-hw2% srun -p research -n 2 -c 2 ./ompi
Hello from thread 1 out of 2 from process 2 out of 2 on cuda-hw2
Hello from thread 2 out of 2 from process 2 out of 2 on cuda-hw2
Hello from thread 1 out of 2 from process 1 out of 2 on cuda-hw2
Hello from thread 2 out of 2 from process 1 out of 2 on cuda-hw2