Job Scheduler

Our platform uses SLURM as its job scheduler.

SLURM (Simple Linux Utility for Resource Management) is an open-source job scheduler used in High-Performance Computing (HPC) environments. Its primary purpose is to manage and allocate computing resources efficiently for parallel and distributed computing tasks. Its key functions are:

Resource Allocation

SLURM provides exclusive and/or non-exclusive access to the resources on compute nodes for a specified amount of time.
Users submit jobs to SLURM, which then schedules these jobs to run on available compute nodes.
When you request resources (such as CPU cores, memory, or GPUs) for your job, SLURM reserves those resources for your job for the duration of its execution.
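
As a rough illustration of such a resource request, a batch script can declare what it needs through `#SBATCH` directives. The sketch below assumes a cluster where GPUs are configured as a generic resource named `gpu`; the job name, core count, memory size, and time limit are placeholder values, not settings taken from this platform.

```shell
#!/bin/bash
# Placeholder resource request; adjust every value to your own job.
#SBATCH --job-name=example          # name shown in the queue
#SBATCH --ntasks=1                  # one task (process)
#SBATCH --cpus-per-task=4           # CPU cores for that task
#SBATCH --mem=8G                    # memory per node
#SBATCH --gres=gpu:1                # one GPU (assumes a "gpu" resource is configured)
#SBATCH --time=01:00:00             # wall-clock limit, HH:MM:SS
#SBATCH --output=example_%j.out     # %j is replaced by the job ID

# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 so the script also works outside SLURM.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"

echo "Job ${SLURM_JOB_ID:-none} using ${threads} CPU core(s) on $(uname -n)"
```

Because `#SBATCH` lines are ordinary comments to the shell, the same script also runs outside SLURM, where it falls back to a single thread.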

Job Execution Framework

SLURM acts as a framework for starting, executing, and monitoring work on the allocated compute nodes.
Users submit job scripts (batch files) to SLURM, specifying resource requirements and the commands to be executed.
SLURM manages the execution of these jobs, ensuring they run efficiently and without conflicts.
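
The submit-and-monitor cycle described above can be sketched with the standard `sbatch` and `squeue` commands. The function name `submit_and_watch` is a hypothetical helper introduced for this example, not part of SLURM, and the script path passed to it is whatever batch file you have written.

```shell
#!/bin/bash
# submit_and_watch is a hypothetical helper, not a SLURM command.
submit_and_watch() {
    script="$1"

    # Fail early when SLURM's client tools are not on this machine.
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found: run this where SLURM is installed" >&2
        return 1
    fi

    # --parsable prints only the job ID (plus ";cluster" on multi-cluster setups).
    job_id="$(sbatch --parsable "$script")" || return 1
    job_id="${job_id%%;*}"   # drop the optional ";cluster" suffix
    echo "Submitted job ${job_id}"

    # Show the job's current state in the queue.
    squeue -j "$job_id"
}
```

A typical call would be `submit_and_watch my_job.sh`, where `my_job.sh` is a placeholder file name. A queued or running job can be cancelled with `scancel <job_id>`, and on clusters with accounting enabled `sacct -j <job_id>` reports its state after it finishes.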

Some examples are provided to guide you in using SLURM for job management.