gres.conf - Slurm configuration file for Generic RESource (GRES) management.
gres.conf is an ASCII file which describes the configuration of Generic
RESources (GRES) on each compute node. If the GRES information in the
slurm.conf file does not fully describe those resources, then a gres.conf file
should be included on each compute node. The file is always located in the
same directory as the slurm.conf file. That location can be changed at system
build time using the DEFAULT_SLURM_CONF parameter or at execution time by
setting the SLURM_CONF environment variable.
If the GRES information in the slurm.conf file fully describes
those resources (i.e. no "Cores", "File" or
"Links" specification is required for that GRES type or that
information is automatically detected), that information may be omitted from
the gres.conf file and only the configuration information in the slurm.conf
file will be used. The gres.conf file may be omitted completely if the
configuration information in the slurm.conf file fully describes all
GRES.
If using the gres.conf file to describe the resources
available to nodes, the first parameter on the line should be
NodeName. If configuring Generic Resources without specifying nodes,
the first parameter on the line should be Name.
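As an illustration of the two line forms, the following sketch shows a node-specific line and a node-independent line. The node names and device paths are assumptions borrowed from the examples later in this page, not a recommended configuration; a real file would normally use one form or the other for a given GRES.

```conf
# Form 1: NodeName first; the line applies only to nodes tux0 through tux3
NodeName=tux[0-3] Name=gpu File=/dev/nvidia[0-1]

# Form 2: Name first; the line applies to whichever node reads this file
Name=gpu File=/dev/nvidia[0-1]
```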
Parameter names are case insensitive. Any text following a
"#" in the configuration file is treated as a comment through the
end of that line. Changes to the configuration file take effect upon restart
of Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the
command "scontrol reconfigure" unless otherwise noted.
NOTE: Slurm support for gres/mps requires the use of the
select/cons_tres plugin. For more information on how to configure MPS, see
https://slurm.schedmd.com/gres.html#MPS_Management.
For more information on GRES scheduling in general, see
https://slurm.schedmd.com/gres.html.
The overall configuration parameters available include:
- AutoDetect
- The hardware detection mechanisms to enable for automatic GRES
configuration. This should be on a line by itself. Current options
are:
- nvml
- Used to automatically detect NVIDIA GPUs
- rsmi
- Used to automatically detect AMD GPUs
- Count
- Number of resources of this type available on this node. The default value
is set to the number of File values specified (if any), otherwise
the default value is one. A suffix of "K", "M",
"G", "T" or "P" may be used to multiply the
number by 1024, 1048576, 1073741824, etc. respectively. For example:
"Count=10G".
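As a worked example of the suffix arithmetic, "Count=10G" is 10 x 1073741824 = 10737418240 and "Count=20M" is 20 x 1048576 = 20971520. The sketch below assumes a hypothetical GRES named "bandwidth" that has no device files; the name is an illustration only and would also have to be listed in GresTypes in slurm.conf.

```conf
# Hypothetical GRES with no File values; Count is 20 * 1048576 = 20971520
Name=bandwidth Count=20M
```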
- Cores
- Optionally specify the core index numbers for the specific cores which can
use this resource. For example, it may be strongly preferable to use
specific cores with specific GRES devices (e.g. on a NUMA architecture).
While Slurm can track and assign resources at the CPU or thread level, the
scheduling algorithms it uses to co-allocate GRES devices with CPUs operate
at a socket or NUMA level. It is therefore not possible to preferentially
assign GRES to different specific CPUs within the same NUMA domain or socket,
and this option should be used to identify all cores on some socket.
Multiple cores may be specified using a comma delimited list
or a range may be specified using a "-" separator (e.g.
"0,1,2,3" or "0-3"). If a job specifies
--gres-flags=enforce-binding, then only the identified cores can
be allocated with each generic resource. This tends to improve job
performance, but can delay the allocation of resources to those jobs. If
Cores is specified and a job is not submitted with the
--gres-flags=enforce-binding option, the identified cores will be
preferred for scheduling with each generic resource.
If --gres-flags=disable-binding is specified, then any
core can be used with the resources. This speeds up Slurm's scheduling
algorithm but can degrade application performance. The
--gres-flags=disable-binding option is currently
required to use more CPUs than are bound to a GRES (e.g. if a GPU is
bound to the CPUs on one socket, but resources on more than one socket
are required to run the job). If any core can be used effectively with
the resources, do not specify the Cores option; omitting it speeds up
the Slurm scheduling logic. A restart of slurmctld is
needed for changes to the Cores option to take effect.
NOTE: Since Slurm must be able to perform resource
management on heterogeneous clusters having various processing unit
numbering schemes, a logical core index must be specified instead of the
physical core index. That logical core index might not correspond to
your physical core index number. Core 0 will be the first core on the
first socket, while core 1 will be the second core on the first socket.
This numbering coincides with the logical core number (Core L#) seen in
"lstopo -l" command output.
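A minimal sketch of socket-level binding, assuming a hypothetical node with two sockets of eight cores each, so that logical cores 0-7 are on the first socket and 8-15 on the second, and assuming each GPU is attached to a different socket. The actual layout should be checked with "lstopo -l" before copying these values.

```conf
# GPU 0 is assumed to be closest to socket 0 (logical cores 0-7)
Name=gpu File=/dev/nvidia0 Cores=0-7
# GPU 1 is assumed to be closest to socket 1 (logical cores 8-15)
Name=gpu File=/dev/nvidia1 Cores=8-15
```

With this configuration, a job submitted with --gres-flags=enforce-binding that is allocated /dev/nvidia0 could only be given cores 0-7, while a job submitted without that option would merely prefer them.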
- File
- Fully qualified pathname of the device files associated with a resource.
The name can include a numeric range suffix to be interpreted by Slurm
(e.g. File=/dev/nvidia[0-3]).
This field is generally required if enforcement of generic
resource allocations is to be supported (i.e. prevents users from making
use of resources allocated to a different user). Enforcement of the file
allocation relies upon Linux Control Groups (cgroups) and Slurm's
task/cgroup plugin, which will place the allocated files into the job's
cgroup and prevent use of other files. Please see Slurm's Cgroups Guide
for more information: https://slurm.schedmd.com/cgroups.html.
If File is specified then Count must be either
set to the number of file names specified or not set (the default value
is the number of files specified). The exception to this is MPS. For
MPS, each GPU would be identified by device file using the File
parameter and Count would specify the number of MPS entries that
would correspond to that GPU (typically 100 or some multiple of
100).
NOTE: If you specify the File parameter for a resource
on some node, the option must be specified on all nodes and Slurm will
track the assignment of each specific resource on each node. Otherwise
Slurm will only track a count of allocated resources rather than the
state of each individual device file.
NOTE: Drain a node before changing the count of records with
File parameters (i.e. if you want to add or remove GPUs from a
node's configuration). Failure to do so will result in any job using
those GRES being aborted.
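The relationship between File and Count can be sketched as follows. The device paths are assumptions. The commented-out line is an alternative way of describing the same four GPUs, so only one of the two forms would be active in a real file.

```conf
# Count omitted: defaults to the number of files specified (4)
Name=gpu File=/dev/nvidia[0-3]
# Alternative: Count set explicitly; it must equal the number of files
# Name=gpu File=/dev/nvidia[0-3] Count=4

# MPS exception: one device file, with Count giving the MPS entries for that GPU
Name=mps File=/dev/nvidia0 Count=100
```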
- Flags
- Optional flags that can be specified to change configured behavior of the
GRES.
Allowed values at present are:
- CountOnly
- Do not attempt to load a plugin for this GRES, since it will only be used
to track counts of GRES in use. This avoids searching for a non-existent
plugin, which can be costly on filesystems with high-latency metadata
operations for non-existent files.
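For illustration, a count-only GRES might be declared as below. The name "bandwidth" is hypothetical, chosen only for this sketch, and would need to be listed in GresTypes in slurm.conf.

```conf
# No plugin is loaded for this GRES; Slurm only tracks how many are in use
Name=bandwidth Count=4 Flags=CountOnly
```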
- Links
- A comma-delimited list of numbers identifying the number of connections
between this device and other devices to allow coscheduling of better
connected devices. This is an ordered list in which the number of
connections this specific device has to device number 0 would be in the
first position, the number of connections it has to device number 1 in the
second position, etc. A -1 indicates the device itself and a 0 indicates
no connection. If specified, then this line can only contain a single GRES
device (i.e. can only contain a single file via File).
This is an optional value and is usually automatically
determined if AutoDetect is enabled. A typical use case would be
to identify GPUs having NVLink connectivity. Note that for GPUs, the
minor number assigned by the OS and used in the device file (i.e. the X
in /dev/nvidiaX) is not necessarily the same as the device
number/index. The device number is created by sorting the GPUs by PCI
bus ID and then numbering them starting from the smallest bus ID. See
https://slurm.schedmd.com/gres.html#GPU_Management for more information.
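A minimal sketch of the Links syntax, assuming a hypothetical node with four GPUs in which devices 0 and 1 are joined by two connections, devices 2 and 3 are joined by two connections, and there are no connections between the two pairs. It also assumes that the device numbers happen to match the minor numbers in the device file names, which should be verified on real hardware.

```conf
# List positions correspond to device numbers 0,1,2,3
# -1 marks the device itself, 0 means no connection
Name=gpu File=/dev/nvidia0 Links=-1,2,0,0
Name=gpu File=/dev/nvidia1 Links=2,-1,0,0
Name=gpu File=/dev/nvidia2 Links=0,0,-1,2
Name=gpu File=/dev/nvidia3 Links=0,0,2,-1
```

Each line carries a single device file, as Links requires, and the four lists read together form a symmetric table of connection counts.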
- Name
- Name of the generic resource. Any desired name may be used. The name must
match a value in GresTypes in slurm.conf. Each generic
resource has an optional plugin which can provide resource-specific
functionality. Generic resources that currently include an optional plugin
are:
- gpu
- Graphics Processing Unit
- mps
- CUDA Multi-Process Service (MPS)
- nic
- Network Interface Card
- mic
- Intel Many Integrated Core (MIC) processor
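The requirement that Name match a GresTypes value can be sketched as the following pair of excerpts. The values shown are assumptions for a node with one GPU that is also shared through MPS.

```conf
# slurm.conf (excerpt): declare the GRES names in use on the cluster
GresTypes=gpu,mps

# gres.conf: each Name below appears in GresTypes above
Name=gpu File=/dev/nvidia0
Name=mps File=/dev/nvidia0 Count=100
```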
- NodeName
- An optional NodeName specification can be used to permit one gres.conf
file to be used for all compute nodes in a cluster by specifying the
node(s) that each line should apply to. The NodeName specification can use
a Slurm hostlist specification as shown in the example below.
- Type
- An optional arbitrary string identifying the type of device. For example,
this might be used to identify a specific model of GPU, which users can
then specify in a job request. If Type is specified, then
Count is limited in size (currently 1024).
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices with MPS support
##################################################################
AutoDetect=nvml
Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
Name=gpu Type=tesla File=/dev/nvidia1 COREs=2,3
Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
Name=mps Count=100 File=/dev/nvidia1 COREs=2,3
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Overwrite system defaults and explicitly configure three GPUs
##################################################################
Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
# Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
# NOTE: nvidia2 device is out of service
Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use a single gres.conf file for all compute nodes - positive method
##################################################################
## Explicitly specify devices on nodes tux0-tux15
# NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
# NOTE: tux3 nvidia1 device is out of service
NodeName=tux[0-2] Name=gpu File=/dev/nvidia[0-3]
NodeName=tux3 Name=gpu File=/dev/nvidia[0,2-3]
NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use NVML to gather GPU configuration information
# Information about all other GRES gathered from slurm.conf
##################################################################
AutoDetect=nvml
Copyright (C) 2010 The Regents of the University of California. Produced at
Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2019 SchedMD LLC.
This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
Slurm is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.