sacctmgr(1)    Slurm Commands    sacctmgr(1)
sacctmgr - Used to view and modify Slurm account information.
sacctmgr [OPTIONS...] [COMMAND...]
sacctmgr is used to view or modify Slurm account information. The account
information is maintained within a database with the interface being provided
by slurmdbd (Slurm Database daemon). This database can serve as a
central storehouse of user and computer information for multiple computers at
a single site. Slurm account information is recorded based upon four
parameters that form what is referred to as an association. These
parameters are user, cluster, partition, and
account. user is the login name. cluster is the name of a
Slurm managed cluster as specified by the ClusterName parameter in the
slurm.conf configuration file. partition is the name of a Slurm
partition on that cluster. account is the bank account for a job. The
intended mode of operation is to initiate the sacctmgr command, add,
delete, modify, and/or list association records then commit the changes
and exit.
NOTE: The contents of Slurm's database are maintained in lower case. This may
result in some sacctmgr output differing from that of other Slurm commands.
- -h, --help
- Print a help message describing the usage of sacctmgr. This is
equivalent to the help command.
- -i, --immediate
- Commit changes immediately without asking for confirmation.
- -n, --noheader
- No header will be added to the beginning of the output.
- -p, --parsable
- Output will be '|' delimited with a '|' at the end.
- -P, --parsable2
- Output will be '|' delimited without a '|' at the end.
- -Q, --quiet
- Print no messages other than error messages. This is equivalent to the
quiet command.
- -r, --readonly
- Prevents the running sacctmgr from modifying accounting information. The
readonly option is for use within interactive mode.
- -s, --associations
- Use with show or list to display associations with the entity. This is
equivalent to the associations command.
- -v, --verbose
- Enable detailed logging. This is equivalent to the verbose command.
- -V, --version
- Display version number. This is equivalent to the version command.
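The difference between -p/--parsable and -P/--parsable2 matters mostly to scripts that split the output on '|'. As a minimal sketch, the Python function below turns either form into a list of dictionaries. parse_parsable is a hypothetical helper, not part of Slurm, and the sample text is invented for illustration; only the column names come from the Cluster,Account,User,Partition format.

```python
def parse_parsable(text, trailing_delimiter=False):
    """Parse '|'-delimited sacctmgr output into a list of dicts.

    Set trailing_delimiter=True for -p/--parsable output, where every
    line ends with an extra '|'. The first line is taken as the header,
    so do not combine this with -n/--noheader.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if trailing_delimiter and line.endswith("|"):
            line = line[:-1]  # drop the final '|' added by --parsable
        rows.append(line.split("|"))
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


# Invented sample data, not real sacctmgr output.
sample_p2 = "Cluster|Account|User|Partition\nsnowflake|science|adam|\n"
sample_p = "Cluster|Account|User|Partition|\nsnowflake|science|adam||\n"

print(parse_parsable(sample_p2))
print(parse_parsable(sample_p, trailing_delimiter=True))
```

Both calls yield the same records. With -P the empty Partition column is simply the empty field after the last '|', whereas with -p one more '|' follows it and has to be stripped first.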
- add <ENTITY> <SPECS>
- Add an entity. Identical to the create command.
- archive {dump|load} <SPECS>
- Write database information to a flat file or load information that has
previously been written to a file.
- clear stats
- Clear the server statistics.
- create <ENTITY> <SPECS>
- Add an entity. Identical to the add command.
- delete <ENTITY> where <SPECS>
- Delete the specified entities. Identical to the remove command.
- dump <ENTITY> [File=FILENAME]
- Dump cluster data to the specified file. If no filename is specified,
clustername.cfg is used by default.
- help
- Display a description of sacctmgr options and commands.
- list <ENTITY> [<SPECS>]
- Display information about the specified entity. By default, all entries
are displayed; you can narrow the results by specifying SPECS in your query.
Identical to the show command.
- load <FILENAME>
- Load cluster data from the specified file. This is a configuration file
generated by running the sacctmgr dump command. This command does not load
archive data; see the sacctmgr archive load option instead.
- modify <ENTITY> where <SPECS>
set <SPECS>
- Modify an entity.
- reconfigure
- Reconfigures the SlurmDBD if running with one.
- remove <ENTITY> where <SPECS>
- Delete the specified entities. Identical to the delete command.
- show <ENTITY> [<SPECS>]
- Display information about the specified entity. By default, all entries
are displayed; you can narrow the results by specifying SPECS in your query.
Identical to the list command.
- shutdown
- Shut down the server.
- version
- Display the version number of sacctmgr.
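The modify syntax above (an entity, then "where" specs, then "set" specs) is easy to get wrong when assembled by hand in a script. The following Python sketch builds the argument list without running anything. build_modify_argv is a hypothetical helper and the account name is invented; the -i option and the "modify <ENTITY> where <SPECS> set <SPECS>" layout are taken from this page.

```python
def build_modify_argv(entity, where, set_, immediate=False):
    """Build an argument list for 'sacctmgr modify <ENTITY> where ... set ...'.

    where and set_ are dicts mapping spec names to values. Nothing is
    executed here; the returned list can be handed to subprocess.run().
    """
    if not where or not set_:
        raise ValueError("modify needs at least one 'where' and one 'set' spec")
    argv = ["sacctmgr"]
    if immediate:
        argv.append("-i")  # commit without asking for confirmation
    argv += ["modify", entity, "where"]
    argv += [f"{name}={value}" for name, value in where.items()]
    argv.append("set")
    argv += [f"{name}={value}" for name, value in set_.items()]
    return argv


# Hypothetical account name; a value of -1 clears a previously set limit.
print(build_modify_argv("account", {"Name": "science"}, {"GrpJobs": -1},
                        immediate=True))
```

Returning a list rather than a single string avoids shell quoting problems when a spec value contains spaces.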
NOTE: All commands listed below can be used in the interactive mode, but
NOT on the initial command line.
- exit
- Terminate sacctmgr interactive mode. Identical to the quit command.
- quiet
- Print no messages other than error messages.
- quit
- Terminate the execution of sacctmgr interactive mode. Identical to the
exit command.
- verbose
- Enable detailed logging. This includes time-stamps on data structures,
record counts, etc. This is an independent command with no options meant
for use in interactive mode.
- !!
- Repeat the last command.
- account
- A bank account, typically specified at job submit time using the
--account= option. These may be arranged in a hierarchical fashion;
for example, accounts 'chemistry' and 'physics' may be children of the
account 'science'. The hierarchy may have an arbitrary depth.
- association
- The entity used to group information consisting of four parameters:
account, cluster, partition (optional), and
user. Used only with the list or show command. Add,
modify, and delete should be done to a user, account or cluster entity.
This will in turn update the underlying associations.
- cluster
- The ClusterName parameter in the slurm.conf configuration
file, used to differentiate accounts on different machines.
- configuration
- Used only with the list or show command to report current
system configuration.
- coordinator
- A special privileged user, usually an account manager, that can add users
or sub-accounts to the account they are coordinator over. This should be a
trusted person since they can change limits on account and user
associations, as well as cancel, requeue or reassign accounts of jobs
inside their realm.
- event
- Events like downed or draining nodes on clusters.
- federation
- A group of clusters that work together to schedule jobs.
- job
- Used to modify specific fields of a job: Derived Exit Code, the Comment
String, or wckey.
- problem
- Use with show or list to display entity problems.
- qos
- Quality of Service.
- reservation
- A collection of resources set apart for use by a particular account, user
or group of users for a given period of time.
- resource
- Software resources for the system. Those are software licenses shared
among clusters.
- RunawayJobs
- Used only with the list or show command to report current
jobs that have been orphaned on the local cluster and are now runaway. If
there are jobs in this state it will also give you an option to
"fix" them. NOTE: You must have an AdminLevel of
at least Operator to perform this.
- stats
- Used with list or show command to view server statistics.
Accepts optional argument of ave_time or total_time to sort
on those fields. By default, sorts on increasing RPC count field.
- transaction
- List of transactions that have occurred during a given time period.
- tres
- Used with list or show command to view a list of Trackable
RESources configured on the system.
- user
- The login name. Usernames are case-insensitive (forced to lowercase)
unless the PreserveCaseUser option has been set in the SlurmDBD
configuration file.
- wckeys
- Workload Characterization Key. An arbitrary string for grouping orthogonal
accounts.
NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a job is
being considered for being allocated resources. If starting a job would cause
any of its group limits to be exceeded, that job will not be considered for
scheduling even if that job might preempt other jobs which would release
sufficient group resources for the pending job to be initiated.
- DefaultQOS=<default qos>
- The default QOS this association and its children should have. This is
overridden if set directly on a user. To clear a previously set value use
the modify command with a new value of -1.
- Fairshare=<fairshare number | parent>
- Number used in conjunction with other accounts to determine job priority.
Can also be the string parent. When used on a user, this means that
the parent association is used for fairshare. If Fairshare=parent is set
on an account, that account's children will be effectively reparented for
fairshare calculations to the first parent of their parent that is not
Fairshare=parent. Limits remain the same, only its fairshare value is
affected. To clear a previously set value use the modify command with a
new value of -1.
- GrpTRESMins=<TRES=max TRES minutes,...>
- The total number of TRES minutes that can possibly be used by past,
present and future jobs running from this association and its children. To
clear a previously set value use the modify command with a new value of -1
for each TRES id.
NOTE: This limit is not enforced if set on the root
association of a cluster. So even though it may appear in sacctmgr
output, it will not be enforced.
ALSO NOTE: This limit only applies when using the Priority
Multifactor plugin. The time is decayed using the value of
PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the
slurm.conf. When this limit is reached all associated jobs running will
be killed and all future jobs submitted with associations in the group
will be delayed until they are able to run inside the limit.
- GrpTRESRunMins=<TRES=max TRES run minutes,...>
- Used to limit the combined total number of TRES minutes used by all jobs
running with this association and its children. This takes the time limit
of running jobs into consideration and consumes it. If the limit is reached,
no new jobs are started until other jobs finish and free up time.
- GrpTRES=<TRES=max TRES,...>
- Maximum number of TRES running jobs are able to be allocated in aggregate
for this association and all associations which are children of this
association. To clear a previously set value use the modify command with a
new value of -1 for each TRES id.
NOTE: This limit only applies fully when using the Select
Consumable Resource plugin.
- GrpJobs=<max jobs>
- Maximum number of running jobs in aggregate for this association and all
associations which are children of this association. To clear a previously
set value use the modify command with a new value of -1.
- GrpJobsAccrue=<max jobs>
- Maximum number of pending jobs in aggregate able to accrue age priority
for this association and all associations which are children of this
association. To clear a previously set value use the modify command with a
new value of -1.
- GrpSubmitJobs=<max jobs>
- Maximum number of jobs which can be in a pending or running state at any
time in aggregate for this association and all associations which are
children of this association. To clear a previously set value use the
modify command with a new value of -1.
NOTE: This setting shows up in the sacctmgr output as
GrpSubmit.
- GrpWall=<max wall>
- Maximum wall clock time running jobs are able to be allocated in aggregate
for this association and all associations which are children of this
association. To clear a previously set value use the modify command with a
new value of -1.
NOTE: This limit is not enforced if set on the root
association of a cluster. So even though it may appear in sacctmgr
output, it will not be enforced.
ALSO NOTE: This limit only applies when using the Priority
Multifactor plugin. The time is decayed using the value of
PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the
slurm.conf. When this limit is reached all associated jobs running will
be killed and all future jobs submitted with associations in the group
will be delayed until they are able to run inside the limit.
- MaxTRESMinsPerJob=<max TRES minutes>
- Maximum number of TRES minutes each job is able to use in this
association. This is overridden if set directly on a user. Default is the
cluster's limit. To clear a previously set value use the modify command
with a new value of -1 for each TRES id.
NOTE: This setting shows up in the sacctmgr output as
MaxTRESMins.
- MaxTRESPerJob=<max TRES>
- Maximum number of TRES each job is able to use in this association. This
is overridden if set directly on a user. Default is the cluster's limit.
To clear a previously set value use the modify command with a new value of
-1 for each TRES id.
NOTE: This setting shows up in the sacctmgr output as
MaxTRES.
NOTE: This limit only applies fully when using
cons_res or cons_tres select type plugins.
- MaxJobs=<max jobs>
- Maximum number of jobs each user is allowed to run at one time in this
association. This is overridden if set directly on a user. Default is the
cluster's limit. To clear a previously set value use the modify command
with a new value of -1.
- MaxJobsAccrue=<max jobs>
- Maximum number of pending jobs able to accrue age priority at any given
time for the given association. This is overridden if set directly on a
user. Default is the cluster's limit. To clear a previously set value use
the modify command with a new value of -1.
- MaxSubmitJobs=<max jobs>
- Maximum number of jobs which this association can have in a pending or
running state at any time. Default is the cluster's limit. To clear a
previously set value use the modify command with a new value of -1.
NOTE: This setting shows up in the sacctmgr output as
MaxSubmit.
- MaxWallDurationPerJob=<max wall>
- Maximum wall clock time each job is able to use in this association. This
is overridden if set directly on a user. Default is the cluster's limit.
<max wall> format is <min> or <min>:<sec> or
<hr>:<min>:<sec> or
<days>-<hr>:<min>:<sec> or
<days>-<hr>. The value is recorded in minutes with rounding as
needed. To clear a previously set value use the modify command with a new
value of -1.
NOTE: Changing this value will have no effect on any
running or pending job.
NOTE: This setting shows up in the sacctmgr output as
MaxWall.
- Priority
- What priority will be added to a job's priority when using this
association. This is overridden if set directly on a user. Default is the
cluster's limit. To clear a previously set value use the modify command
with a new value of -1.
- QosLevel<operator><comma separated list of qos names>
- Specify the default Quality of Service (QOS) levels that jobs are able to
run at for this association. To get a list of valid QOS's use
'sacctmgr list qos'. This value will override its parent's value and push
down to its children as the new default. Setting a QosLevel to '' (two
single quotes with nothing between them) restores its default setting. You
can also use the operators += and -= to add or remove certain QOS's from a
QOS list.
Valid <operator> values include:
- =
- Set QosLevel to the specified value. Note: the QOS that can
be used at a given account in the hierarchy are inherited by the children
of that account. By assigning QOS with the = sign only the assigned
QOS can be used by the account and its children.
- +=
- Add the specified <qos> value to the current QosLevel. The
account will have access to this QOS and the others previously assigned to
it.
- -=
- Remove the specified <qos> value from the current
QosLevel.
- See the EXAMPLES section below.
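The effect of the three QosLevel operators can be illustrated with a few lines of Python. This is only a sketch of the semantics described above for a single association: apply_qos_level and the QOS names are hypothetical, inheritance from parent accounts is not modeled, and it is not how Slurm implements the feature.

```python
def apply_qos_level(current, operator, names):
    """Return the QOS list after applying one QosLevel operator.

    current:  list of QOS names currently assigned.
    operator: '=', '+=' or '-='.
    names:    list of QOS names given with the operator.
    """
    if operator == "=":
        return list(names)  # replace the list outright
    if operator == "+=":
        result = list(current)
        for name in names:
            if name not in result:  # keep existing entries, add new ones
                result.append(name)
        return result
    if operator == "-=":
        return [name for name in current if name not in names]
    raise ValueError(f"unknown operator: {operator!r}")


qos = ["normal"]                                # hypothetical QOS names
qos = apply_qos_level(qos, "+=", ["expedite"])  # normal, expedite
qos = apply_qos_level(qos, "-=", ["normal"])    # expedite
qos = apply_qos_level(qos, "=", ["standby"])    # standby
print(qos)
```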
-
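The <max wall> formats accepted by MaxWallDurationPerJob can be checked before they are passed to sacctmgr. The sketch below converts each documented form to seconds. wall_to_seconds is a hypothetical helper; it deliberately stops at seconds because the exact rounding Slurm applies when recording minutes is not specified here.

```python
def wall_to_seconds(spec):
    """Convert a <max wall> string to seconds.

    Accepted forms: <min>, <min>:<sec>, <hr>:<min>:<sec>,
    <days>-<hr>:<min>:<sec> and <days>-<hr>.
    """
    if spec.startswith("-"):
        # -1 clears a limit; it is not a duration.
        raise ValueError(f"not a duration: {spec!r}")
    has_days = "-" in spec
    if has_days:
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
    else:
        days, rest = 0, spec
    fields = [int(field) for field in rest.split(":")]
    if has_days and len(fields) == 1:        # <days>-<hr>
        hours, minutes, seconds = fields[0], 0, 0
    elif len(fields) == 3:                   # [<days>-]<hr>:<min>:<sec>
        hours, minutes, seconds = fields
    elif not has_days and len(fields) == 2:  # <min>:<sec>
        hours, (minutes, seconds) = 0, fields
    elif not has_days and len(fields) == 1:  # <min>
        hours, minutes, seconds = 0, fields[0], 0
    else:
        raise ValueError(f"unsupported <max wall> format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


for example in ("90", "90:30", "1:30:00", "2-12", "1-00:30:00"):
    print(example, wall_to_seconds(example))
```

Note that a bare number is minutes, not seconds, so "90" and "1:30:00" describe the same limit.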
- Cluster=<cluster>
- Specific cluster to add account to. Default is all in system.
- Description=<description>
- An arbitrary string describing an account.
- Name=<name>
- The name of a bank account. Note that the name must be unique and cannot
represent different bank accounts at different points in the account
hierarchy.
- Organization=<org>
- Organization to which the account belongs.
- Parent=<parent>
- Parent account of this account. Default is the root account, a top level
account.
- RawUsage=<value>
- This allows an administrator to reset the raw usage accrued to an account.
The only value currently supported is 0 (zero). This is a settable
specification only - it cannot be used as a filter to list accounts.
- WithAssoc
- Display all associations for this account.
- WithCoord
- Display all coordinators for this account.
- WithDeleted
- Display information with previously deleted data.
NOTE: If using the WithAssoc option you can also query against
association specific information to view only certain associations this
account may have. These extra options can be found in the SPECIFICATIONS
FOR ASSOCIATIONS section. You can also use the general specifications
list above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED
ENTITIES section.
- Account
- The name of a bank account.
- Description
- An arbitrary string describing an account.
- Organization
- Organization to which the account belongs.
- Coordinators
- List of users that are a coordinator of the account. (Only filled in when
using the WithCoordinator option.)
NOTE: If using the WithAssoc option you can also view the
information about the various associations the account may have on all the
clusters in the system. The association information can be filtered. Note
that all the accounts in the database will always be shown, as the filter only
takes effect over the association data. The Association format fields are
described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
- Clusters=<comma separated list of cluster names>
- List the associations of the cluster(s).
- Accounts=<comma separated list of account names>
- List the associations of the account(s).
- Users=<comma separated list of user names>
- List the associations of the user(s).
- Partition=<comma separated list of partition names>
- List the associations of the partition(s).
NOTE: You can also use the general specifications list above in
the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
section.
Other options unique for listing associations:
- OnlyDefaults
- Display only associations that are default associations.
- Tree
- Display account names in a hierarchical fashion.
- WithDeleted
- Display information with previously deleted data.
- WithSubAccounts
- Display information with subaccounts. Only really valuable when used with
the account= option. This will display all the subaccount associations
along with the accounts listed in the option.
- WOLimits
- Display information without limit information. This is for a smaller
default format of "Cluster,Account,User,Partition".
- WOPInfo
- Display information without parent information (i.e. parent id, and parent
account name). This option also implicitly sets the WOPLimits option.
- WOPLimits
- Display information without hierarchical parent limits (i.e. will only
display limits where they are set instead of propagating them from the
parent).
- Account
- The name of a bank account in the association.
- Cluster
- The name of a cluster in the association.
- DefaultQOS
- The QOS the association will use by default if it has access to it in the
QOS list mentioned below.
- Fairshare
- Number used in conjunction with other accounts to determine job priority.
Can also be the string parent. When used on a user, this means that
the parent association is used for fairshare. If Fairshare=parent is set
on an account, that account's children will be effectively reparented for
fairshare calculations to the first parent of their parent that is not
Fairshare=parent. Limits remain the same, only its fairshare value is
affected.
- GrpTRESMins
- The total number of TRES minutes that can possibly be used by past,
present and future jobs running from this association and its children.
- GrpTRESRunMins
- Used to limit the combined total number of TRES minutes used by all jobs
running with this association and its children. This takes the time limit
of running jobs into consideration and consumes it. If the limit is reached,
no new jobs are started until other jobs finish and free up time.
- GrpTRES
- Maximum number of TRES running jobs are able to be allocated in aggregate
for this association and all associations which are children of this
association.
- GrpJobs
- Maximum number of running jobs in aggregate for this association and all
associations which are children of this association.
- GrpJobsAccrue
- Maximum number of pending jobs in aggregate able to accrue age priority
for this association and all associations which are children of this
association.
- GrpSubmitJobs
- Maximum number of jobs which can be in a pending or running state at any
time in aggregate for this association and all associations which are
children of this association.
NOTE: This setting shows up in the sacctmgr output as
GrpSubmit.
- GrpWall
- Maximum wall clock time running jobs are able to be allocated in aggregate
for this association and all associations which are children of this
association.
- ID
- The id of the association.
- LFT
- Associations are kept in a hierarchy: this is the left most spot in the
hierarchy. When used with the RGT variable, all associations with a LFT
greater than this LFT and less than the RGT are children of this association.
- MaxTRESPerJob
- Maximum number of TRES each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxTRES.
- MaxTRESMinsPerJob
- Maximum number of TRES minutes each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxTRESMins.
- MaxTRESPerNode
- Maximum number of TRES each node in a job allocation can use.
- MaxJobs
- Maximum number of jobs each user is allowed to run at one time.
- MaxJobsAccrue
- Maximum number of pending jobs able to accrue age priority at any given
time.
- MaxSubmitJobs
- Maximum number of jobs in a pending or running state at any time.
NOTE: This setting shows up in the sacctmgr output as
MaxSubmit.
- MaxWallDurationPerJob
- Maximum wall clock time each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxWall.
- Qos
- Valid QOS' for this association.
- QosRaw
- QOS' ID.
- ParentID
- The association id of the parent of this association.
- ParentName
- The account name of the parent of this association.
- Partition
- The name of a partition in the association.
- Priority
- What priority will be added to a job's priority when using this
association.
- WithRawQOSLevel
- Display QosLevel in an unevaluated raw format, consisting of a comma
separated list of QOS names prepended with '' (nothing), '+' or '-' for
the association. QOS names without +/- prepended were assigned (i.e.,
sacctmgr modify ... set QosLevel=qos_name) for the entity listed or on one
of its parents in the hierarchy. QOS names with +/- prepended indicate the
QOS was added/filtered (i.e., sacctmgr modify ... set QosLevel=[+-]qos_name)
for the entity listed or on one of its parents in the hierarchy. Including
WOPLimits will show exactly where each QOS was assigned, added or filtered
in the hierarchy.
- RGT
- Associations are kept in a hierarchy: this is the right most spot in the
hierarchy. When used with the LFT variable, all associations with a LFT
less than this RGT and greater than the LFT are children of this association.
- User
- The name of a user in the association.
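The LFT and RGT fields let a script find everything nested under an association without walking the tree. The Python sketch below assumes the usual nested-set reading of those fields (a record is below a parent when its LFT falls strictly between the parent's LFT and RGT). The descendants helper and the sample numbering are invented for illustration and do not come from a real database.

```python
def descendants(associations, parent_id):
    """Return the IDs of all associations nested under parent_id.

    associations: list of dicts with 'ID', 'LFT' and 'RGT' keys, for
    example parsed from 'sacctmgr list association' output.
    """
    by_id = {assoc["ID"]: assoc for assoc in associations}
    parent = by_id[parent_id]
    return [
        assoc["ID"]
        for assoc in associations
        if parent["LFT"] < assoc["LFT"] < parent["RGT"]
    ]


# Invented hierarchy: root > science > (chemistry, physics)
rows = [
    {"ID": 1, "Account": "root", "LFT": 1, "RGT": 8},
    {"ID": 2, "Account": "science", "LFT": 2, "RGT": 7},
    {"ID": 3, "Account": "chemistry", "LFT": 3, "RGT": 4},
    {"ID": 4, "Account": "physics", "LFT": 5, "RGT": 6},
]
print(descendants(rows, 2))
```

The comparison returns every level below the parent, not only direct children. Use ParentID when only the immediate children are wanted.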
- Classification=<classification>
- Type of machine, current classifications are capability, capacity and
capapacity.
- Features=<comma separated list of feature names>
- Features that are specific to the cluster. Federated jobs can be directed
to clusters that contain the job requested features.
- Federation=<federation>
- The federation that this cluster should be a member of. A cluster can only
be a member of one federation at a time.
- FedState=<state>
- The state of the cluster in the federation.
Valid states are:
- ACTIVE
- Cluster will actively accept and schedule federated jobs.
- INACTIVE
- Cluster will not schedule or accept any jobs.
- DRAIN
- Cluster will not accept any new jobs and will let existing federated jobs
complete.
- DRAIN+REMOVE
- Cluster will not accept any new jobs and will remove itself from the
federation once all federated jobs have completed. When removed from the
federation, the cluster will accept jobs as a non-federated cluster.
- Name=<name>
- The name of a cluster. This should be equal to the ClusterName
parameter in the slurm.conf configuration file for some
Slurm-managed cluster.
- RPC=<rpc list>
- Comma separated list of numeric RPC values.
- WithFed
- Appends federation related columns to default format options (e.g.
Federation,ID,Features,FedState).
- WOLimits
- Display information without limit information. This is for a smaller
default format of Cluster,ControlHost,ControlPort,RPC
NOTE: You can also use the general specifications list above in
the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
section.
- Classification
- Type of machine, i.e. capability, capacity or capapacity.
- Cluster
- The name of the cluster.
- ControlHost
- When a slurmctld registers with the database the IP address of the
controller is placed here.
- ControlPort
- When a slurmctld registers with the database the port the controller is
listening on is placed here.
- Features
- The list of features on the cluster (if any).
- Federation
- The name of the federation this cluster is a member of (if any).
- FedState
- The state of the cluster in the federation (if a member of one).
- FedStateRaw
- Numeric value of the name of the FedState.
- Flags
- Attributes possessed by the cluster. Current flags include Cray, External
and MultipleSlurmd.
External clusters are registration only clusters. A slurmctld
can designate an external slurmdbd with the
AccountingStorageExternalHost slurm.conf option. This allows a
slurmctld to register to an external slurmdbd so that clusters attached
to the external slurmdbd can communicate with the external cluster with
Slurm commands.
- ID
- The ID assigned to the cluster when a member of a federation. This ID
uniquely identifies the cluster and its jobs in the federation.
- NodeCount
- The current count of nodes associated with the cluster.
- NodeNames
- The current Nodes associated with the cluster.
- PluginIDSelect
- The numeric value of the select plugin the cluster is using.
- RPC
- When a slurmctld registers with the database the RPC version the
controller is running is placed here.
- TRES
- Trackable RESources (Billing, BB (Burst buffer), CPU, Energy, GRES,
License, Memory, and Node) this cluster is accounting for.
NOTE: You can also view the information about the root association
for the cluster. The Association format fields are described in the
LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
- Account=<comma separated list of account names>
- Account name to add this user as a coordinator to.
- Names=<comma separated list of user names>
- Names of coordinators.
NOTE: To list coordinators use the WithCoordinator options with
list account or list user.
- All_Clusters
- Shortcut to get information on all clusters.
- All_Time
- Shortcut to set the time period to all time.
- Clusters=<comma separated list of cluster names>
- List the events of the cluster(s). Default is the cluster where the
command was run.
- End=<OPT>
- Period ending of events. Default is now.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
- Event=<OPT>
- Specific events to look for, valid options are Cluster or Node, default is
both.
- MaxTRES=<OPT>
- Max number of TRES affected by an event.
- MinTRES=<OPT>
- Min number of TRES affected by an event.
- Nodes=<comma separated list of node names>
- Node names affected by an event.
- Reason=<comma separated list of reasons>
- Reason an event happened.
- Start=<OPT>
- Period start of events. Default is 00:00:00 of previous day, unless states
are given with the States= spec. If this is the case, the default
behavior is to return events currently in the states specified.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
- States=<comma separated list of states>
- State of a node in a node event. If this is set, the event type is set
automatically to Node.
- User=<comma separated list of users>
- Query against users who set the event. If this is set, the event type is
set automatically to Node since only user slurm can perform a cluster
event.
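Of the time formats listed for Start and End, the YYYY-MM-DD[THH:MM[:SS]] form is the least ambiguous for scripts. As a rough sketch, the Python code below validates values in that one form and returns the corresponding Start=/End= specs. parse_time_spec and event_window are hypothetical helpers; sacctmgr itself accepts the other listed formats too, which this code rejects.

```python
from datetime import datetime

# Only the YYYY-MM-DD[THH:MM[:SS]] form is handled in this sketch.
_ISO_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%d")


def parse_time_spec(value):
    """Parse a time given as YYYY-MM-DD[THH:MM[:SS]]."""
    for fmt in _ISO_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, shorter format
    raise ValueError(f"not in YYYY-MM-DD[THH:MM[:SS]] form: {value!r}")


def event_window(start, end):
    """Return Start=/End= specs after checking that start precedes end."""
    if parse_time_spec(start) >= parse_time_spec(end):
        raise ValueError("Start must be earlier than End")
    return [f"Start={start}", f"End={end}"]


print(event_window("2024-03-01", "2024-03-02T12:00"))
```

The returned specs can be appended to a list event query. Checking the order of the two values up front catches swapped arguments before the command runs.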
- Cluster
- The name of the cluster the event happened on.
- ClusterNodes
- The hostlist of nodes on a cluster in a cluster event.
- Duration
- Length of time the event lasted.
- End
- Period when event ended.
- Event
- Name of the event.
- EventRaw
- Numeric value of the name of the event.
- NodeName
- The node affected by the event. In a cluster event, this is blank.
- Reason
- The reason an event happened.
- Start
- Period when event started.
- State
- On a node event this is the formatted state of the node during the event.
- StateRaw
- On a node event this is the numeric value of the state of the node during
the event.
- TRES
- Number of TRES involved with the event.
- User
- On a node event this is the user who caused the event to happen.
- Clusters[+|-]=<comma separated list of cluster names>
- List of clusters to add to or remove from a federation. A blank value (e.g.
clusters=) will remove all clusters from the federation. NOTE: a cluster
can only be a member of one federation.
- Name=<name>
- The name of the federation.
- Tree
- Display federations in a hierarchical fashion.
- Features
- The list of features on the cluster.
- Federation
- The name of the federation.
- Cluster
- Name of the cluster that is a member of the federation.
- FedState
- The state of the cluster in the federation.
- FedStateRaw
- Numeric value of the name of the FedState.
- Index
- The index of the cluster in the federation.
- Comment=<comment>
- The job's comment string when the AccountingStoreJobComment parameter in
the slurm.conf file is set (or defaults) to YES. The user can only modify
the comment string of their own job.
- Cluster=<cluster_list>
- List of clusters to alter jobs on, defaults to local cluster.
- DerivedExitCode=<derived_exit_code>
- The derived exit code can be modified after a job completes based on the
user's judgment of whether the job succeeded or failed. The user can only
modify the derived exit code of their own job.
- EndTime
- Jobs must end before this time to be modified. The format is
YYYY-MM-DDTHH:MM:SS, unless changed through the SLURM_TIME_FORMAT
environment variable.
- JobID=<jobid_list>
- The id of the job to change. Not needed if altering multiple jobs using
wckey specification.
- NewWCKey=<newwckey>
- Use to rename a wckey on job(s) in the accounting database.
- StartTime
- Jobs must start at or after this time to be modified. The format is the
same as for EndTime.
- User=<user_list>
- Used to specify the users whose jobs are to be altered.
- WCKey=<wckey_list>
- Used to specify the wckeys to alter.
The DerivedExitCode, Comment and WCKey fields are the only fields of a job
record in the database that can be modified after job completion.
The sacct command is the exclusive command to display job records from
the Slurm database.
NOTE: The group limits (GrpJobs, GrpNodes, etc.) are tested when a job is
being considered for being allocated resources. If starting a job would cause
any of its group limits to be exceeded, that job will not be considered for
scheduling even if that job might preempt other jobs which would release
sufficient group resources for the pending job to be initiated.
- Flags
- Used by the slurmctld to override or enforce certain characteristics.
Valid options are
- DenyOnLimit
- If set, jobs using this QOS will be rejected at submission time if they do
not conform to the QOS 'Max' limits. Group limits will also be treated
like 'Max' limits, and jobs that exceed them will be denied. By default,
jobs that go over these limits will pend until they conform. This
currently only applies to QOS and Association limits.
- EnforceUsageThreshold
- If set, and the QOS also has a UsageThreshold, any jobs submitted with
this QOS that fall below the UsageThreshold will be held until their
Fairshare Usage goes above the Threshold.
- NoDecay
- If set, this QOS will not have its GrpTRESMins, GrpWall and UsageRaw
decayed by the slurm.conf PriorityDecayHalfLife or
PriorityUsageResetPeriod settings. This allows a QOS to provide aggregate
limits that, once consumed, will not be replenished automatically. Such a
QOS will act as a time-limited quota of resources for an association that
has access to it. Account/user usage will still be decayed for
associations using the QOS. The QOS GrpTRESMins and GrpWall limits can be
increased or the QOS RawUsage value reset to 0 (zero) to again allow jobs
submitted with this QOS to be queued (if DenyOnLimit is set) or run
(pending with QOSGrp{TRES}MinutesLimit or QOSGrpWallLimit reasons, where
{TRES} is some type of trackable resource).
- NoReserve
- If this flag is set and backfill scheduling is used, jobs using this QOS
will not reserve resources in the backfill schedule's map of resources
allocated through time. This flag is intended for use with a QOS that may
be preempted by jobs associated with all other QOS (e.g. use with a
"standby" QOS). If this flag is used with a QOS which cannot be
preempted by all other QOS, it could result in starvation of larger
jobs.
- PartitionMaxNodes
- If set, jobs using this QOS will be able to override the requested
partition's MaxNodes limit.
- PartitionMinNodes
- If set, jobs using this QOS will be able to override the requested
partition's MinNodes limit.
- OverPartQOS
- If set, jobs using this QOS will be able to override any limits set by the
requested partition's QOS.
- PartitionTimeLimit
- If set, jobs using this QOS will be able to override the requested
partition's TimeLimit.
- RequiresReservation
- If set, jobs using this QOS must designate a reservation at submission
time. This option can be useful in restricting usage of a QOS that may have
greater preemptive capability or additional resources to be allowed only
within a reservation.
- UsageFactorSafe
- If set, and AccountingStorageEnforce includes Safe, jobs
will only be able to run if the job can run to completion with the
UsageFactor applied.
- GraceTime
- Preemption grace time to be extended to a job which has been selected for
preemption.
- GrpTRESMins
- The total number of TRES minutes that can possibly be used by past,
present and future jobs running from this QOS.
- GrpTRESRunMins
- Used to limit the combined total number of TRES minutes used by all jobs
running with this QOS. This takes into consideration the time limit of
running jobs and consumes it. If the limit is reached, no new jobs are
started until other jobs finish to allow time to free up.
- GrpTRES
- Maximum number of TRES running jobs are able to be allocated in aggregate
for this QOS.
- GrpJobs
- Maximum number of running jobs in aggregate for this QOS.
- GrpJobsAccrue
- Maximum number of pending jobs in aggregate able to accrue age priority
for this QOS.
- GrpSubmitJobs
- Maximum number of jobs which can be in a pending or running state at any
time in aggregate for this QOS.
NOTE: This setting shows up in the sacctmgr output as
GrpSubmit.
- GrpWall
- Maximum wall clock time running jobs are able to be allocated in aggregate
for this QOS. If this limit is reached submission requests will be denied
and the running jobs will be killed.
- ID
- The id of the QOS.
- MaxJobsAccruePerAccount
- Maximum number of pending jobs an account (or subacct) can have accruing
age priority at any given time.
- MaxJobsAccruePerUser
- Maximum number of pending jobs a user can have accruing age priority at
any given time.
- MaxJobsPerAccount
- Maximum number of jobs each account is allowed to run at one time.
- MaxJobsPerUser
- Maximum number of jobs each user is allowed to run at one time.
- MaxSubmitJobsPerAccount
- Maximum number of jobs in a pending or running state at any time per
account.
- MaxSubmitJobsPerUser
- Maximum number of jobs in a pending or running state at any time per user.
- MaxTRESMinsPerJob
- Maximum number of TRES minutes each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxTRESMins.
- MaxTRESPerAccount
- Maximum number of TRES each account is able to use.
- MaxTRESPerJob
- Maximum number of TRES each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxTRES.
- MaxTRESPerNode
- Maximum number of TRES each node in a job allocation can use.
- MaxTRESPerUser
- Maximum number of TRES each user is able to use.
- MaxWallDurationPerJob
- Maximum wall clock time each job is able to use.
NOTE: This setting shows up in the sacctmgr output as
MaxWall.
- MinPrioThreshold
- Minimum priority required to reserve resources when scheduling.
- MinTRESPerJob
- Minimum number of TRES each job running under this QOS must request.
Otherwise the job will pend until modified.
NOTE: This setting shows up in the sacctmgr output as
MinTRES.
- Name
- Name of the QOS.
- Preempt
- Other QOS's this QOS can preempt.
NOTE: The Priority of a QOS is NOT related to
QOS preemption, only Preempt is used to define which QOS can
preempt others.
- PreemptExemptTime
- Specifies a minimum run time for jobs of this QOS before they are
considered for preemption. This QOS option takes precedence over the
global PreemptExemptTime. Setting to -1 disables the option,
allowing another QOS or the global option to take effect. Setting to 0
indicates no minimum run time and supersedes the lower priority QOS (see
OverPartQOS) and/or the global option in slurm.conf.
- PreemptMode
- Mechanism used to preempt jobs or enable gang scheduling for this QOS when
the cluster PreemptType is set to preempt/qos. This
QOS-specific PreemptMode will override the cluster-wide
PreemptMode for this QOS. Unsetting the QOS specific
PreemptMode, by specifying "OFF", "" or
"Cluster", makes it use the default cluster-wide
PreemptMode.
See the description of the cluster-wide PreemptMode parameter for
further details of the available modes.
- Priority
- What priority will be added to a job's priority when using this
QOS.
NOTE: The Priority of a QOS is NOT related to
QOS preemption, see Preempt instead.
- RawUsage=<value>
- This allows an administrator to reset the raw usage accrued to a QOS. The
only value currently supported is 0 (zero). This is a settable
specification only - it cannot be used as a filter to list accounts.
- UsageFactor
- Usage factor when running with this QOS. See below for more details.
- UsageThreshold
- A float representing the lowest fairshare of an association allowable to
run a job. If an association falls below this threshold and has pending
jobs or submits new jobs those jobs will be held until the usage goes back
above the threshold. Use sshare to see current shares on the
system.
- WithDeleted
- Display information with previously deleted data.
- Description
- An arbitrary string describing a QOS.
- GraceTime
- Preemption grace time to be extended to a job which has been selected for
preemption in the format of hh:mm:ss. The default value is zero, no
preemption grace time is allowed on this partition. NOTE: This value is
only meaningful for QOS PreemptMode=CANCEL.
- GrpTRESMins
- The total number of TRES minutes that can possibly be used by past,
present and future jobs running from this QOS. To clear a previously set
value use the modify command with a new value of -1 for each TRES id.
NOTE: This limit only applies when using the Priority Multifactor plugin.
The time is decayed using the value of PriorityDecayHalfLife or
PriorityUsageResetPeriod as set in the slurm.conf. When this limit is
reached all associated jobs running will be killed and all future jobs
submitted with this QOS will be delayed until they are able to run inside
the limit.
- GrpTRES
- Maximum number of TRES running jobs are able to be allocated in aggregate
for this QOS. To clear a previously set value use the modify command with
a new value of -1 for each TRES id.
- GrpJobs
- Maximum number of running jobs in aggregate for this QOS. To clear a
previously set value use the modify command with a new value of -1.
- GrpJobsAccrue
- Maximum number of pending jobs in aggregate able to accrue age priority
for this QOS. To clear a previously set value use the modify command with
a new value of -1.
- GrpSubmitJobs
- Maximum number of jobs which can be in a pending or running state at any
time in aggregate for this QOS. To clear a previously set value use the
modify command with a new value of -1.
NOTE: This setting shows up in the sacctmgr output as
GrpSubmit.
- GrpWall
- Maximum wall clock time running jobs are able to be allocated in aggregate
for this QOS. To clear a previously set value use the modify command with
a new value of -1. NOTE: This limit only applies when using the Priority
Multifactor plugin. The time is decayed using the value of
PriorityDecayHalfLife or PriorityUsageResetPeriod as set in the
slurm.conf. When this limit is reached all associated jobs running will be
killed and all future jobs submitted with this QOS will be delayed until
they are able to run inside the limit.
- MaxTRESMinsPerJob
- Maximum number of TRES minutes each job is able to use. To clear a
previously set value use the modify command with a new value of -1 for
each TRES id.
NOTE: This setting shows up in the sacctmgr output as
MaxTRESMins.
- MaxTRESPerAccount
- Maximum number of TRES each account is able to use. To clear a previously
set value use the modify command with a new value of -1 for each TRES id.
- MaxTRESPerJob
- Maximum number of TRES each job is able to use. To clear a previously set
value use the modify command with a new value of -1 for each TRES id.
NOTE: This setting shows up in the sacctmgr output as
MaxTRES.
- MaxTRESPerNode
- Maximum number of TRES each node in a job allocation can use. To clear a
previously set value use the modify command with a new value of -1 for
each TRES id.
- MaxTRESPerUser
- Maximum number of TRES each user is able to use. To clear a previously set
value use the modify command with a new value of -1 for each TRES id.
- MaxJobsPerAccount
- Maximum number of jobs each account is allowed to run at one time. To
clear a previously set value use the modify command with a new value of
-1.
- MaxJobsPerUser
- Maximum number of jobs each user is allowed to run at one time. To clear a
previously set value use the modify command with a new value of -1.
- MaxSubmitJobsPerAccount
- Maximum number of jobs in a pending or running state at any time per
account. To clear a previously set value use the modify command with a new
value of -1.
- MaxSubmitJobsPerUser
- Maximum number of jobs in a pending or running state at any time per user.
To clear a previously set value use the modify command with a new value of
-1.
- MaxWallDurationPerJob
- Maximum wall clock time each job is able to use. <max wall> format
is <min> or <min>:<sec> or
<hr>:<min>:<sec> or
<days>-<hr>:<min>:<sec> or
<days>-<hr>. The value is recorded in minutes with rounding as
needed. To clear a previously set value use the modify command with a new
value of -1.
NOTE: This setting shows up in the sacctmgr output as
MaxWall.
- MinPrioThreshold
- Minimum priority required to reserve resources when scheduling. To clear a
previously set value use the modify command with a new value of -1.
- MinTRES
- Minimum number of TRES each job running under this QOS must request.
Otherwise the job will pend until modified. To clear a previously set
value use the modify command with a new value of -1 for each TRES id.
- Name
- Name of the QOS. Needed for creation.
- Preempt
- Other QOS's this QOS can preempt. Setting Preempt to '' (two
single quotes with nothing between them) restores its default setting. You
can also use the operators += and -= to add or remove certain QOS's from a
QOS list.
- PreemptMode
- Mechanism used to preempt jobs of this QOS if the cluster's
PreemptType is configured to preempt/qos. The default
preemption mechanism is specified by the cluster-wide PreemptMode
configuration parameter. Possible values are "Cluster" (meaning
use cluster default), "Cancel", and "Requeue". This
option is not compatible with PreemptMode=OFF or PreemptMode=SUSPEND (i.e.
preempted jobs must be removed from the resources).
- Priority
- What priority will be added to a job's priority when using this
QOS. To clear a previously set value use the modify command with a new
value of -1.
- UsageFactor
- A float that is factored into a job’s TRES usage (e.g. RawUsage,
TRESMins, TRESRunMins). For example, if the usagefactor was 2, for every
TRESBillingUnit second a job ran it would count for 2. If the usagefactor
was .5, every second would only count for half of the time. A setting of 0
would add no timed usage from the job.
The usage factor only applies to the job's QOS and not the
partition QOS.
If the UsageFactorSafe flag is set and
AccountingStorageEnforce includes Safe, jobs will only be
able to run if the job can run to completion with the UsageFactor
applied.
If the UsageFactorSafe flag is not set and
AccountingStorageEnforce includes Safe, a job will be able
to be scheduled without the UsageFactor applied and will be able
to run without being killed due to limits.
If the UsageFactorSafe flag is not set and
AccountingStorageEnforce does not include Safe, a job will
be able to be scheduled without the UsageFactor applied and could
be killed due to limits.
See AccountingStorageEnforce in slurm.conf man
page.
Default is 1. To clear a previously set value use the modify
command with a new value of -1.
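The UsageFactor arithmetic described above can be illustrated with a small Python sketch. The function name `charged_usage` and its parameters are hypothetical and exist only for this illustration; the sketch assumes the charge is simply the job's TRES billing units times its run time times the factor, and it does not model the UsageFactorSafe or AccountingStorageEnforce behavior.

```python
def charged_usage(billing_units: float, run_seconds: float,
                  usage_factor: float = 1.0) -> float:
    """Return the usage charged for a job, in billing-unit seconds.

    billing_units: TRES billing units allocated to the job (hypothetical input).
    run_seconds:   how long the job ran, in seconds.
    usage_factor:  the QOS UsageFactor; 1 is the documented default.
    """
    if usage_factor < 0:
        raise ValueError("usage_factor must be non-negative")
    # Each billing-unit second counts usage_factor times.
    return billing_units * run_seconds * usage_factor


if __name__ == "__main__":
    # A job with 4 billing units running for 60 seconds.
    for factor in (2, 1, 0.5, 0):
        print(factor, charged_usage(4, 60, factor))
```

With a factor of 2 every billing-unit second counts double, with 0.5 it counts half, and with 0 the job adds no timed usage.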
- Clusters=<comma separated list of cluster names>
- List the reservations of the cluster(s). Default is the cluster where the
command was run.
- End=<OPT>
- Period ending of reservations. Default is now.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
- ID=<OPT>
- Comma separated list of reservation ids.
- Names=<OPT>
- Comma separated list of reservation names.
- Nodes=<comma separated list of node names>
- Node names where reservation ran.
- Start=<OPT>
- Period start of reservations. Default is 00:00:00 of current day.
Valid time formats are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
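As a rough aid for checking a Start or End value before passing it to sacctmgr, the four time formats listed above can be approximated with regular expressions. This is a minimal Python sketch, not the parser sacctmgr uses: the function name `classify_time_format`, the labels it returns, and the strict digit counts are all assumptions made for illustration.

```python
import re
from typing import Optional

# One pattern per documented format; digit counts are an assumption.
_PATTERNS = [
    # HH:MM[:SS] [AM|PM]
    ("time", r"\d{1,2}:\d{2}(:\d{2})?(\s*(AM|PM))?"),
    # MM/DD[/YY]-HH:MM[:SS]
    ("date-time", r"\d{1,2}/\d{1,2}(/\d{2})?-\d{1,2}:\d{2}(:\d{2})?"),
    # MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
    ("date", r"\d{4}(\d{2})?|\d{1,2}/\d{1,2}(/\d{2})?|\d{1,2}\.\d{1,2}(\.\d{2})?"),
    # YYYY-MM-DD[THH:MM[:SS]]
    ("iso", r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?"),
]


def classify_time_format(text: str) -> Optional[str]:
    """Return a label for the documented format that text matches, or None."""
    candidate = text.strip()
    for label, pattern in _PATTERNS:
        if re.fullmatch(pattern, candidate, flags=re.IGNORECASE):
            return label
    return None


if __name__ == "__main__":
    for sample in ("8:30 PM", "0131", "01/31/24-08:30:00", "2024-01-31T08:30"):
        print(sample, "->", classify_time_format(sample))
```

The sketch only checks the shape of the string; it does not validate that the month, day or hour values are in range.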
- Associations
- The IDs of the associations able to run in the reservation.
- Cluster
- Name of cluster reservation was on.
- End
- End time of reservation.
- Flags
- Flags on the reservation.
- ID
- Reservation ID.
- Name
- Name of this reservation.
- NodeNames
- List of nodes in the reservation.
- Start
- Start time of reservation.
- TRES
- List of TRES in the reservation.
- UnusedWall
- Wall clock time in seconds unused by any job. A job's allocated usage is
its run time multiplied by the ratio of its CPUs to the total number of
CPUs in the reservation. For example, a job using all the CPUs in the
reservation running for 1 minute would reduce unused_wall by 1 minute.
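The UnusedWall accounting described above can be sketched in Python as follows. The function name and the `(run_seconds, cpus)` job tuples are hypothetical, and the sketch assumes the starting value is the reservation's own wall clock duration, which the text does not state explicitly.

```python
from typing import Iterable, Tuple


def unused_wall_seconds(reservation_seconds: float, total_cpus: int,
                        jobs: Iterable[Tuple[float, int]]) -> float:
    """Return the reservation wall time (seconds) not used by any job.

    jobs is an iterable of (run_seconds, cpus) pairs. Each job's allocated
    usage is its run time multiplied by the ratio of its CPUs to the total
    number of CPUs in the reservation.
    """
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    used = sum(run_seconds * cpus / total_cpus for run_seconds, cpus in jobs)
    return reservation_seconds - used


if __name__ == "__main__":
    # One-hour reservation with 16 CPUs; one job uses all CPUs for 1 minute.
    print(unused_wall_seconds(3600, 16, [(60, 16)]))
```

A job using half of the reservation's CPUs for two minutes reduces the unused time by the same amount as a job using all of them for one minute.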
- Clusters=<name list>
- Comma separated list of cluster names on which specified resources are to
be available. If no names are designated then the clusters already allowed
to use this resource will be altered.
- Count=<OPT>
- Number of software resources of a specific name configured on the system
being controlled by a resource manager.
- Descriptions=
- A brief description of the resource.
- Flags=<OPT>
- Flags that identify specific attributes of the system resource. At this
time no flags have been defined.
- ServerType=<OPT>
- The type of a software resource manager providing the licenses. For
example, FlexNet Publisher (FlexLM) license server or Reprise License
Manager (RLM).
- Names=<OPT>
- Comma separated list of the name of a resource configured on the system
being controlled by a resource manager. If this resource is seen on the
slurmctld its name will be name@server to distinguish it from local
resources defined in a slurm.conf.
- PercentAllowed=<percent allowed>
- Percentage of a specific resource that can be used on specified cluster.
- Server=<OPT>
- The name of the server serving up the resource. Default is 'slurmdb'
indicating the licenses are being served by the database.
- Type=<OPT>
- The type of the resource represented by this record. Currently the only
valid type is License.
- WithClusters
- Display each cluster's percentage of resources. If a resource hasn't been
given to a cluster, the resource will not be displayed with this flag.
NOTE: Resource is used to define each resource configured on a
system available for usage by Slurm clusters.
- Cluster
- Name of cluster resource is given to.
- Count
- The count of a specific resource configured on the system globally.
- Allocated
- The percent of licenses allocated to a cluster.
- Description
- Description of the resource.
- ServerType
- The type of the server controlling the licenses.
- Name
- Name of this resource.
- Server
- Server serving up the resource.
- Type
- Type of resource this record represents.
- Cluster
- Name of cluster job ran on.
- ID
- Id of the job.
- Name
- Name of the job.
- Partition
- Partition job ran on.
- State
- Current State of the job in the database.
- TimeStart
- Time job started running.
- TimeEnd
- Current recorded time of the end of the job.
- Accounts=<comma separated list of account names>
- Only print out the transactions affecting specified accounts.
- Action=<Specific action the list will display>
-
- Actor=<Specific name the list will display>
- Only display transactions done by a certain person.
- Clusters=<comma separated list of cluster names>
- Only print out the transactions affecting specified clusters.
- End=<Date and time of last transaction to return>
- Return all transactions before this Date and time. Default is now.
- Start=<Date and time of first transaction to return>
- Return all transactions after this Date and time. Default is epoch.
Valid time formats for End and Start are...
HH:MM[:SS] [AM|PM]
MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
- Users=<comma separated list of user names>
- Only print out the transactions affecting specified users.
- WithAssoc
- Get information about which associations were affected by the
transactions.
- Action
- Displays the type of Action that took place.
- Actor
- Displays the Actor to generate a transaction.
- Info
- Displays details of the transaction.
- TimeStamp
- Displays when the transaction occurred.
- Where
- Displays details of the constraints for the transaction.
NOTE: If using the WithAssoc option you can also view the
information about the various associations the transaction affected. The
Association format fields are described in the LIST/SHOW ASSOCIATION
FORMAT OPTIONS section.
- Account=<account>
- Account name to add this user to.
- AdminLevel=<level>
- Admin level of user. Valid levels are None, Operator, and Admin.
- Cluster=<cluster>
- Specific cluster to add user to the account on. Default is all in system.
- DefaultAccount=<account>
- Identify the default bank account name to be used for a job if none is
specified at submission time.
- DefaultWCKey=<defaultwckey>
- Identify the default Workload Characterization Key.
- Name=<name>
- Name of user.
- NewName=<newname>
- Use to rename a user in the accounting database.
- Partition=<name>
- Partition name.
- RawUsage=<value>
- This allows an administrator to reset the raw usage accrued to a user. The
only value currently supported is 0 (zero). This is a settable
specification only - it cannot be used as a filter to list users.
- WCKeys=<wckeys>
- Workload Characterization Key values.
- WithAssoc
- Display all associations for this user.
- WithCoord
- Display all accounts a user is coordinator for.
- WithDeleted
- Display information with previously deleted data.
NOTE: If using the WithAssoc option you can also query against
association specific information to view only certain associations this user
may have. These extra options can be found in the SPECIFICATIONS FOR
ASSOCIATIONS section. You can also use the general specifications list
above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED
ENTITIES section.
- AdminLevel
- Admin level of user.
- DefaultAccount
- The user's default account.
- Coordinators
- List of users that are a coordinator of the account. (Only filled in when
using the WithCoordinator option.)
- User
- The name of a user.
NOTE: If using the WithAssoc option you can also view the
information about the various associations the user may have on all the
clusters in the system. The association information can be filtered. Note
that all the users in the database will always be shown as filter only takes
effect over the association data. The Association format fields are
described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
- WCKey
- Workload Characterization Key.
- Cluster
- Specific cluster for the WCKey.
- User
- The name of a user for the WCKey.
NOTE: If using the WithAssoc option you can also view the
information about the various associations the user may have on all the
clusters in the system. The Association format fields are described in the
LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
- Name
- The name of the trackable resource. This option is required for TRES types
BB (Burst buffer), GRES, and License. Types CPU, Energy, Memory, and Node
do not have Names. For example if GRES is the type then name is the
denomination of the GRES itself e.g. GPU.
- ID
- The identification number of the trackable resource as it appears in the
database.
- Type
- The type of the trackable resource. Current types are BB (Burst buffer),
CPU, Energy, GRES, License, Memory, and Node.
Trackable RESources (TRES) are used in many QOS or Association limits. When
setting the limits, give them as a comma-separated list. Each TRES has its
own limit, i.e. GrpTRESMins=cpu=10,mem=20 would make 2 different limits: 1
for 10 cpu minutes and 1 for 20 MB memory minutes. This is the case for each
limit that deals with TRES. To remove a limit, use -1, i.e.
GrpTRESMins=cpu=-1 would remove only the cpu TRES limit.
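The comma separated TRES limit syntax can be illustrated by a short Python sketch that splits a value such as cpu=10,mem=20 into per-TRES limits. The function name `parse_tres_limits` is hypothetical, and representing a cleared limit (-1) as None is a choice made for this illustration only.

```python
from typing import Dict, Optional


def parse_tres_limits(spec: str) -> Dict[str, Optional[int]]:
    """Parse a TRES limit value such as 'cpu=10,mem=20'.

    Returns a mapping from TRES name to its limit. A value of -1, which
    removes the limit for that TRES, is reported as None.
    """
    limits: Dict[str, Optional[int]] = {}
    for item in spec.split(","):
        # Split on the last '=' so the name keeps any earlier characters.
        name, sep, value = item.strip().rpartition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {item!r}")
        number = int(value)
        limits[name] = None if number == -1 else number
    return limits


if __name__ == "__main__":
    print(parse_tres_limits("cpu=10,mem=20"))
    print(parse_tres_limits("cpu=-1"))
```

Each TRES keeps its own entry, so clearing cpu with -1 leaves any limit on mem untouched.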
NOTE: When dealing with Memory as a TRES all limits are in MB.
NOTE: The Billing TRES is calculated from a partition's
TRESBillingWeights. It is temporarily calculated during scheduling for each
partition to enforce billing TRES limits. The final Billing TRES is
calculated after the job has been allocated resources. The final number can
be seen in scontrol show jobs and sacct output.
When using the format option for listing various fields you can put a %NUMBER
afterwards to specify how many characters should be printed.
e.g. format=name%30 will print 30 characters of field name right
justified. A -30 will print 30 characters left justified.
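The effect of a %NUMBER suffix can be mimicked in Python to show what the width and sign mean. This is only a sketch: `render_field` is a hypothetical helper, and it simply cuts values that are too long, which may differ from how sacctmgr marks truncated output.

```python
import re


def render_field(value: str, spec: str) -> str:
    """Format value according to a spec such as 'name%30' or 'name%-30'.

    A positive width right-justifies the value, a negative width
    left-justifies it, and longer values are cut to the width.
    """
    match = re.fullmatch(r"(\w+)(?:%(-?\d+))?", spec)
    if match is None:
        raise ValueError(f"unrecognized format spec: {spec!r}")
    width_text = match.group(2)
    if width_text is None:
        return value  # no width requested
    width = int(width_text)
    text = value[:abs(width)]
    return text.ljust(-width) if width < 0 else text.rjust(width)


if __name__ == "__main__":
    print("[" + render_field("physics", "name%10") + "]")
    print("[" + render_field("physics", "name%-10") + "]")
```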
sacctmgr has the capability to load and dump Slurm association data to and from
a file. This method can easily add a new cluster or copy an existing cluster's
associations into a new cluster with similar accounts. Each file contains
Slurm association data for a single cluster. Comments can be put into the file
with the # character. Each line of information must begin with one of the four
titles: Cluster, Parent, Account or User.
Following the title is a space, dash, space, entity value, then
specifications. Specifications are colon separated. If any variable, such as
an Organization name, has a space in it, surround the name with single or
double quotes.
To create a file of associations you can run
sacctmgr dump tux file=tux.cfg
To load a previously created file you can run
sacctmgr load file=tux.cfg
sacctmgr dump/load must be run as a Slurm administrator or root.
If using sacctmgr load on a database without any associations, it must be
run as root (because there aren't any users in the database yet).
Other options for load are:
clean - delete what was already there and start
from scratch with this information.
Cluster= - specify a different name for the cluster than that which is in
the file.
Since the associations in the system follow a hierarchy, so does
the file. Anything that is a parent needs to be defined before any children.
The only exception is the understood 'root' account. This is always a
default for any cluster and does not need to be defined.
To edit/create a file start with a cluster line for the new
cluster:
Cluster - cluster_name:MaxTRESPerJob=node=15
Anything included on this line will be the default for all
associations on this cluster. The options for the cluster are:
- GrpTRESMins=
-
The total number of TRES minutes that can possibly be used by past,
present and future jobs running from this association and its children.
- GrpTRESRunMins=
-
Used to limit the combined total number of TRES minutes used by all
jobs running with this association and its children. This takes into
consideration the time limit of running jobs and consumes it. If the
limit is reached, no new jobs are started until other jobs finish to
allow time to free up.
- GrpTRES=
-
Maximum number of TRES running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- GrpJobs=
-
Maximum number of running jobs in aggregate for this
association and all associations which are children of this association.
- GrpJobsAccrue=
-
Maximum number of pending jobs in aggregate able to accrue age priority for this
association and all associations which are children of this association.
- GrpNodes=
-
Maximum number of nodes running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- GrpSubmitJobs=
-
Maximum number of jobs which can be in a pending or
running state at any time in aggregate for this association and all
associations which are children of this association.
- GrpWall=
-
Maximum wall clock time running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- FairShare=
-
Number used in conjunction with other associations to determine job priority.
- MaxJobs=
-
Maximum number of jobs the children of this association can run.
- MaxTRESPerJob=
-
Maximum number of trackable resources per job the children of this association
can run.
- MaxWallDurationPerJob=
-
Maximum time (not related to job size) that jobs of this account's children can run.
- QOS=
-
Comma separated list of Quality of Service names (Defined in sacctmgr).
After the entry for the root account you will have entries for the other
accounts on the system. The entries will look similar to this example:
Parent - root
Account - cs:MaxTRESPerJob=node=5:MaxJobs=4:FairShare=399:MaxWallDurationPerJob=40:Description='Computer Science':Organization='LC'
Parent - cs
Account - test:MaxTRESPerJob=node=1:MaxJobs=1:FairShare=1:MaxWallDurationPerJob=1:Description='Test Account':Organization='Test'
Any of the options after a ':' can be left out and they can be in any order.
If you want to add any sub accounts just list the Parent THAT HAS ALREADY
BEEN CREATED before the account you are adding.
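The line layout described above (title, space, dash, space, entity value, then colon separated specifications with optional quoting) can be sketched as a small Python parser. This is an illustration, not the code sacctmgr uses: the names `split_specs` and `parse_line` are hypothetical, and treating only whole lines that start with # as comments is an assumption.

```python
from typing import Dict, List, Optional, Tuple

TITLES = ("Cluster", "Parent", "Account", "User")


def split_specs(text: str) -> List[str]:
    """Split on ':' while keeping quoted text (single or double) together."""
    parts: List[str] = []
    buf: List[str] = []
    quote = None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None       # closing quote, not copied to the output
            else:
                buf.append(ch)
        elif ch in "'\"":
            quote = ch             # opening quote
        elif ch == ":":
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_line(line: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (title, entity, options) or None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    title, sep, rest = line.partition(" - ")
    if not sep or title not in TITLES:
        raise ValueError(f"line must begin with one of {TITLES}: {line!r}")
    entity, *specs = split_specs(rest)
    options: Dict[str, str] = {}
    for spec in specs:
        key, _, value = spec.partition("=")  # value may itself contain '='
        options[key] = value
    return title, entity, options


if __name__ == "__main__":
    print(parse_line("Parent - root"))
    print(parse_line("Account - cs:MaxTRESPerJob=node=5:Description='Computer Science'"))
```

Splitting each specification at its first = keeps nested values such as node=5 intact as the value of MaxTRESPerJob.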
Account options are:
- Description=
-
A brief description of the account.
- GrpTRESMins=
-
Maximum number of TRES minutes running jobs are able to
be allocated in aggregate for this association and all associations
which are children of this association.
- GrpTRESRunMins=
-
Used to limit the combined total number of TRES minutes used by all
jobs running with this association and its children. This takes into
consideration the time limit of running jobs and consumes it. If the
limit is reached, no new jobs are started until other jobs finish to
allow time to free up.
- GrpTRES=
-
Maximum number of TRES running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- GrpJobs=
-
Maximum number of running jobs in aggregate for this
association and all associations which are children of this association.
- GrpJobsAccrue=
-
Maximum number of pending jobs in aggregate able to accrue age priority for this
association and all associations which are children of this association.
- GrpNodes=
-
Maximum number of nodes running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- GrpSubmitJobs=
-
Maximum number of jobs which can be in a pending or
running state at any time in aggregate for this association and all
associations which are children of this association.
- GrpWall=
-
Maximum wall clock time running jobs are able to be
allocated in aggregate for this association and all associations which
are children of this association.
- FairShare=
-
Number used in conjunction with other associations to determine job priority.
- MaxJobs=
-
Maximum number of jobs the children of this association can run.
- MaxNodesPerJob=
-
Maximum number of nodes per job the children of this association can run.
- MaxWallDurationPerJob=
-
Maximum time (not related to job size) that jobs of this account's children can run.
- Organization=
-
Name of organization that owns this account.
- QOS(=,+=,-=)
-
Comma separated list of Quality of Service names (Defined in sacctmgr).
To add users to an account add a line after the Parent line, similar to this:
Parent - test
User - adam:MaxTRESPerJob=node=2:MaxJobs=3:FairShare=1:MaxWallDurationPerJob=1:AdminLevel=Operator:Coordinator='test'
User options are:
- AdminLevel=
-
Type of admin this user is (Administrator, Operator)
Must be defined on the first occurrence of the user.
- Coordinator=
-
Comma separated list of accounts this user is coordinator over
Must be defined on the first occurrence of the user.
- DefaultAccount=
-
System wide default account name
Must be defined on the first occurrence of the user.
- FairShare=
-
Number used in conjunction with other associations to determine job priority.
- MaxJobs=
-
Maximum number of jobs this user can run.
- MaxTRESPerJob=
-
Maximum number of trackable resources per job this user can run.
- MaxWallDurationPerJob=
-
Maximum time (not related to job size) this user can run.
- QOS(=,+=,-=)
-
Comma separated list of Quality of Service names (Defined in sacctmgr).
sacctmgr has the capability to archive to a flat file and/or load that data if
needed later. The archiving is usually done by the slurmdbd and it is highly
recommended you only do it through sacctmgr if you completely understand what
you are doing. For slurmdbd options see "man slurmdbd". Loading data into the
database can be done from these files to either view old data or regenerate
rolled up data.
Dump accounting data to file. Data will not be archived unless the corresponding
purge option is included in this command or in slurmdbd.conf. This operation
cannot be rolled back once executed. If one of the following options is not
specified when sacctmgr is called, the value configured in slurmdbd.conf is
used.
- Directory=
- Directory to store the archive data.
- Events
- Archive Events. If not specified and PurgeEventAfter is set all event data
removed will be lost permanently.
- Jobs
- Archive Jobs. If not specified and PurgeJobAfter is set all job data
removed will be lost permanently.
- PurgeEventAfter=
- Purge cluster event records older than the stated time, in months. To
purge on a shorter period, append hours or days to the numeric value
(e.g. a value of '12hours' would purge everything older than 12 hours).
- PurgeJobAfter=
- Purge job records older than the stated time, in months. To purge on a
shorter period, append hours or days to the numeric value (e.g. a value
of '12hours' would purge everything older than 12 hours).
- PurgeStepAfter=
- Purge step records older than the stated time, in months. To purge on a
shorter period, append hours or days to the numeric value (e.g. a value
of '12hours' would purge everything older than 12 hours).
- PurgeSuspendAfter=
- Purge job suspend records older than the stated time, in months. To purge
on a shorter period, append hours or days to the numeric value (e.g. a
value of '12hours' would purge everything older than 12 hours).
- Script=
- Run this script instead of the generic form of archive to flat files.
- Steps
- Archive Steps. If not specified and PurgeStepAfter is set all step data
removed will be lost permanently.
- Suspend
- Archive Suspend Data. If not specified and PurgeSuspendAfter is set all
suspend data removed will be lost permanently.
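The Purge*After values described above combine a number with an optional unit. A minimal Python sketch of that interpretation follows; `parse_purge_after` is a hypothetical helper that accepts only the forms mentioned in the text (a bare number meaning months, or a number followed by hours or days), which may be narrower than what slurmdbd itself accepts.

```python
import re
from typing import Tuple


def parse_purge_after(value: str) -> Tuple[int, str]:
    """Return (count, unit) for a purge value such as '12hours' or '6'.

    A bare number is interpreted as months, matching the documented default.
    """
    match = re.fullmatch(r"(\d+)(hours|days)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized purge value: {value!r}")
    return int(match.group(1)), match.group(2) or "months"


if __name__ == "__main__":
    for sample in ("12hours", "30days", "6"):
        print(sample, "->", parse_purge_after(sample))
```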
Load previously archived data into the database. The archive file will not be
loaded if the records already exist in the database - therefore, trying to
load an archive file more than once will result in an error. When this data is
again archived and purged from the database, if the old archive file is still
in the directory ArchiveDir, a new archive file will be created (see
ArchiveDir in the slurmdbd.conf man page), so the old file will not be
overwritten and these files will have duplicate records.
- File=
- File to load into database. The specified file must exist on the slurmdbd
host, which is not necessarily the machine running the command.
- Insert=
- SQL to insert directly into the database. Use this very cautiously, since
it writes your SQL into the database.
Executing sacctmgr sends a remote procedure call to slurmdbd. If
enough calls from sacctmgr or other Slurm client commands that send
remote procedure calls to the slurmdbd daemon come in at once, the
performance of the slurmdbd daemon can degrade, possibly resulting in a
denial of service.
Do not run sacctmgr or other Slurm client commands that
send remote procedure calls to slurmdbd from loops in shell scripts
or other programs. Ensure that programs limit calls to sacctmgr to
the minimum necessary for the information you are trying to gather.
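One way to follow this advice is to ask slurmdbd for everything in a single call and do the per-record work locally. The Python sketch below assumes the -n (--noheader) and -P (--parsable2) options and the show assoc command described elsewhere in this page. The function names are hypothetical, and treating rows with an empty user field as account-level associations is an assumption of this sketch.

```python
import subprocess

FIELDS = ["cluster", "account", "user", "qos"]


def parse_parsable2(text, fields):
    """Turn '|'-delimited lines (no trailing '|') into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


def fetch_associations():
    """Run sacctmgr once and return every association as a dict."""
    cmd = ["sacctmgr", "-n", "-P", "show", "assoc",
           "format=" + ",".join(FIELDS)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_parsable2(result.stdout, FIELDS)


def qos_by_account(rows):
    """Index rows with an empty user field by (cluster, account), in memory."""
    index = {}
    for row in rows:
        if not row["user"]:
            qos = row["qos"].split(",") if row["qos"] else []
            index[(row["cluster"], row["account"])] = qos
    return index
```

A script would call fetch_associations() once and then look up as many accounts as it needs in the returned data, instead of invoking sacctmgr once per account.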
Some sacctmgr options may be set via environment variables. These
environment variables, along with their corresponding options, are listed
below. (Note: command-line options will always override these settings.)
- SLURM_CONF
- The location of the Slurm configuration file.
NOTE: There is an order to set up accounting associations. You must
define clusters before you add accounts and you must add accounts before you
can add users.
-> sacctmgr create cluster tux
-> sacctmgr create account name=science fairshare=50
-> sacctmgr create account name=chemistry parent=science fairshare=30
-> sacctmgr create account name=physics parent=science fairshare=20
-> sacctmgr create user name=adam cluster=tux account=physics fairshare=10
-> sacctmgr delete user name=adam cluster=tux account=physics
-> sacctmgr delete account name=physics cluster=tux
-> sacctmgr modify user where name=adam cluster=tux account=physics set maxjobs=2 maxwall=30:00
-> sacctmgr add user brian account=chemistry
-> sacctmgr list associations cluster=tux format=Account,Cluster,User,Fairshare tree withd
-> sacctmgr list transactions Action="Add Users" Start=11/03-10:30:00 format=Where,Time
-> sacctmgr dump cluster=tux file=tux_data_file
-> sacctmgr load tux_data_file
A user's account cannot be changed directly. Instead, create a new
association for the user with the new account, then delete the association
with the old account.
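The two steps above can be sketched as a pair of sacctmgr invocations, modeled on the add user and delete user examples earlier in this section. The Python helper below only builds the argument lists and runs nothing. Its name and parameters are hypothetical.

```python
def change_account_commands(user, cluster, old_account, new_account):
    """Return the two invocations in order: add the new association,
    then delete the association with the old account."""
    add = ["sacctmgr", "add", "user", f"name={user}",
           f"cluster={cluster}", f"account={new_account}"]
    delete = ["sacctmgr", "delete", "user", f"name={user}",
              f"cluster={cluster}", f"account={old_account}"]
    return [add, delete]


for cmd in change_account_commands("adam", "tux", "physics", "chemistry"):
    print(" ".join(cmd))
```

The add comes first so that the user is never left without an association between the two commands.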
When modifying an object, the placement of the key word 'set' and the
optional key word 'where' is critical. The examples below show how to
produce correct results. As a rule of thumb, anything you put in front of
'set' will be used as a quantifier, that is, a condition selecting the
records to modify. If you want to put a quantifier after the key word 'set',
you should use the key word 'where'.
wrong-> sacctmgr modify user name=adam set fairshare=10 cluster=tux
This will produce an error, as the above line reads: modify user
adam, set fairshare=10 and cluster=tux.
right-> sacctmgr modify user name=adam cluster=tux set fairshare=10
right-> sacctmgr modify user name=adam set fairshare=10 where cluster=tux
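A script that generates modify commands can avoid the ordering mistake by always emitting the conditions before 'set'. The Python sketch below builds an argument list in the same shape as the modify user where ... set ... example earlier in this section. The function name build_modify is hypothetical, and nothing is executed.

```python
def build_modify(entity, where, changes):
    """Return an argument list with every condition placed before 'set'."""
    if not changes:
        raise ValueError("at least one value to set is required")
    args = ["sacctmgr", "modify", entity]
    if where:
        args.append("where")
        args.extend(f"{key}={value}" for key, value in where.items())
    args.append("set")
    args.extend(f"{key}={value}" for key, value in changes.items())
    return args


cmd = build_modify("user", {"name": "adam", "cluster": "tux"},
                   {"fairshare": "10"})
print(" ".join(cmd))
# sacctmgr modify user where name=adam cluster=tux set fairshare=10
```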
When changing the qos of an entity, only use the '=' operator when you
want to explicitly set the qos to exactly the given value. In most cases
you will want to use the '+=' or '-=' operator to either add to or remove
from the qos already in place.
If a user already has a qos of normal,standby, whether inherited from a
parent or explicitly set, use qos+=expedite to add expedite to that list.
If you want to add the qos expedite only for a certain account and/or
cluster, specify them on the sacctmgr line.
-> sacctmgr modify user name=adam set qos+=expedite
-> sacctmgr modify user name=adam acct=this cluster=tux set qos+=expedite
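The effect of the three operators can be modeled with plain set arithmetic. The Python sketch below is only an illustration of the behavior described above, not Slurm's implementation. The function name apply_qos is hypothetical.

```python
def apply_qos(current, operator, names):
    """Model how '=', '+=' and '-=' change a list of QOS names."""
    current = set(current)
    names = set(names)
    if operator == "=":
        return names            # replace whatever was there
    if operator == "+=":
        return current | names  # add to the existing list
    if operator == "-=":
        return current - names  # remove from the existing list
    raise ValueError(f"unknown operator: {operator!r}")


existing = ["normal", "standby"]
print(sorted(apply_qos(existing, "+=", ["expedite"])))
# ['expedite', 'normal', 'standby']
print(sorted(apply_qos(existing, "=", ["expedite"])))
# ['expedite']
```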
The following example shows how to add a QOS to user accounts. First, list
all available QOSs in the cluster.
->sacctmgr show qos format=name
Name
---------
normal
expedite
List all the associations in the cluster.
->sacctmgr show assoc format=cluster,account,qos
   Cluster    Account             QOS
---------- ---------- ---------------
     zebra       root          normal
     zebra       root          normal
     zebra          g          normal
     zebra         g1          normal
Add the QOS expedite to account G1 and display the result. Using
the operator +=, the QOS is added to the QOSs this account already has.
->sacctmgr modify account name=g1 set qos+=expedite
->sacctmgr show assoc format=cluster,account,qos
   Cluster    Account             QOS
---------- ---------- ---------------
     zebra       root          normal
     zebra       root          normal
     zebra          g          normal
     zebra         g1 expedite,normal
Now set the QOS expedite as the only QOS for the account G and
display the result. Using the operator = makes expedite the only QOS
usable by account G.
->sacctmgr modify account name=G set qos=expedite
->sacctmgr show assoc format=cluster,account,qos
   Cluster    Account             QOS
---------- ---------- ---------------
     zebra       root          normal
     zebra       root          normal
     zebra          g        expedite
     zebra         g1 expedite,normal
If a new account is added under the account G it will inherit the
QOS expedite and it will not have access to QOS normal.
->sacctmgr add account banana parent=G
->sacctmgr show assoc format=cluster,account,qos
   Cluster    Account             QOS
---------- ---------- ---------------
     zebra       root          normal
     zebra       root          normal
     zebra          g        expedite
     zebra     banana        expedite
     zebra         g1 expedite,normal
An example of listing trackable resources:
->sacctmgr show tres
      Type              Name       ID
---------- ----------------- --------
       cpu                          1
       mem                          2
    energy                          3
      node                          4
   billing                          5
      gres         gpu:tesla     1001
   license               vcs     1002
        bb              cray     1003
Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at
Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2016 SchedMD LLC.
This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.
Slurm is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option)
any later version.
Slurm is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
slurm.conf(5), slurmdbd(8)