 |
|
| |
pbs_mom(8B) |
PBS |
pbs_mom(8B) |
pbs_mom - start a pbs batch execution mini-server
pbs_mom [-a alarm] [-C chkdirectory] [-c config]
[-d directory] [-H hostname] [-L logfile]
[-M MOMport] [-R RPPport] [-p|-q|-r] [-x]
The pbs_mom command starts the operation of a batch Machine
Oriented Mini-server, MOM, on the local host. Typically, this
command will be in a local boot file such as /etc/rc.local . To insure
that the pbs_mom command is not runnable by the general user community, the
server will only execute if its real and effective uid is zero.
One function of pbs_mom is to place jobs into execution as
directed by the server, establish resource usage limits, monitor the job's
usage, and notify the server when the job completes. If they exist, pbs_mom
will execute a prologue script before executing a job and an epilogue script
after executing the job. The next function of pbs_mom is to respond to
resource monitor requests. This was done by a separate process in previous
versions of PBS but has now been combined into one process. The resource
monitor function is provided mainly for the PBS scheduler. It provides
information about the status of running jobs, memory available etc. The next
function of pbs_mom is to respond to task manager requests. This involves
communicating with running tasks over a tcp socket as well as communicating
with other MOMs within a job (aka a "sisterhood").
Pbs_mom will record a diagnostic message in a log file for any
error occurrence. The log files are maintained in the mom_logs
directory below the home directory of the server. If the log file cannot be
opened, the diagnostic message is written to the system console.
- -a alarm
- Used to specify the alarm timeout in seconds for computing a resource.
Every time a resource request is processed, an alarm is set for the given
amount of time. If the request has not completed before the given time, an
alarm signal is generated. The default is 5 seconds.
- -C chkdirectory
- Specifieds the path of the directory used to hold checkpoint files.
[Currently this is only valid on Cray systems.] The default directory is
PBS_HOME/spool/checkpoint, see the -d option. The directory specified with
the -C option must be owned by root and accessible (rwx) only by root to
protect the security of the checkpoint files.
- -c config
- Specify a alternative configuration file, see description below. If this
is a relative file name it will be relative to PBS_HOME/mom_priv, see the
-d option. If the specified file cannot be opened, pbs_mom will abort. If
the -c option is not supplied, pbs_mom will attempt to open the default
configuration file "config" in PBS_HOME/mom_priv. If this file is
not present, pbs_mom will log the fact and continue.
- -H hostname
- Set MOM's hostname. This can be useful on multi-homed networks.
- -d directory
- Specifies the path of the directory which is the home of the servers
working files, PBS_HOME. This option is typically used along with -M when
debugging MOM. The default directory is given by $PBS_SERVER_HOME which is
typically /usr/spool/PBS.
- -L logfile
- Specify an absolute path name for use as the log file. If not specified,
MOM will open a file named for the current date in the PBS_HOME/mom_logs
directory, see the -d option.
- -M port
- Specifies the port number on which the mini-server (MOM) will listen for
batch requests.
- -R port
- Specifies the port number on which the mini-server (MOM) will listen for
resource monitor requests, task manager requests and inter-MOM messages.
Both a UDP and a TCP port of this number will be used.
- -p
- (Default after version 2.4.0) (Preserve running jobs) -- Specifies the
impact on jobs which were in execution when the mini-server shut-down. The
-p option tries to preserve any running jobs when the MOM restarts. The
new mini-server will not be the parent of any running jobs, MOM has lost
control of her offspring (not a new situation for a mother). The MOM will
allow the jobs to continue to run and monitor them indirectly via polling.
All recovered jobs will report an exit code of 0 when they are complete.
The -p option is mutually exclusive with the -r, -P and -q options.
- -P
- (Terminate all jobs and remove them from the queue) -- Specifies the
impact on jobs which were in execution when the mini-server shut-down.
With the -P option, it is assumed that either the entire system has been
restarted or the MOM has been down so long that it can no longer guarantee
that the pid of any running process is the same as the recorded job
process pid of a recovering job. Unlike the -p option no attempt is made
to try and preserve or recover running jobs. All jobs are terminated and
removed from the queue. The -q option is mutually exclusive with the -p,
-q and -r options.
- -q
- (Requeue all jobs - This is the default behavior in versions prior to
2.4.0) -- Specifies the impact on jobs which were in execution when the
mini-servershut-down. Do not terminate running processes. With the -q
option, it is assumed that either the entire system has been restarted or
the MOM has been down so long that it can no longer guarantee that the pid
of any running process is the same as the recorded job process pid of a
recovering job. No attempt is made to kill job processes. The MOM will
mark the jobs as terminated and notify the batch server which owns the
job. Re-runnable jobs will be requeued. The -q option is mutually
exclusive with the -p, -P and -r options.
- -r
- (Terminate running processes and requeue all jobs) -- Specifies the impact
on jobs which were in execution when the mini-server shut-down. With the
-r option, MOM will kill any processes belonging to running jobs, mark the
jobs as terminated and notify the batch server that owns the job.
Re-runnable jobs are reset to a queued state so they can be run again. The
-r option is mutually exclusive with the -p, -P and -q options.
- If the -r option is used following a reboot, process IDs (pids) may be
reused and MOM may kill a process that is not a batch session.
- -S port
- Specifies the port number on which the pbs_server is listening for
requests. If pbs_server is started with a -p option, pbs_mom will need to
use the -S option and match the port value which was used to start
pbs_server.
- -x
- Disables the check for privileged port resource monitor connections. This
is used mainly for testing since the privileged port is the only mechanism
used to prevent any ordinary user from connecting.
The configuration file may be specified on the command line at program start
with the -c flag. The use of this file is to provide several types of run time
information to pbs_mom: static resource names and values, external resources
provided by a program to be run on request via a shell escape, and values to
pass to internal set up functions at initialization (and re-initialization).
Each item type is on a single line with the component parts
separated by white space. If the line starts with a hash mark (pound sign,
#), the line is considered to be a comment and is skipped.
- Static Resources
- For static resource names and values, the configuration file contains a
list of resource names/values pairs, one pair per line and separated by
white space. An Example of static resource names and values could be the
number of tape drives of different types and could be specified by
- tape3480 4
tape3420 2
tapedat 1
tape8mm 1
- Shell Commands
- If the first character of the value is an exclamation mark (!), the entire
rest of the line is saved to be executed through the services of the
system(3) standard library routine.
- The shell escape provides a means for the resource monitor to yield
arbitrary information to the scheduler. Parameter substitution is done
such that the value of any qualifier sent with the query, as explained
below, replaces a token with a percent sign (%) followed by the name of
the qualifier. For example, here is a configuration file line which gives
a resource name of "escape":
- escape !echo %xxx %yyy
- If a query for "escape" is sent with no qualifiers, the command
executed would be "echo %xxx %yyy". If one qualifier is sent,
"escape[xxx=hi there]", the command executed would be "echo
hi there %yyy". If two qualifiers are sent,
"escape[xxx=hi][yyy=there]", the command executed would be
"echo hi there". If a qualifier is sent with no matching token
in the command line, "escape[zzz=snafu]", an error is
reported.
- size[fs=<FS>]
- Specifies that the available and configured disk space in the <FS>
filesystem is to be reported to the pbs_server and scheduler. NOTE: To
request disk space on a per job basis, specify the file resource as in
'qsub -l nodes=1,file=1000kb' For example, the available and configured
disk space in the /localscratch filesystem will be reported:
- size[fs=/localscratch]
- Initialization Value
- An initialization value directive has a name which starts with a dollar
sign ($) and must be known to MOM via an internal table. The entries in
this table now are:
- auto_ideal_load
- if jobs are running, sets idea_load based on a simple expression. The
expressions start with the variable 't' (total assigned CPUs) or 'c'
(existing CPUs), an operator (+ - / *), and followed by a float
constant.
- $auto_ideal_load t-0.2
- auto_max_load
- if jobs are running, sets max_load based on a simple expression. The
expressions start with the variable 't' (total assigned CPUs) or 'c'
(existing CPUs), an operator (+ - / *), and followed by a float
constant.
- cputmult
- which sets a factor used to adjust cpu time used by a job. This is
provided to allow adjustment of time charged and limits enforced where the
job might run on systems with different cpu performance. If Mom's system
is faster than the reference system, set cputmult to a decimal value
greater than 1.0. If Mom's system is slower, set cputmult to a value
between 1.0 and 0.0. For example:
- $cputmult 1.5
$cputmult 0.75
- configversion
- specifies the version of the config file data, a string.
- check_poll_time
- specifies the MOM interval in seconds. MOM checks each job for updated
resource usages, exited processes, over-limit conditions, etc. once per
interval. This value should be equal or lower to pbs_server's
job_stat_rate. High values result in stale information reported to
pbs_server. Low values result in increased system usage by MOM. Default is
45 seconds.
- down_on_error
- causes MOM to report itself as state "down" to pbs_server in the
event of a failed health check. This feature is EXPERIMENTAL and likely to
be removed in the future. See HEALTH CHECK below.
- enablemomrestart
- enable automatic restarts of MOM. If enabled, MOM will check if its binary
has been updated and restart itself at a safe point when no jobs are
running; thus making upgrades easier. The check is made by comparing the
mtime of the pbs_mom executable. Command-line args, the process name, and
the PATH env variable are preserved across restarts. It is recommended
that this not be enabled in the config file, but enabled when desired with
momctl (see RESOURCES for more information.)
- ideal_load
- ideal processor load. Represents a low water mark for the load average.
Nodes that are currently busy will consider itself free after falling
below ideal_load.
- igncput
- Ignore cpu time violations on this mom, meaning jobs will not be cancelled
due to exceeding their limits for cpu time.
- ignmem
- Ignore memory violations on this mom, meaning jobs will not be cancelled
due to exceeding their memory limits.
- ignvmem
- If set to true, then pbs_mom will ignore vmem/pvmem limit
enforcement.
- ignwalltime
- If set to true, then pbs_mom will ignore walltime limit enforcement.
- job_output_file_mask
- Specifies a mask for creating job output and error files. Values can be
specified in base 8, 10, or 16; leading 0 implies octal and leading 0x or
0X hexadecimal. A value of "userdefault" will use the user's
default umask. $job_output_file_mask 027
- log_directory
- Changes the log directory. Default is $TORQUEHOME/mom_logs/. $TORQUEHOME
default is /var/spool/torque/ but can be changed in the ./configure
script. The value is a string and should be the full path to the desired
mom log directory. $log_directory /opt/torque/mom_logs/
- logevent
- which sets the mask that determines which event types are logged by
pbs_mom. For example:
- $logevent 0x1fff
$logevent 255
- The first example would set the log event mask to 0x1ff (511) which
enables logging of all events including debug events. The second example
would set the mask to 0x0ff (255) which enables all events except debug
events.
- log_file_suffix
- Optional suffix to append to log file names. If %h is the suffix, pbs_mom
appends the hostname for where the log files are stored if it knows it,
otherwise it will append the hostname where the mom is running.
$log_file_suffix tom = 20100223.tom
- log_keep_days
- Specifies how many days to keep log files. pbs_mom deletes log files older
than the specified number of days. If not specified, pbs_mom won't delete
log files based on their age.
- loglevel
- specifies the verbosity of logging with higher numbers specifying more
verbose logging. Values may range between 0 and 7.
- log_file_max_size
- If this is set to a value > 0 then pbs_mom will roll the current log
file to log-file-name.1 when its size is greater than or equal to the
value of log_file_max_size. This value is interpreted as kilobytes.
- log_file_roll_depth
- If this is set to a value >=1 and log_file_max_size is set then pbs_mom
will continue rolling the log files to
log-file-name.log_file_roll_depth.
- max_load
- maximum processor load. Nodes over this load average are considered busy
(see ideal_load above).
- node_check_script
- specifies the fully qualified pathname of the health check script to run
(see HEALTH CHECK for more information).
- node_check_interval
- specifies when to run the MOM health check. The check can be either
periodic, event-driver, or both. The value starts with an integer
specifying the number of MOM intervals between subsequent executions of
the specified health check. After the integer is an optional
comma-separated list of event names. Currently supported are
"jobstart" and "jobend". This value defaults to 1 with
no events indicating the check is run every MOM interval. (see HEALTH
CHECK for more information)
- $node_check_interval 0 #Disabled.
$node_check_interval 0,jobstart #Only runs at job starts
$node_check_interval 10,jobstart,jobend
- nodefile_suffix
- Specifies the suffix to append to a host names to denote the data channel
network adapter in a multihomed compute node. $nodefile_suffix i
With the suffix of 'i' and the control channel adapter with the name
node01, the data channel would have a hostname of node01i.
- nospool_dir_list
- If the job's output file should be in one of the paths specified here,
then it will be spooled directly in that directory instead of the normal
spool directory.
Specified in the format path1, path2, etc.
$nospool_dir_list/home/mike/*,/var/tmp/spool/
- pbsclient
- which causes a host name to be added to the list of hosts which will be
allowed to connect to MOM as long as they are using a privilaged port for
the purposes of resource monitor requests. For example, here are two
configuration file lines which will allow the hosts "fred" and
"wilma" to connect:
- $pbsclient fred
$pbsclient wilma
- Two host name are always allowed to connection to pbs_mom,
"localhost" and the name returned to pbs_mom by the system call
gethostname(). These names need not be specified in the configuration
file. The hosts listed as "clients" can issue Resource Monitor
(RM) requests. Other MOM nodes and servers do not need to be listed as
clients.
- pbsserver
- which defines hostnames running pbs_server that will be allowed to submit
jobs, issue Resource Monitor (RM) requests, and get status updates. MOM
will continually attempt to contact all server hosts for node status and
state updates. Like $PBS_SERVER_HOME/server_name, the hostname may be
followed by a colon and a port number. This parameter replaces the
oft-confused $clienthost parameter from TORQUE 2.0.0p0 and earlier. Note
that the hostname in $PBS_SERVER_HOME/server_name is used if no $pbsserver
parameters are found
- prologalarm
- Specifies maximum duration (in seconds) which the MOM will wait for the
job prolog or job job epilog to complete. This parameter default to 300
seconds (5 minutes)
- rcpcmd
- Specify the the full path and argument to be used for remote file copies.
This overrides the compile-time default found in configure. This must
contain 2 words: the full path to the command and the switches. The copy
command must be able to recursively copy files to the remote host and
accept arguments of the form "user@host:files" For example:
- $rcpcmd /usr/bin/rcp -rp
$rcpcmd /usr/bin/scp -rpB
- restricted
- which causes a host name to be added to the list of hosts which will be
allowed to connect to MOM without needing to use a privilaged port. These
names allow for wildcard matching. For example, here is a configuration
file line which will allow queries from any host from the domain
"ibm.com".
- $restricted *.ibm.com
- The restriction which applies to these connections is that only internal
queries may be made. No resources from a config file will be found. This
is to prevent any shell commands from being run by a non-root process.
This parameter is generally not required except for some versions of
OSX.
- remote_checkpoint_dirs
- Specifies what server checkpoint directories are remotely mounted. This
directive is used to tell the MOM which directories are shared with the
server. Using remote checkpoint directories eliminates the need to copy
the checkpoint files back and forth between the MOM and the server. This
parameter is available in 2.4.1 and later.
- $remote_checkpoint_dirs /var/spool/torque/checkpoint
- remote_reconfig
- Enables the ability to remotely reconfigure pbs_mom with a new config
file. Default is disabled. This parameter accepts various forms of true,
yes, and 1.
- source_login_batch
- Specifies whether or not mom will source the /etc/profile, etc. type files
for batch jobs. Parameter accepts various forms of true, false, yes, no, 1
and 0. Default is True.
- source_login_interactive
- Specifies whether or not mom will source the /etc/profile, etc. type files
for interactive jobs. Parameter accepts various forms of true, false, yes,
no, 1 and 0. Default is True.
- spool_as_final_name
- If set to true, jobs will spool directly as their output files, with no
intermediate locations or steps. This is mostly useful for shared
filesystems with fast writing capability.
- status_update_time
- Specifies (in seconds) how often MOM updates its status information to
pbs_server. This value should correlate with the server's scheduling
interval. High values increase the load of pbs_server and the network. Low
values cause pbs_server to report stale information. Default is 45
seconds.
- tmpdir
- Sets the directory basename for a per-job temporary directory. Before job
launch, MOM will append the jobid to the tmpdir basename and create the
directory. After the job exit, MOM will recursively delete it. The env
variable TMPDIR will be set for all pro/epilog scripts, the job script,
and TM tasks.
Directory creation and removal is done as the job owner and group, so the
owner must have write permission to create the directory. If the directory
already exists and is owned by the job owner, it will not be deleted after
the job. If the directory already exists and is NOT owned by the job
owner, the job start will be rejected.
- timeout
- Specifies the number of seconds before TCP messages will time out. TCP
messages include job obituaries, and TM requests if RPP is disabled.
Default is 60 seconds.
- usecp
- specifies which directories should be staged with cp instead of rcp/scp.
If a shared filesystem is available on all hosts in a cluster, this
directive is used to make these filesystems known to MOM. For example, if
/home is NFS mounted on all nodes in a cluster:
- $usecp *:/home /home
- varattr
- This is similar to a shell escape above, but includes a TTL. The command
will only be run every TTL seconds. A TTL of -1 will cause the command to
be executed only once. A TTL of 0 will cause the command to be run
everytime varattr is requested. This parameter may be used multiple times,
but all output will be grouped into a single "varattr" attribute
in the request and status output. The command should output data in the
form of varattrname=va1ue1[+value2]...
- $varattr 3600 /path/to/script [<ARGS>]...
- wallmult
- which sets a factor used to adjust wall time usage by to job to a common
reference system. The factor is used for walltime calculations and limits
the same as cputmult is used for cpu time.
The configuration file must be executable and "secure".
It must be owned by a user id and group id less than 10 and not be world
writable. Output from this file must be in the format $VAR=$VAL, i.e.,
- dataset13=20070104
dataset22=20070202
viraltest=abdd3
- xauthpath
- Specifies the path to the xauth binary to enable X11 fowarding.
- mom_host
- Sets the local hostname as used by pbs_mom.
Resource Monitor queries can be made with momctl's -q option to retrieve and set
pbs_mom options. Any configured static resource may be retrieved with a
request of the same name. These are resource requests not otherwise documented
in the PBS ERS.
- cycle
- forces an immediate MOM cycle
- status_update_time
- retrieve or set the $status_update_time parameter
- check_poll_time
- retrieve or set the $check_poll_time parameter
- configversion
- retrieve the config version
- jobstartblocktime
- retrieve or set the $jobstartblocktime parameter
- enablemomrestart
- retrieve or set the $enablemomrestart parameter
- loglevel
- retrieve or set the $loglevel parameter
- down_on_error
- retrieve or set the EXPERIMENTAL $down_on_error parameter
- diag0 - diag4
- retrieves various diagnostic information
- rcpcmd
- retrieve or set the $rcpcmd parameter
- version
- retrieves the pbs_mom version
The health check script is executed directly by the pbs_mom daemon under the
root user id. It must be accessible from the compute node and may be a script
or compiled executable program. It may make any needed system calls and
execute any combination of system utilities but should not execute resource
manager client commands. Also, as of TORQUE 1.0.1, the pbs_mom daemon blocks
until the health check is completed and does not possess a built-in timeout.
Consequently, it is advisable to keep the launch script execution time short
and verify that the script will not block even under failure conditions.
If the script detects a failure, it should return the keyword
'ERROR' to stdout followed by an error message. The message (up to 256
characters) immediately following the ERROR string will be assigned to the
node attribute 'message' of the associated node.
If the script detects a failure when run from
"jobstart", then the job will be rejected. This should probably
only be used with advanced schedulers like Moab so that the job can be
routed to another node.
TORQUE currently ignores ERROR messages by default, but advanced
schedulers like moab can be configured to react appropriately.
If the experimental $down_on_error MOM setting is enabled, MOM
will set itself to state down and report to pbs_server; and pbs_server will
report the node as "down". Additionally, the experimental
"down_on_error" server attribute can be enabled which has the same
effect but moves the decision to pbs_server. It is redundant to have MOM's
$down_on_error and pbs_server's down_on_error features enabled. See
"down_on_error" in pbs_server_attributes(7B).
- $PBS_SERVER_HOME/server_name
- contains the hostname running pbs_server.
- $PBS_SERVER_HOME/mom_priv
- the default directory for configuration files, typically
(/usr/spool/pbs)/mom_priv.
- $PBS_SERVER_HOME/mom_logs
- directory for log files recorded by the server.
- $PBS_SERVER_HOME/mom_priv/prologue
- the administrative script to be run before job execution.
- $PBS_SERVER_HOME/mom_priv/epilogue
- the administrative script to be run after job execution.
pbs_mom handles the following signals:
- SIGHUP
- causes pbs_mom to re-read its configuration file, close and reopen the log
file, and reinitialize resource structures.
- SIGALRM
- results in a log file entry. The signal is used to limit the time taken by
certain children processes, such as the prologue and epilogue.
- SIGINT and SIGTERM
- results in pbs_mom exiting without terminating any running jobs. This is
the action for the following signals as well: SIGXCPU, SIGXFSZ, SIGCPULIM,
and SIGSHUTDN.
- SIGUSR1, SIGUSR2
- causes MOM to increase and decrease logging levels, respectively.
- SIGPIPE, SIGINFO
-
are ignored.
- SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS
- cause a core dump if the PBSCOREDUMP environmental variable is
defined.
All other signals have their default behavior installed.
If the mini-server command fails to begin operation, the server exits with a
value greater than zero.
pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the PBS External
Reference Specification, and the PBS Administrator's Guide.
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc.
|