pbs_sched_basl - pbs BASL scheduler
pbs_sched [-d home] [-L logfile] [-p print_file] [-a alarm] [-S port] [-c configfile]
The pbs_sched command starts the operation of a batch scheduler on the
local host. It runs in conjunction with the PBS server. It queries the server
about the state of PBS and communicates with pbs_mom to get information
about the status of running jobs, available memory, and so on. It then
decides which jobs to run.
Typically, this command will be in a local boot file such as /etc/rc.local.
pbs_sched must be executed with root permission.
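For example, a boot file might start the scheduler with a line similar to the
following (the alarm value and configuration file name here are illustrative,
not defaults):
pbs_sched -a 300 -c sched_config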
- -d home
- Specifies the name of the PBS home directory, PBS_HOME. If not specified,
the value of $PBS_SERVER_HOME as defined at compile time is used. Also see
the -L option.
- -L logfile
- Specifies an absolute path name of the file to use as the log file. If not
specified, the scheduler will open a file named for the current date in
the PBS_HOME/sched_logs directory. See the -d option.
- -p print_file
- This specifies the "print" file. Any output from the scheduler
code which is written to standard out or standard error will be written to
this file. If this option is not given, the file used will be
$PBS_HOME/sched_priv/sched_out. See the -d option.
- -a alarm
- This specifies the time in seconds to wait for a schedule run to finish.
If a scheduling iteration takes too long to finish, an alarm signal is
sent, and the scheduler is restarted. If a core file does not exist in the
current directory, abort() is called and a core file is generated. The
default for alarm is 180 seconds.
- -S port
- Specifies the port on which to talk to the server. This option is not
required; it merely overrides the default PBS scheduler port.
- -c configfile
- Specifies a configuration file; see the description below. If this is a
relative file name, it will be interpreted relative to PBS_HOME/sched_priv;
see the -d option. If the -c option is not supplied, pbs_sched will not
attempt to open a configuration file. In BASL, this configuration file is
almost always needed, because it is where the administrator specifies the
list of servers, nodes, and host resource queries.
This version of the scheduler requires knowledge of the BASL language. The site
must first write a function called sched_main() (and all functions
supporting it) using BASL constructs, and then translate the functions into C
using the BASL compiler basl2c, which also attaches a main program
to the resulting code. This main program performs general initialization and
housekeeping chores such as setting up a local socket to communicate with the
server running on the same machine, changing to the priv directory, opening log
files, opening the configuration file (if any), setting up locks, forking a
child to become a daemon, initializing a scheduling cycle (i.e., getting node
attributes that are static in nature), setting up the signal handlers,
executing global initialization assignment statements specified by the
scheduler writer, and finally sitting in a loop waiting for a scheduling
command from the server. When the server sends the scheduler an appropriate
scheduling command {SCH_SCHEDULE_NEW, SCH_SCHEDULE_TERM, SCH_SCHEDULE_TIME,
SCH_SCHEDULE_RECYC, SCH_SCHEDULE_CMD, SCH_SCHEDULE_FIRST}, information
about server(s), jobs, queues, and execution host(s) is obtained, and then
sched_main() is called.
The BAtch Scheduling Language (BASL) is a C-like procedural language. It
provides a number of constructs and predefined functions that facilitate
dealing with scheduling issues. Information about a PBS server, the queues
that it owns, the jobs residing in each queue, and the computational nodes
where jobs can run is accessed via the BASL data types Server, Que, Job,
CNode, Set Server, Set Que, Set Job, and Set CNode.
The following simple sched_main() will cause the server to run all
queued jobs on the local server:
sched_main()
{
    Server  s;
    Que     q;
    Job     j;
    Set Que queues;
    Set Job jobs;

    s = AllServersLocalHostGet();   // get local server
    queues = ServerQueuesGet(s);
    foreach( q in queues ) {
        jobs = QueJobsGet(q);
        foreach( j in jobs ) {
            JobAction(j, SYNCRUN, NULLSTR);
        }
    }
}
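A slightly larger sketch is shown below. It assumes the configuration file
declares the execution hosts (see the configuration file discussion below) so
that the default loadave query is available; the 2.0 load-average cutoff and
the relational-operator spelling (GT, EQ) are illustrative, so consult
basl2c(1B) for the exact syntax:

sched_main()
{
    Server    s;
    Que       q;
    Job       j;
    CNode     n;
    Set Que   queues;
    Set Job   jobs;
    Set CNode nodes;
    Int       busy;

    s      = AllServersLocalHostGet();  // get local server
    queues = ServerQueuesGet(s);
    nodes  = ServerNodesGet(s);         // nodes declared via $momhost
    busy   = 0;

    // loadave is a DYNAMIC query, refreshed at every scheduling cycle
    foreach( n in nodes ) {
        if( CNodeLoadAveGet(n) GT 2.0 ) {   // 2.0 is an arbitrary cutoff
            busy = 1;
        }
    }

    // run queued jobs only while no node is above the cutoff
    if( busy EQ 0 ) {
        foreach( q in queues ) {
            jobs = QueJobsGet(q);
            foreach( j in jobs ) {
                JobAction(j, SYNCRUN, NULLSTR);
            }
        }
    }
}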
For a more complete discussion of the Batch Scheduler Language,
see basl2c(1B) .
A configuration file may be specified with the -c option. This file is used to
specify (1) the hosts which are allowed to connect to pbs_sched, (2) the list
of server hosts for which the scheduler writer wishes the system to
periodically check status, queue, and job information, (3) the list of
execution hosts for which the scheduler writer wants the system to
periodically check information such as state, properties, and so on, and (4)
the various queries to send to each execution host.
- (1) specifying client hosts:
- The hosts allowed to connect to pbs_sched are specified in the
configuration file in a manner identical to that used in pbs_mom. There is
one line per host using the syntax:
$clienthost hostname
where clienthost and hostname are separated by
white space. Two host names are always allowed to connect to
pbs_sched: "localhost" and the name returned to pbs_sched by
the system call gethostname(). These names need not be specified in the
configuration file.
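For example, to allow a front-end host to connect as well (the host name is
illustrative):
$clienthost fe1.example.com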
- (2) specifying list of servers:
- The list of servers is specified in a one host per line manner, using the
syntax:
$serverhost hostname port_number
where $serverhost, hostname, and port_number are
separated by white space.
If port_number is 0, then the default PBS server port
will be used.
Regardless of what has been specified in the file, the list of
servers will always include the local server, the one running on the same
host as the scheduler.
Within the BASL code, the list of servers is accessed by
calling AllServersGet(), or
AllServersLocalHostGet(), which returns the local server on the
list.
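For example, to also poll a second server on the default PBS server port (the
host name is illustrative):
$serverhost server2.example.com 0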
- (3) specifying the list of execution hosts:
- The list of execution hosts (nodes) whose MOMs are to be queried by the
scheduler is specified one host per line, using the syntax:
$momhost hostname port_number
where $momhost, hostname, and port_number
are separated by white space.
If port_number is 0, then the default PBS MOM port will
be used.
The BASL function AllNodesGet(), or
ServerNodesGet(AllServersLocalHostGet()), is available for getting
the list of nodes known to the local system.
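For example, to have the scheduler query the MOMs on two execution hosts at
the default MOM port (the host names are illustrative):
$momhost node01.example.com 0
$momhost node02.example.com 0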
- (4) specifying the list of host resources:
- For specifying the list of host resource queries to send to each execution
host's MOM, the following syntax is used:
$node node_name CNode..Get host_resource
node_name should be the same hostname string that was
specified in a $momhost line. A node_name value of
"*" (wildcard) means to match any node.
Please consult section 9 of the PBS ERS (Resource
Monitor/Resources) for a list of possible values for the host_resource
parameter.
CNode..Get refers to the actual function name that is
called from the scheduler code to obtain the return values of the host
resource queries. The CNode..Get function names that can
appear in the configuration file are:
STATIC:
================================
CNodePropertiesGet
CNodeVendorGet
CNodeNumCpusGet
CNodeOsGet
CNodeMemTotalGet[type]
CNodeNetworkBwGet[type]
CNodeSwapSpaceTotalGet[name]
CNodeDiskSpaceTotalGet[name]
CNodeDiskInBwGet[name]
CNodeDiskOutBwGet[name]
CNodeTapeSpaceTotalGet[name]
CNodeTapeInBwGet[name]
CNodeTapeOutBwGet[name]
CNodeSrfsSpaceTotalGet[name]
CNodeSrfsInBwGet[name]
CNodeSrfsOutBwGet[name]
DYNAMIC:
================================
CNodeIdletimeGet
CNodeLoadAveGet
CNodeMemAvailGet[type]
CNodeSwapSpaceAvailGet[name]
CNodeSwapInBwGet[name]
CNodeSwapOutBwGet[name]
CNodeDiskSpaceReservedGet[name]
CNodeDiskSpaceAvailGet[name]
CNodeTapeSpaceAvailGet[name]
CNodeSrfsSpaceReservedGet[name]
CNodeSrfsSpaceAvailGet[name]
CNodeCpuPercentIdleGet
CNodeCpuPercentSysGet
CNodeCpuPercentUserGet
CNodeCpuPercentGuestGet
STATIC function names return values that are obtained only
during the first scheduling cycle, or when the scheduler is instructed
to reconfigure, whereas DYNAMIC function names return attribute values
that are taken at every subsequent scheduling cycle.
name and type are arbitrarily defined. For
example, you can choose to have name defined as
"$FASTDIR" for the CNodeSrfs* calls, and a sample
configuration file entry would look like:
$node unicos8 CNodeSrfsSpaceAvailGet[$FASTDIR] quota[type=ares_avail,dir=$FASTDIR]
So in BASL code, if you call CNodeSrfsSpaceAvailGet(node,
"$FASTDIR"), it will return the value of the query
"quota[type=ares_avail,dir=$FASTDIR]" (the 3rd parameter) as sent
to the node's MOM.
By default, the scheduler has already internally defined the
following mappings, which can be overridden in the configuration
file:
keyword node_name CNode..Get host_resource
======= ========= ================ =============
$node * CNodeOsGet arch
$node * CNodeLoadAveGet loadave
$node * CNodeIdletimeGet idletime
The above means that for all declared nodes (via $momhost),
the host queries arch, loadave, and idletime will
be sent to each node's MOM. The value of arch is obtained
internally by the system during the first scheduling cycle because it
falls under the STATIC category, while the values of loadave and
idletime are taken at every scheduling iteration because they
fall under the DYNAMIC category. Access to the return values is done by
calling CNodeOsGet(node), CNodeLoadAveGet(node), and
CNodeIdletimeGet(node), respectively. The following are some
sample $node arguments that you may put in the configuration file.
node_name CNode..Get host res
================== ========================= ==========
<sunos4_nodename> CNodeIdletimeGet idletime
<sunos4_nodename> CNodeLoadAveGet loadave
<sunos4_nodename> CNodeMemTotalGet[real] physmem
<sunos4_nodename> CNodeMemTotalGet[virtual] totmem
<sunos4_nodename> CNodeMemAvailGet[virtual] availmem
<irix5_nodename> CNodeNumCpusGet ncpus
<irix5_nodename> CNodeMemTotalGet[real] physmem
<irix5_nodename> CNodeMemTotalGet[virtual] totmem
<irix5_nodename> CNodeIdletimeGet idletime
<irix5_nodename> CNodeLoadAveGet loadave
<irix5_nodename> CNodeMemAvailGet[virtual] availmem
<linux_nodename> CNodeNumCpusGet ncpus
<linux_nodename> CNodeMemTotalGet[real] physmem
<linux_nodename> CNodeMemTotalGet[virtual] totmem
<linux_nodename> CNodeIdletimeGet idletime
<linux_nodename> CNodeLoadAveGet loadave
<linux_nodename> CNodeMemAvailGet[virtual] availmem
<solaris5_nodename> CNodeIdletimeGet idletime
<solaris5_nodename> CNodeLoadAveGet loadave
<solaris5_nodename> CNodeNumCpusGet ncpus
<solaris5_nodename> CNodeMemTotalGet[real] physmem
<aix4_nodename> CNodeIdletimeGet idletime
<aix4_nodename> CNodeLoadAveGet loadave
<aix4_nodename> CNodeMemTotalGet[virtual] totmem
<aix4_nodename> CNodeMemAvailGet[virtual] availmem
<unicos8_nodename> CNodeIdletimeGet idletime
<unicos8_nodename> CNodeLoadAveGet loadave
<unicos8_nodename> CNodeNumCpusGet ncpus
<unicos8_nodename> CNodeMemTotalGet[real] physmem
<unicos8_nodename> CNodeMemAvailGet[virtual] availmem
<unicos8_nodename> CNodeSwapSpaceTotalGet[primary] swaptotal
<unicos8_nodename> CNodeSwapSpaceAvailGet[primary] swapavail
<unicos8_nodename> CNodeSwapInBwGet[primary] swapinrate
<unicos8_nodename> CNodeSwapOutBwGet[primary] swapoutrate
<unicos8_nodename> CNodeCpuPercentIdleGet cpuidle
<unicos8_nodename> CNodeCpuPercentSysGet cpuunix
<unicos8_nodename> CNodeCpuPercentGuestGet cpuguest
<unicos8_nodename> CNodeCpuPercentUserGet cpuuser
<unicos8_nodename> CNodeSrfsSpaceAvailGet[$FASTDIR] quota[type=ares_avail,dir=$FASTDIR]
<unicos8_nodename> CNodeSrfsSpaceAvailGet[$BIGDIR] quota[type=ares_avail,dir=$BIGDIR]
<unicos8_nodename> CNodeSrfsSpaceAvailGet[$WRKDIR] quota[type=ares_avail,dir=$WRKDIR]
<sp2_nodename> CNodeLoadAveGet loadave
Suppose you have an execution host whose os type is irix5;
then the <irix5_nodename> entries will be consulted by the
scheduler. The initial scheduling cycle would involve sending the STATIC
queries ncpus, physmem, and totmem to the execution
host's MOM, and access to the return values of the queries is done via
CNodeNumCpusGet(node), CNodeMemTotalGet(node,
"real"), and CNodeMemTotalGet(node,
"virtual"), respectively, where node is the CNode
representation of the execution host. The subsequent scheduling cycles
will only send the DYNAMIC queries idletime, loadave, and
availmem, and access to the return values of the queries is done
via CNodeIdletimeGet(node), CNodeLoadAveGet(node), and
CNodeMemAvailGet(node, "virtual"), respectively.
Entries appearing later in the configuration file take precedence.
The configuration file must be "secure". It must be
owned by a user id and group id less than 10 and not be world writable.
On receipt of a SIGHUP signal, the scheduler will close and reopen
its log file and reread its configuration file (if any).
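For example, the configuration file can be reread without restarting the
scheduler by sending it a SIGHUP (the process id is site-specific):
kill -HUP <pbs_sched-pid>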
- $PBS_SERVER_HOME/sched_priv
- the default directory for configuration files, typically
(/usr/spool/pbs)/sched_priv.
A C-based scheduler will handle the following signals:
- SIGHUP
- The scheduler will close and reopen its log file and reread the config file
if one exists.
- SIGALRM
- If the site-supplied scheduling module exceeds the time limit, the alarm
will cause the scheduler to attempt to dump core and restart itself.
- SIGINT and SIGTERM
- Will result in an orderly shutdown of the scheduler.
All other signals have the default action installed.
Upon normal termination, an exit status of zero is returned.
basl2c(1B), pbs_sched_tcl(8B), pbs_server(8B), and pbs_mom(8B).
PBS Internal Design Specification