NAME

nvidia-cuda-mps-control - NVIDIA CUDA Multi Process Service management program

SYNOPSIS

nvidia-cuda-mps-control [-d | -f]

DESCRIPTION

MPS is a runtime service designed to let multiple MPI processes using CUDA to run concurrently in a way that's transparent to the MPI program. A CUDA program runs in MPS mode if the MPS control daemon is running on the system.

When CUDA is first initialized in a program, the CUDA driver attempts to connect to the MPS control daemon. If the connection attempt fails, the program continues to run as it normally would without MPS. If however, the connection attempt to the control daemon succeeds, the CUDA driver then requests the daemon to start an MPS server on its behalf. If there's an MPS server already running, and the user id of that server process matches that of the requesting client process, the control daemon simply notifies the client process of it, which then proceeds to connect to the server. If there's no MPS server already running on the system, the control daemon launches an MPS server with the same user id (UID) as that of the requesting client process. If there's an MPS server already running, but with a different user id than that of the client process, the control daemon requests the existing server to shutdown as soon as all its clients are done. Once the existing server has terminated, the control daemon launches a new server with the user id same as that of the queued client process.

The MPS server creates the shared GPU context, and manages its clients. An MPS server can support a finite amount of CUDA contexts determined by the hardware architecture it is running on. For compute capability SM 3.5 through SM 6.0 the limit is 16 clients per GPU at a time. Compute capability SM 7.0 has a limit of 48. MPS is transparent to CUDA programs, with all the complexity of communication between the client process, the server and the control daemon hidden within the driver binaries.

Currently, CUDA MPS is available on 64-bit Linux only, requires a device that supports Unified Virtual Address (UVA) and has compute capability SM 3.5 or higher. Applications requiring pre-CUDA 4.0 APIs are not supported under CUDA MPS. Certain capabilities are only available starting with compute capability SM 7.0.

Refer to the MPS documentation on NVIDiA Docs for more details.

OPTIONS

-d

Start the MPS control daemon in background mode, assuming the user has enough privilege (e.g. root). Parent process exits when control daemon started listening for client connections.

-f

Start the MPS control daemon in foreground mode, assuming the user has enough privilege (e.g. root). The debug messages are sent to standard output.

-multiuser-server

Relax the single UID user requirement, allowing any UID to connect and share an MPS server started as root.

-h, --help

Print a help message.

<no arguments>

Start the front-end management user interface to the MPS control daemon, which needs to be started first. The front-end UI keeps reading commands from stdin until EOF. Commands are separated by the newline character. If an invalid command is issued and rejected, an error message will be printed to stdout. The exit status of the front-end UI is zero if communication with the daemon is successful. A non-zero value is returned if the daemon is not found or connection to the daemon is broken unexpectedly. See the "quit" command below for more information about the exit status.

Commands supported by the MPS control daemon:

get_server_list: Print out a list of PIDs of all MPS servers.
get_server_status PID: Print out the status of the server with given (PID).
start_server -uid UID: Start a new MPS server for the specified user (UID).
shutdown_server PID [-f]: Shutdown the MPS server with given PID. MPS server only exits after all clients disconnect and the MPS server may accept new clients while there is a connected client. -f is forced immediate shutdown. If a client launches a faulty kernel that runs forever, a forced shutdown of the MPS server may be required, since the MPS server creates and issues GPU work on behalf of its clients.
get_client_list PID: Print out a list of PIDs of all clients connected to the MPS server with given PID.
quit [-t TIMEOUT]: Shutdown the MPS control daemon process and all MPS servers. The MPS control daemon stops accepting new clients while waiting for current MPS servers and MPS clients to finish. If TIMEOUT is specified (in seconds), the daemon will force MPS servers to shutdown if they are still running after TIMEOUT seconds.
This command is synchronous. The front-end UI waits for the daemon to shutdown, then returns the daemon's exit status. The exit status is zero iff all MPS servers have exited gracefully.
Commands available to Volta MPS control daemon:
get_device_client_list PID: List the devices and PIDs of client applications that enumerated this device. It optionally takes the server instance PID.
set_default_active_thread_percentage percentage: Set the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The default is 100.
get_default_active_thread_percentage: Query the current default available thread percentage.
set_active_thread_percentage PID percentage: Set the active thread percentage for the MPS server instance of the given PID. All clients created with that server afterwards will observe the new limit. Existing clients are not affected.
get_active_thread_percentage PID: Query the current available thread percentage of the MPS server instance of the given PID.
set_default_device_pinned_mem_limit dev value: Sets the default device pinned memory limit for for the MPS servers. If there is already a server spawned, this command will only affect the next server. The value must be in the form of an integer followed by a qualifier, either “G” or “M” that specifies the value in Gigabytes or Megabytes respectively.
get_default_device_pinned_mem_limit dev: Query the current default pinned memory limit for the device.
set_device_pinned_mem_limit PID dev value: Overrides the device pinned memory limit for the MPS server of the given PID. All the clients created with that server afterwards will observe the new limit. Existing clients are not affected.
get_default_device_pinned_mem_limit PID dev: Query the current device pinned memory limit of the MPS server instance of the given PID for the device dev.
terminate_client server PID client PID: Terminates all the outstanding GPU work of the MPS client process of the given client PID running on the MPS server denoted by <server PID>.
ps [-p PID]: Reports a snapshot of the current client processes.
set_default_client_priority priority: Set the client priority level that will be used for new clients. Priority values are only considered as hints to the CUDA driver and can be ignored or overriden depending on platform. priority follows a convention where smaller numbers are higher priorities, and the default priority value is 0. The only other supported value for priority is 1, which represents a below-normal priority level.
get_default_client_priority: Query the current priority value that will be used for new clients.

ENVIRONMENT

CUDA_MPS_PIPE_DIRECTORY: Specify the directory that contains the named pipes and UNIX domain sockets used for communication among the MPS control, MPS server, and MPS clients. The value of this environment variable should be consistent in the MPS control daemon and all MPS client processes. Default directory is /tmp/nvidia-mps
CUDA_MPS_LOG_DIRECTORY: Specify the directory that contains the MPS log files. This variable is used by the MPS control daemon only. Default directory is /var/log/nvidia-mps
CUDA_VISIBLE_DEVICES: Specify which CUDA devices are visible to the control daemon. Takes either an index value or UUID.
CUDA_DEVICE_MAX_CONNECTIONS: Specify the preferred number of connections between the host and device.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE: Specify the portion of available threads that clients can use on Volta+ hardware. Setting this for the control daemon will set the default active thread percentage for all servers spawned, while setting this for a client or client context will constrain the active thread percentage for that unit and cannot be higher than the active thread percentage value set for the control daemon.
CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING: Specify whether individual client contexts are allowed to have different values for CUDA_MPS_ACTIVE_THREAD_PERCENTAGE.
CUDA_MPS_PINNED_DEVICE_MEM_LIMIT: Specify the amount of GPU memory available to MPS client processes.
CUDA_MPS_CLIENT_PRIORITY: Specify the default client priority value at initialization.

FILES

Log files created by the MPS control daemon in the specified directory

control.log: Record startup and shutdown of MPS control daemon, user commands issued with their results, and status of MPS servers.
server.log: Record startup and shutdown of MPS servers, and status of MPS clients.