Net::OpenSSH::Parallel(3) User Contributed Perl Documentation
Net::OpenSSH::Parallel - Run SSH jobs in parallel
use Net::OpenSSH::Parallel;
my $pssh = Net::OpenSSH::Parallel->new();
$pssh->add_host($_) for @hosts;
$pssh->push('*', scp_put => '/local/file/path', '/remote/file/path');
$pssh->push('*', command => 'gurummm',
'/remote/file/path', '/tmp/output');
$pssh->push($special_host, command => 'prumprum', '/tmp/output');
$pssh->push('*', scp_get => '/tmp/output', 'logs/%HOST%/output');
$pssh->run;
Run this here, that there, etc.
"Net::OpenSSH::Parallel" is a
scheduler that can run commands in parallel on a set of hosts through SSH.
It tries to find a compromise between being simple to use, efficient and
covering a good part of the problem space of parallel process execution via
SSH.
Obviously, it is built on top of Net::OpenSSH!
Common usage of the module is as follows:
- create a "Net::OpenSSH::Parallel"
object
- register the hosts where you want to run commands with the
"add_host" method
- queue the actions you want to run (commands, file copy operations, etc.)
using the "push" method.
- call the "run" method and let the parallel scheduler take care
of everything!
Every host is identified by a unique label that is given when the host is
registered into the parallel scheduler. Usually, the host name is also used as
the label, but this is not required by the module.
The rationale behind using labels is that a hostname does not
necessarily identify a unique "remote processor". For instance, your
logical "remote processors" may be user accounts distributed over a set of
hosts ("foo1@bar1", "foo2@bar1", "foo3@bar2", ...), or a set of hosts that
are accessible behind a single IP address, listening on different ports.
Several of the methods of this module (well, currently, just
"push") accept a selector string to
determine which of the registered hosts should be affected by the operation.
For instance, in...
$pssh->push('*', command => 'ls')
the first argument is the selector. The one used here,
"*", selects all the registered hosts.
Other possible selectors are:
'bar*' # selects everything beginning with 'bar'
'foo1,foo3,foo6' # selects the hosts of the given names
'bar*,foo1,foo3,foo6' # both
'*doz*' # everything containing 'doz'
Note: I am still considering what the selector mini-language
should look like. Do not hesitate to send your suggestions!
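To make the selector semantics listed above concrete, here is a small stand-alone sketch that matches selector strings against a list of labels in plain Perl. It is not the module's own implementation: the helper name select_labels is hypothetical, and it assumes that "*" matches any run of characters and that commas separate alternative patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the labels matched by a selector string.
# Assumes '*' is a wildcard and ',' separates alternative patterns.
sub select_labels {
    my ($selector, @labels) = @_;
    my @regexes = map {
        # Quote every literal chunk and turn each '*' into '.*'
        my $re = join '.*', map { quotemeta($_) } split /\*/, $_, -1;
        qr/\A$re\z/;
    } split /,/, $selector;
    # Keep the labels that match at least one of the patterns
    return grep {
        my $label = $_;
        grep { $label =~ $_ } @regexes;
    } @labels;
}

my @labels = qw(bar1 bar2 foo1 foo3 dozer);
print join(' ', select_labels('bar*,foo1', @labels)), "\n";   # bar1 bar2 foo1
```

The labels are returned in registration order, whatever the order of the patterns inside the selector.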
When the number of hosts managed by the scheduler is too high, the local node
can become overloaded.
Roughly, every SSH connection requires two local
"ssh" processes (one to run the SSH
connection and another one to launch the remote command), which results in
around 5MB of RAM usage per host.
CPU usage varies greatly depending on the tasks carried out. The
most expensive are short remote tasks (because of the local process creation
and destruction overhead) and tasks that transfer big amounts of data
through SSH (because of the encryption going on).
In practice, CPU usage does not matter too much (mostly because
the OS is able to manage it, but also because there is not much
we can do to reduce it), and usually it is RAM that we should be
more concerned about.
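The module accepts two parameters to limit resource usage:
- workers
- the maximum number of operations that can be carried out in parallel.
- connections
- the maximum number of SSH connections that can be established
simultaneously.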
In practice, limiting the maximum number of connections indirectly
limits RAM usage and limiting the maximum number of workers indirectly
limits CPU usage.
The module requires the maximum number of connections to be
equal to or greater than the maximum number of workers, and it is
recommended that "maximum_connections >= 2 *
maximum_workers" (otherwise the scheduler will not be able to
reuse connections efficiently).
You will have to experiment to find out which combinations give
the best results in your particular scenarios.
Also, for small sets of hosts you can just leave these parameters
unlimited.
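The rules of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the helper name plan_limits and the RAM budget are hypothetical, and the 5MB-per-connection figure is the rough estimate quoted above, not a measurement.

```perl
use strict;
use warnings;

# Hypothetical helper: derive 'connections' and 'workers' limits from a
# local RAM budget, using the rough estimate of 5MB per SSH connection.
sub plan_limits {
    my ($ram_budget_mb, $hosts) = @_;
    my $mb_per_connection = 5;    # rough figure, varies between systems
    my $connections = int($ram_budget_mb / $mb_per_connection);
    $connections = $hosts if $connections > $hosts;   # never more than hosts
    $connections = 1      if $connections < 1;        # always at least one
    # Follow the recommendation connections >= 2 * workers
    my $workers = int($connections / 2) || 1;
    return (connections => $connections, workers => $workers);
}

my %limits = plan_limits(500, 1000);    # 500MB budget, 1000 hosts
print "connections=$limits{connections} workers=$limits{workers}\n";
# prints "connections=100 workers=50"
```

The resulting pairs use the same names as the "workers" and "connections" constructor options, so they could be passed along as Net::OpenSSH::Parallel->new(%limits). Treat the numbers as a starting point for experimentation only.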
This module activates Net::OpenSSH variable expansion by default. That way, it
is possible to easily customize the actions executed on every host based on
some of its properties.
For instance:
$pssh->push('*', scp_get => "/var/log/messages", "messages.%HOST%");
copies the log files appending the name of the remote hosts to the
local file names.
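The effect of the expansion can be mimicked with a few lines of plain Perl. This is a sketch for illustration only, not the code Net::OpenSSH uses: the helper name expand_vars and the sample values are hypothetical, and leaving unknown names untouched is an assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking %NAME% expansion. Unknown names are left
# untouched (an assumption of this sketch, not documented behavior).
sub expand_vars {
    my ($template, %vars) = @_;
    $template =~ s/%(\w+)%/exists $vars{$1} ? $vars{$1} : "%$1%"/ge;
    return $template;
}

# Sample values for one host, made up for the example
my %vars = (HOST => 'bar1', USER => 'foo1', PORT => 22, LABEL => 'foo1@bar1');
print expand_vars('messages.%HOST%', %vars), "\n";        # messages.bar1
print expand_vars('logs/%LABEL%/output', %vars), "\n";    # logs/foo1@bar1/output
```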
The variables "HOST", "USER", "PORT" and "LABEL" are predefined.
When something goes wrong (for instance, some host is unreachable, some
connection dies, some command fails, etc.) the module can handle the error in
several predefined ways as follows:
Error policies
To set the error handling policy, the "new",
"add_host" and "push" methods support an optional
"on_error" argument that can take the
following values (these constants are available from
Net::OpenSSH::Parallel::Constants):
- OSSH_ON_ERROR_IGNORE
- Ignores the error and continues executing tasks in the host queue as if it
had never happened.
- OSSH_ON_ERROR_ABORT
- Aborts the processing on the corresponding host. The error will be
propagated to other hosts joining it at any later point once the join is
reached.
In other words, this policy aborts the queued jobs for this
host and for any other host that has a dependency on it.
- OSSH_ON_ERROR_DONE
- Similar to "OSSH_ON_ERROR_ABORT" but
will not propagate errors to other hosts via joins.
- OSSH_ON_ERROR_ABORT_ALL
- Causes all the host queues to be aborted as soon as possible (and that
usually means after currently running actions end).
- OSSH_ON_ERROR_REPEAT
- The module will try to perform the current task again and again until it
succeeds. This policy can lead to an infinite loop, so its direct usage
is discouraged (but see the following point about setting the policy
dynamically).
The default policy is
"OSSH_ON_ERROR_ABORT".
Setting the policy dynamically
When a subroutine reference is used as the policy instead of
any of the constants previously described, the given subroutine will be
called on error conditions as follows:
$on_error->($pssh, $label, $error, $task)
$pssh is a reference to the
"Net::OpenSSH::Parallel" object,
$label is the label associated to the host where the
error happened. $error is the error type as defined
in Net::OpenSSH::Parallel::Constants and $task is a
reference to the task that was being carried out.
The return value of the subroutine must be one of the described
constants and the corresponding policy will be applied.
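One way to use such a callback is to bound the number of retries. The sketch below is an illustration under stated assumptions: the factory name make_retry_policy is hypothetical, failures are counted per host label (not per task), and the two policy values to return are passed in as arguments so that the example runs stand-alone with placeholder strings.

```perl
use strict;
use warnings;

# Hypothetical factory: builds an on_error callback that asks for the task
# to be repeated up to $max_retries times per host label and then gives up.
# $repeat and $give_up are the policy values to return, for example
# OSSH_ON_ERROR_REPEAT and OSSH_ON_ERROR_ABORT.
sub make_retry_policy {
    my ($max_retries, $repeat, $give_up) = @_;
    my %failures;    # failures seen so far, keyed by label
    return sub {
        my ($pssh, $label, $error, $task) = @_;
        return ++$failures{$label} <= $max_retries ? $repeat : $give_up;
    };
}

# Stand-alone demonstration using placeholder strings, not the real constants
my $policy = make_retry_policy(2, 'REPEAT', 'ABORT');
for my $attempt (1 .. 3) {
    print "failure $attempt: ", $policy->(undef, 'foo', 'some-error', {}), "\n";
}
# prints REPEAT for failures 1 and 2, then ABORT for failure 3
```

With the real module, the callback would be given as the "on_error" argument, passing the OSSH_ON_ERROR_REPEAT and OSSH_ON_ERROR_ABORT constants from Net::OpenSSH::Parallel::Constants in place of the placeholder strings.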
Retrying connection errors
If the module fails when trying to establish a new SSH connection
or when an existing connection dies unexpectedly, the option
"reconnections" can be used to instruct
the module to retry the connection until it succeeds or the given maximum is
reached.
"reconnections" is accepted by
both the "new" and "add_host" methods.
Example:
$pssh->add_host('foo', reconnections => 3);
Note that the reconnections maximum is not per host but per queued
task.
These are the available methods:
- $pssh = Net::OpenSSH::Parallel->new(%opts)
- creates a new object.
The accepted options are:
- workers => $maximum_workers
- sets the maximum number of operations that can be carried out in parallel
(see "Local resource usage").
- connections => $maximum_connections
- sets the maximum number of SSH connections that can be established
simultaneously (see "Local resource usage").
$maximum_connections must be equal to or
greater than $maximum_workers.
- reconnections => $maximum_reconnections
- when connecting to some host fails, this argument tells the module the
maximum number of additional connection attempts that it should perform
before giving up. The default value is zero.
See also "Retrying connection errors".
- on_error => $policy
- Sets the error handling policy (see "Error handling").
- $pssh->add_host($label, %opts)
- $pssh->add_host($label, $host, %opts)
- registers a new host into the $pssh object.
$label is the name used to refer to
the registered host afterwards.
When the hostname argument is omitted, the label is also used
as the hostname.
The accepted options are:
- on_error => $policy
- Sets the error handling policy (see "Error handling").
- reconnections => $maximum_reconnections
- See "Retrying connection errors".
Any additional option will be passed verbatim to the Net::OpenSSH
constructor later. For instance:
$pssh->add_host($host, user => $user, password => $password);
- $pssh->push($selector, $action, \%opts, @action_args)
- $pssh->push($selector, $action, @action_args)
- pushes a new action into the queues selected by
$selector.
The supported actions are:
- command => @cmd
- queue the given shell command on the selected hosts.
Example:
$pssh->push('*', 'command',
{ stdout_fh => $find_fh, stderr_to_stdout => 1 },
'find', '/my/dir');
- scp_get => @remote, $local
- scp_put => @local, $remote
- These actions queue an "scp" remote file
copy operation on the selected hosts.
- rsync_get => @remote, $local
- rsync_put => @local, $remote
- These actions queue an rsync remote file copy operation on the selected
hosts.
- sub => sub { ... }, @extra_args
- sub { ... }, @extra_args
- Queues a call to a perl subroutine that will be executed locally.
Note that subroutines are executed synchronously in the same
process, so no other task will be scheduled while they are running.
The sub is called as
$sub->($pssh, $label, @extra_args)
where $pssh is the current
Net::OpenSSH::Parallel object.
- parsub => sub { ... }, @extra_args
- Queues a call to a perl subroutine that will be executed locally on a
forked process.
The sub is called as
$sub->($label, $ssh, @extra_args)
where $ssh is a Net::OpenSSH object
that can be used to interact with the remote machine.
Note that the interface is different to that of the
"sub" action.
An example of usage:
sub sudo_install {
my ($label, $ssh, @pkgs) = @_;
my ($pty) = $ssh->open2pty('sudo', 'apt-get', 'install', @pkgs);
my $expect = Expect->init($pty);
$expect->raw_pty(1);
$expect->expect($timeout, ":");
$expect->send("$passwd\n");
$expect->expect($timeout, "\n");
$expect->raw_pty(0);
while(<$expect>) { print };
close $expect;
}
$pssh->push('*', parsub => \&sudo_install, 'scummvm');
If the subroutine dies or calls
"_exit" with a non-zero return code,
the error handling code will be triggered (see "Error
handling").
The "parsub" action accepts
the additional option "no_ssh"
indicating that the $ssh object is not going to
be used. For instance:
$pssh->push('*', parsub => { no_ssh => 1 },
sub {
my $label = shift;
{ exec "gzip", "/tmp/file-$label" };
die "exec failed: $!";
});
That can make the script faster when the maximum number of
simultaneous connections is limited. See "Local resource
usage".
- join => $selector
- Joins allow jobs to be synchronized between different servers.
For instance:
$pssh->push('server_B', scp_get => '/tmp/foo', 'foo');
$pssh->push('server_A', join => 'server_B');
$pssh->push('server_A', scp_put => 'foo', '/tmp/foo');
The join makes server_A wait for the
"scp_get" operation queued in server_B
to finish before proceeding with the
"scp_put".
In general the join will make the selected servers wait for
any task queued on the servers matched by
$selector to finish before proceeding with the
next queued tasks.
One common usage is to synchronize all servers at some
point:
$pssh->push('*', join => '*');
By default, errors are propagated at joins. For instance, in
the example above, if the "scp_get"
operation queued on server_B failed, it would abort any further
operation queued on server_B and any further operation queued after the
join in server_A. See also "Error handling".
- here => $tag
- Push a tag in the stack that can be used as a target for goto
operations.
- goto => $target
- Jumps forward until the given "here" tag
is reached.
Joins to other hosts' queues will be ignored, and joins from
other queues to this one will be successfully fulfilled. For
instance:
$pssh->add_host(A => ...);
$pssh->add_host(B => ...);
$pssh->push('*', cmd => 'echo "hello from %HOST%"');
$pssh->push('A', goto => 'there');
$pssh->push('A', join => 'B'); # ignored by A on goto
$pssh->push('B', join => 'A'); # fulfilled by A on goto
$pssh->push('*', cmd => 'echo "hello from %HOST% again"');
$pssh->push('*', here => 'there');
$pssh->push('*', cmd => 'echo "bye bye from %HOST%"');
Note that it is not possible to jump backwards.
There is a special target
"END" that can be used to jump to the
end of the queue.
- stop
- Discards any additional operations queued. Any pending joins will be
successfully fulfilled.
It is equivalent to
$pssh->push('*', goto => 'END');
- connect
- Just ensures that connecting to the remote machine is possible without
doing any other action.
When given, %opts can contain the
following options:
- on_error => $fail_mode
- on_error => sub { ... }
- See "Error handling".
- or_goto => $tag
- Supported for "command", "scp_get", "scp_put", "rsync_get" and
"rsync_put". When the command, "scp" or "rsync" operation fails, a
"goto" to the given target is performed.
For instance:
$pssh->all(command => { or_goto => 'no_file' },
"test -f /etc/foo");
$pssh->all(scp_get => "/etc/foo", "/tmp/foo-%LABEL%");
$pssh->all(here => "no_file");
Failures related to SSH errors do not trigger the goto but the
error handling code.
- timeout => $seconds
- not implemented yet!
- on_done => sub { ... }
- not implemented yet!
Any other option will be passed to the corresponding Net::OpenSSH
method (spawn, scp_put, etc.).
- $pssh->all($action => @args)
- $pssh->all($action => \%opts, @args)
- Shortcut for...
$pssh->push('*', $action, \%opts, @args);
- $pssh->run
- Runs the queued operations.
It returns a true value on success and false otherwise.
- $pssh->get_error($label)
- Returns the last error associated with the host of the given label.
- $pssh->get_errors
- In list context returns a list of pairs "$label => $error" for the
failed queues.
In scalar context returns the number of failed queues.
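As an illustration of how the result of "get_errors" might be reported after "run" returns false, the following stand-alone sketch formats a list of pairs. The helper name format_errors and the sample error strings are hypothetical, and the sketch assumes the pairs arrive as a flat list of alternating labels and errors.

```perl
use strict;
use warnings;

# Hypothetical helper: formats a flat list of ($label => $error) pairs,
# which is the shape this sketch assumes get_errors returns in list context.
sub format_errors {
    my @pairs = @_;
    my @lines;
    while (my ($label, $error) = splice @pairs, 0, 2) {
        push @lines, "$label: $error";
    }
    return @lines;
}

# Sample data made up for the example
print "$_\n" for format_errors(foo1 => 'connection failed',
                               bar2 => 'command failed');
```

With a real scheduler object this could be used as: unless ($pssh->run) { warn "$_\n" for format_errors($pssh->get_errors) }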
- Running remote commands with sudo
- Q: I need to run the remote commands with sudo that asks for a
password. How can I do it?
A: First read the answer given to a similar question on
Net::OpenSSH FAQ.
The problem is that Net::OpenSSH::Parallel methods do not
support the "stdin_data" option, so you will have to use an
external file.
$pssh->push('*', cmd => { stdin_file => $passwd_file },
'sudo', '-Skp', '', '--', @cmd);
One trick you can use if you only have one password is to use
the "DATA" file handle:
$pssh->push('*', cmd => { stdin_fh => \*DATA},
'sudo', '-Skp', '', '--', @cmd);
...
# and at the end of your script
__DATA__
this-is-my-remote-password-for-sudo
Or you can also use the
"parsub" action:
my %sudo_passwords = (host1 => "foo", ...);
sub sudo {
my ($label, $ssh, @cmd) = @_;
$ssh->system({stdin_data => "$sudo_passwords{$label}\n"},
'sudo', '-Skp', '', '--', @cmd);
}
$pssh->push('*', parsub => \&sudo, @cmd);
- run N processes per host concurrently
allow running more than one process per remote server
concurrently
- delay before reconnect
when connecting fails, do not try to reconnect immediately but
after some predefined period
- rationalize debugging
currently it is a mess
- add logging support
log the operations performed in a given file
- stdio redirection
add support for better handling of the Net::OpenSSH stdio
redirection facilities
- configurable valid return codes
A non-zero exit code is not always an error.
This module should be considered beta quality. Everything seems to work, but it
may still contain critical bugs.
If you find any, please report them via <http://rt.cpan.org> or by
email (to sfandino@yahoo.com).
Feedback and comments are also welcome!
The 'sub' and 'parsub' features should be considered experimental
and their API or behavior could be changed in future versions of the
module.
In order to report a bug, write a minimal program that triggers it and place the
following line at the beginning:
$Net::OpenSSH::Parallel::debug = -1;
Then, send me (via RT or email) the debugging output you get when
you run it. Include also the source code of the script, a description of
what is going wrong and the details of your OS and the versions of Perl,
"Net::OpenSSH" and
"Net::OpenSSH::Parallel" you are
using.
The source code for this module is hosted at GitHub:
<http://github.com/salva/p5-Net-OpenSSH-Parallel>.
Commercial support, professional services and custom software development around
this module are available through my current company. Drop me an email with a
rough description of your requirements and we will get back to you ASAP.
If you like this module and you are feeling generous, take a look at my Amazon
Wish List: <http://amzn.com/w/1WU1P6IR5QZ42>
Also consider contributing to the OpenSSH project this module
builds upon: <http://www.openssh.org/donations.html>.
Net::OpenSSH is used to manage the SSH connections to the remote hosts.
SSH::Batch has a similar focus to this module. In my opinion it is
simpler to use but rather more limited.
GRID::Machine allows running Perl code distributed in a cluster via
SSH.
If your application requires orchestrating work-flows more complex
than those supported by Net::OpenSSH::Parallel, you should probably consider
some POE or AnyEvent based solution (check POE::Component::OpenSSH).
App::MrShell is another module that allows running the same command on
several hosts in parallel.
Some people find it easier to use Net::OpenSSH combined with
Parallel::ForkManager, threads or Coro.
Net::SSH::Mechanize is another framework written on top of
AnyEvent that allows running remote commands through SSH in parallel.
Copyright © 2009-2012, 2015 by Salvador Fandiño
(sfandino@yahoo.com).
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself, either Perl version 5.10.0
or, at your option, any later version of Perl 5 you may have available.