LWP::Parallel::UserAgent(3) User Contributed Perl Documentation
LWP::Parallel::UserAgent - A class for parallel User Agents
require LWP::Parallel::UserAgent;
$ua = LWP::Parallel::UserAgent->new();
...
$ua->redirect (0); # prevents automatic following of redirects
$ua->max_hosts(5); # sets maximum number of locations accessed in parallel
$ua->max_req (5); # sets maximum number of parallel requests per host
...
$ua->register ($request); # or
$ua->register ($request, '/tmp/sss'); # or
$ua->register ($request, \&callback, 4096);
...
$ua->wait ( $timeout );
...
sub callback { my($data, $response, $protocol) = @_; .... }
This class implements a user agent that accesses web resources in parallel.
Using an LWP::Parallel::UserAgent as your user agent, you
typically start by registering your requests, along with how you want the
Agent to process the incoming results (see
$ua->register).
Then you wait for the results by calling
$ua->wait. This method returns only when all
requests have returned an answer or the Agent has timed out. In addition,
individual callback functions can indicate that the Agent should stop
waiting for requests and return (see $ua->register).
See the file LWP::Parallel for a set of simple examples.
The LWP::Parallel::UserAgent is a sub-class of LWP::UserAgent, but not all of
its methods are available here. However, you can use its main methods,
$ua->simple_request and
$ua->request, in order to simulate singular access
with this package. Of course, if a single request is all you need, then you
should probably use LWP::UserAgent in the first place, since it will be faster
than our emulation here.
For parallel access, you will need to use the new methods that
come with LWP::Parallel::UserAgent, called
$pua->register and
$pua->wait. See below for more information on
each method.
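The register-then-wait cycle described above can be sketched as follows. This is a minimal illustration, not the module's reference usage: the URLs are taken from the command line, the timeout values are arbitrary, and it assumes that $pua->wait returns a hash reference of entry objects offering a response() accessor, which this page does not spell out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# URLs come from the command line; with none given, nothing is registered.
my @urls = @ARGV;

my $pua = LWP::Parallel::UserAgent->new();
$pua->timeout(10);    # inherited from LWP::UserAgent; value is arbitrary

my $registered = 0;
for my $url (@urls) {
    my $request = HTTP::Request->new(GET => $url);

    # register() returns undef on success and an error object otherwise.
    if (my $error = $pua->register($request)) {
        warn "Could not register $url: ", $error->message, "\n";
        next;
    }
    $registered++;
}

# Assumption: wait() returns a hash reference of entry objects,
# each with a response() accessor.
my $entries = $registered ? $pua->wait(30) : {};

for my $entry (values %$entries) {
    my $response = $entry->response;
    my $request  = $response->request;
    my $label    = $request ? $request->uri : '(unknown request)';
    print "$label: ", $response->status_line, "\n";
}
```

Since no callback or filename is passed to register, each response body ends up in the response object itself.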
- $ua = LWP::Parallel::UserAgent->new();
- Constructor for the parallel UserAgent. Returns a reference to a
LWP::Parallel::UserAgent object.
Optionally, you can give it an existing
LWP::Parallel::UserAgent (or even an LWP::UserAgent) as a first
argument, and it will "clone" a new one from this (This just
copies the behavior of LWP::UserAgent. I have never actually tried this,
so let me know if this does not do what you want).
- $ua->initialize;
- Takes no arguments and initializes the UserAgent. It is automatically
called in LWP::Parallel::UserAgent::new, so usually there is no need to
call this explicitly.
However, if you want to re-use the same UserAgent object for a
number of "runs", you should call
$ua->initialize after you have processed the
results of the previous call to $ua->wait,
but before registering any new requests.
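A sketch of re-using one agent for several "runs", under these assumptions: each command-line argument is a comma-separated batch of URLs (a convention invented for this example), and $ua->wait returns a hash reference of entries.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical input convention: one argument per run,
# URLs within a run separated by commas.
my @batches = map { [ split /,/ ] } @ARGV;

my $ua = LWP::Parallel::UserAgent->new();    # new() already calls initialize

my $runs = 0;
for my $batch (@batches) {
    for my $url (@$batch) {
        my $error = $ua->register(HTTP::Request->new(GET => $url));
        warn "Could not register $url\n" if $error;
    }

    my $entries = $ua->wait();
    $runs++;
    print "Run $runs: ", scalar(keys %$entries), " entries\n";
    # ... process the results of this run here ...

    # Reset the agent before registering the next batch.
    $ua->initialize;
}
```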
- $ua->redirect ( $ok )
- Changes the default value for permitting Parallel::UserAgent to follow
redirects and authentication-requests. The standard value is 'true'.
See $ua->register for
how to change the behaviour for particular requests only.
- $ua->nonblock ( $ok )
- By default, LWP::Parallel will connect to a site using a blocking call.
If you want to speed this step up, you can try the new non-blocking
version of the connect call by setting
$ua->nonblock to 'true'. The standard value is
'false' (although this might change in the future if nonblocking connects
turn out to be stable enough.)
- $ua->duplicates ( $ok )
- Changes the default value for permitting Parallel::UserAgent to ignore
duplicate requests. The standard value is 'false'.
- $ua->in_order ( $ok )
- Changes the default for restricting Parallel::UserAgent to connecting to
the registered sites in the order they were registered. The default value
FALSE allows Parallel::UserAgent to make the connections in an apparently
random order.
- $ua->remember_failures ( $yes )
- If set to one, enables Parallel::UserAgent to ignore requests or
connections to sites that it failed to connect to before during this
"run". If set to zero (the default), Parallel::UserAgent will try to
connect to every single URL you registered, even if it constantly fails to
connect to a particular site.
- $ua->max_hosts ( $max )
- Changes the maximum number of locations accessed in parallel. The default
value is 7.
Note: Although it says 'host', it really means
'netloc/server'! That is, multiple servers on the same host (e.g. one
server running on port 80, the other one on port 6060) will count as two
'hosts'.
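To illustrate the note above, the following sketch groups a few made-up URLs by host and port using the URI module. It assumes that a "location" corresponds to what URI's host_port method reports; the module's own bookkeeping may differ in detail.

```perl
use strict;
use warnings;
use URI;

# Hypothetical URLs: two paths on one server, plus a second server
# on the same machine listening on another port.
my @urls = (
    'http://www.example.com/index.html',
    'http://www.example.com/about.html',
    'http://www.example.com:6060/index.html',
);

# Group by "host:port"; host_port() fills in the scheme's default port.
my %by_netloc;
push @{ $by_netloc{ URI->new($_)->host_port } }, $_ for @urls;

for my $netloc (sort keys %by_netloc) {
    printf "%-24s %d request(s)\n", $netloc, scalar @{ $by_netloc{$netloc} };
}
```

Under this reading, the three requests occupy two of the slots allowed by max_hosts, even though only one machine is involved.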
- $ua->max_req ( $max )
- Changes the maximum number of requests issued per host in parallel. The
default value is 5.
- $ua->register ( $request [, $arg [, $size [, $redirect_ok]]] )
- Registers the given request with the User Agent. In case of an error, an
"HTTP::Response" object containing the
HTML error message is returned. Otherwise (that is, on success)
it will return undef.
The $request should be a reference to
a "HTTP::Request" object with values
defined for at least the method() and url()
attributes.
$size specifies the number of bytes
Parallel::UserAgent should try to read each time some new data arrives.
Setting it to '0' or 'undef' will make Parallel::UserAgent use the
default. (8k)
Specifying $redirect_ok will alter the
redirection behaviour for this particular request only. '1' or any other
true value will force Parallel::UserAgent to follow redirects, even if
the default is set to 'no_redirect' (see
$ua->redirect). '0' or any other
false value should do the reverse. See LWP::UserAgent for using an
object's "requests_redirectable" list
for fine-tuning this behavior.
If $arg is a scalar it is taken as a
filename where the content of the response is stored.
If $arg is a reference to a
subroutine, then this routine is called as chunks of the content are
received. An optional $size argument is taken as
a hint for an appropriate chunk size. The callback function is called
with 3 arguments: the data received this time, a reference to the
response object and a reference to the protocol object. The callback can
use the predefined constants C_ENDCON, C_LASTCON and C_ENDALL as a
return value in order to influence pending and active connections.
C_ENDCON will end this connection immediately, whereas C_LASTCON will
indicate that no further connections should be made. C_ENDALL will
immediately end all requests and let the Parallel::UserAgent return from
$pua->wait().
If $arg is omitted, then the content
is stored in the response object itself.
If $arg is a
"LWP::Parallel::UserAgent::Entry"
object, then this request will be registered as a follow-up request to
this particular entry. This will not create a new entry, but instead
link the current response (i.e. the reason for re-registering) as
$response->previous to the new response of
this request. All other fields are either re-initialized ($request,
$fullpath, $proxy) or
left untouched ($arg, $size). (This should only
be used internally.)
LWP::Parallel::UserAgent->request also allows the
registration of follow-up requests to existing requests that required
redirection or authentication. In order to do this, a
Parallel::UserAgent::Entry object will be passed as the second argument
to the call. Usually, this should not be used directly, but left to the
internal $ua->handle_response method!
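The callback form of $ua->register might be used as in the sketch below. Several things here are assumptions rather than documented behaviour: that the C_ENDCON, C_LASTCON and C_ENDALL constants are exported through a :CALLBACK tag, that returning undef from a callback means "keep reading", and the helper make_callback together with its byte limit, which are invented for the example.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent qw(:CALLBACK);    # assumed export tag for C_* constants

# Hypothetical helper: builds a callback that stops reading a response
# once roughly $limit bytes have arrived on its connection.
sub make_callback {
    my ($label, $limit) = @_;
    my $received = 0;

    # A plain function, not a method: there is no $self argument.
    return sub {
        my ($data, $response, $protocol) = @_;
        $received += length($data // '');
        if ($received >= $limit) {
            warn "$label: $received bytes received, closing connection\n";
            return C_ENDCON;    # end this connection only
        }
        return;                 # undef: assumed to mean "continue"
    };
}

my $pua = LWP::Parallel::UserAgent->new();
for my $url (@ARGV) {
    my $request = HTTP::Request->new(GET => $url);

    # Third argument: hint to read in chunks of about 4096 bytes.
    my $error = $pua->register($request, make_callback($url, 16 * 1024), 4096);
    warn "Could not register $url\n" if $error;
}
$pua->wait() if @ARGV;
```

Building one closure per request keeps the byte counter private to that request, so no shared bookkeeping keyed by URL is needed.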
- $ua->on_connect ( $request, $response, $entry )
- This method should be overridden in an (otherwise empty) subclass in order
to present customized messages for each connection attempted by the User
Agent.
- $ua->on_failure ( $request, $response, $entry )
- This method should be overridden in an (otherwise empty) subclass in order
to present customized messages for each connection or registration that
failed.
- $ua->on_return ( $request, $response, $entry )
- This method should be overridden in an (otherwise empty) subclass in order
to present customized messages for each request returned. If a callback
function was registered with this request, this callback function is
called before $pua->on_return.
Please note that while
$pua->on_return is a method (which should be
overridden in a subclass), a callback function is NOT a method, and does
not have $self as its first parameter. (See more
on callbacks below)
The purpose of $pua->on_return is
mainly to provide messages when a request returns. However, you can also
re-register follow-up requests in case you need them.
If you need specialized follow-up requests depending on the
request that just returned, use a callback function instead (which can
be different for each request registered). Otherwise you might end up
writing a HUGE if..elsif..else.. branch in this global method.
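The three on_* hooks above could be overridden along these lines. This is a sketch only: the package name My::VerboseUA and the message wording are made up, and the hooks simply print and return nothing, on the assumption that an undefined return value is acceptable to the agent.

```perl
package My::VerboseUA;    # hypothetical subclass name
use strict;
use warnings;
use parent 'LWP::Parallel::UserAgent';

# Methods, so $self comes first (unlike callback functions).
sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    print "Connecting to ", $request->uri, "\n";
    return;
}

sub on_failure {
    my ($self, $request, $response, $entry) = @_;
    my $status = $response ? $response->status_line : 'no response';
    print "Failed: ", $request->uri, " ($status)\n";
    return;
}

sub on_return {
    my ($self, $request, $response, $entry) = @_;
    print "Returned: ", $request->uri, " (", $response->status_line, ")\n";
    return;
}

package main;
use HTTP::Request;

my $pua = My::VerboseUA->new();
for my $url (@ARGV) {
    my $error = $pua->register(HTTP::Request->new(GET => $url));
    warn "Could not register $url\n" if $error;
}
$pua->wait() if @ARGV;
```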
- $ua->discard_entry ( $entry )
- Completely removes an entry from memory, in case its output is not needed.
Use this in callbacks such as
"on_return" or "on_failure" if you
want to make sure an entry that you do not need does not occupy valuable
main memory.
- $ua->wait ( $timeout )
- Waits for available sockets to write to or read from. Will time out after
$timeout seconds. Will block if
$timeout = 0 is specified. If
$timeout is omitted, it will use the Agent default
timeout value.
- $ua->handle_response($request, $arg [, $size])
- Analyses results, handling redirects and security. This method may
actually register several different, additional requests.
This method should not be called directly. Instead, indicate
for each individual request registered with
$ua->register() whether or
not you want Parallel::UserAgent to handle redirects and security, or
specify a default value for all requests in Parallel::UserAgent by using
$ua->redirect().
- DEPRECATED $ua->deprecated_simple_request($request, [$arg [,
$size]])
- This method simulated the behavior of LWP::UserAgent->simple_request.
It was actually kinda overkill to use this method in Parallel::UserAgent,
and it was mainly here for testing backward compatibility with the
original LWP::UserAgent.
The name has been changed to deprecated_simple_request in case
you need it, but because it is no longer compatible with the most recent
version of libwww, it will no longer run by default.
The following description is taken directly from the
corresponding libwww pod:
$ua->simple_request dispatches a
single WWW request on behalf of a user, and returns the response
received. The $request should be a reference to
a "HTTP::Request" object with values
defined for at least the method() and url()
attributes.
If $arg is a scalar it is taken as a
filename where the content of the response is stored.
If $arg is a reference to a
subroutine, then this routine is called as chunks of the content is
received. An optional $size argument is taken as
a hint for an appropriate chunk size.
If $arg is omitted, then the content
is stored in the response object itself.
- DEPRECATED $ua->deprecated_request($request, $arg [, $size])
- Previously called 'request' and included for compatibility testing with
LWP::UserAgent. Everyday use was deprecated, and now you have to call
it under the name deprecated_request if you want to use it (because an
incompatibility was introduced with the newer versions of libwww).
Here is what LWP::UserAgent has to say about it:
Process a request, including redirects and security. This
method may actually send several different simple requests.
The arguments are the same as for
"simple_request()".
- $ua->as_string
- Returns text that describes the state of the UA. Should be useful for
debugging, if it would print out anything important. But it does not (at
least not yet). Try using LWP::Debug...
- $ua->use_alarm([$boolean])
- This function is not in use anymore and will display a warning when called
and warnings are enabled.
Callback functions: you can register a callback function. See
LWP::UserAgent for details.
Bugs: probably lots! This was meant only as an interim release until this
functionality is incorporated into LWPng, the next-generation libwww module
(though it has been this way for over 2 years now!).
Needs a lot more documentation on how callbacks work!
Copyright 1997-2004 Marc Langheinrich <marclang@cpan.org>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.