WWW::Scripter(3)    User Contributed Perl Documentation
WWW::Scripter - For scripting web sites that have scripts
  use WWW::Scripter;
  my $w = WWW::Scripter->new;
  $w->use_plugin('Ajax');  # packaged separately
  $w->get('http://some.site.com/that/relies/on/ajax');
  $w->eval(' alert("Hello from JavaScript") ');
  $w->document->getElementsByTagName('div')->[0]->....
  $w->content;  # returns the HTML content, possibly modified by scripts
This is a subclass of WWW::Mechanize that uses the W3C DOM and provides support
for scripting.
No scripting engines are provided with WWW::Scripter itself, but they
are available as separate plugins. (See also the
"SEE ALSO" section below.)
There are two basic modes in which you can use WWW::Scripter:
If you only need a single virtual window (which is usually the
case), use WWW::Scripter itself, as described below and in
WWW::Mechanize.
For multiple windows, start with a window group (see
WWW::Scripter::WindowGroup) and fetch the WWW::Scripter object via its
"active_window" method before
proceeding.
At any time you can attach an existing window (WWW::Scripter
object) to a window group using the latter's
"attach" method. You can also
"->close" a window to detach it from
its window group and put it back in single-window mode.
These two modes affect the behaviour of a few methods
("open",
"close",
"blur",
"focus") and hyperlinks and forms with
explicit targets.
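The following sketch shows how a window might move between the two modes, using the "active_window", "attach" and "close" methods mentioned above. It assumes that the window group's constructor can be called without arguments and creates one initial window; check WWW::Scripter::WindowGroup before relying on that.

```perl
use strict;
use warnings;
use WWW::Scripter;
use WWW::Scripter::WindowGroup;

# Assumption: new() needs no arguments and creates one initial window.
my $group = WWW::Scripter::WindowGroup->new;
my $w     = $group->active_window;   # a WWW::Scripter object

# A window created on its own starts out in single-window mode ...
my $solo = WWW::Scripter->new;

# ... until it is attached to a group.
$group->attach($solo);

# close() detaches it again and returns it to single-window mode.
$solo->close;
```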
See WWW::Mechanize for a vast list of methods that this module inherits. (See
also the "Notes About WWW::Mechanize Methods", below.)
In addition to those, this module implements the well-known Window
interface, providing also a few routines for attaching scripting engines and
what-not.
In the descriptions below, $w refers to
the WWW::Scripter object. You can think of it as short for either
'WWW::Scripter' or 'window'.
my $w = new WWW::Scripter %args
The constructor accepts named arguments. There are only two that
WWW::Scripter itself deals with directly. The rest are passed on to the
superclass. See WWW::Mechanize and LWP::UserAgent for details on what other
arguments the constructor accepts.
The two arguments are:
- max_docs
- The maximum number of document objects to keep in history (along with
their corresponding request and response objects). If this is omitted,
Mech's "stack_depth" + 1 will be used.
This is off by one because "stack_depth"
is the number of pages you can go back to, so it is one less than the
number of recorded pages. "max_docs"
considers 0 to be equivalent to infinity.
- max_history
- If the number of items in history exceeds
"max_docs", WWW::Scripter will still
keep the request objects (so you can go back more than
"max_docs" times and previously visited
pages will reload). "max_history"
restricts the total number of items in history (whether full document
objects or just requests). 0 is equivalent to infinity.
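As a rough illustration of these two arguments, the sketch below keeps at most five full documents and twenty history entries in total. The numbers are arbitrary and no page is fetched.

```perl
use strict;
use warnings;
use WWW::Scripter;

# Keep up to 5 parsed documents; older entries retain only their
# request objects, up to 20 history items in total.
my $w = WWW::Scripter->new(
    max_docs    => 5,
    max_history => 20,
);

# The accessors of the same names report what was passed in.
printf "max_docs=%s max_history=%s\n", $w->max_docs, $w->max_history;
```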
In addition to the methods listed here, see also HTML::DOM::View and
HTML::DOM::EventTarget.
- location
- Returns the location object (see WWW::Scripter::Location). If you pass an
argument, it sets the "href" attribute
of the location object.
- alert
- confirm
- prompt
- Each of these calls the function assigned by one of the
"set_*" methods below under
"Window-Related Methods".
- navigator
- Returns the navigator object. See WWW::Scripter::Navigator.
- screen
- Returns the screen object. It currently has no features.
- setTimeout ( $code_string, $ms );
- setTimeout ( $coderef, $ms, @args );
- This schedules the code to run after $ms
milliseconds have elapsed, returning a number uniquely identifying the
time-out. If the first argument is a coderef or an object with
"&{}" overloading, it will be called
as such. Otherwise, it is parsed as a string of JavaScript code. (If the
JavaScript plugin is not loaded, it will be ignored.)
- setInterval ( $code_string, $ms );
- setInterval ( $coderef, $ms, @args );
- This method is just like "setTimeout",
except that, when the code runs, it schedules it to run again after
$ms milliseconds.
- clearTimeout ( $timeout_id )
- This cancels the time-out corresponding to the
$timeout_id. This only works for those registered
with "setTimeout".
- clearInterval ( $timer_id )
- This cancels the timer corresponding to the
$timer_id. This only works for those registered
with "setInterval".
- open ( $url, $target, $features, $replace )
- If $target is not specified or if there is no
window or frame named $target, this method opens
the $url in a new window in multiple-window mode,
or at the top-level window in single-window mode.
If there is a window or frame named
$target, then the $url
is opened in that window. If $replace is true,
it replaces the current page.
A relative $url is resolved according
to the base URL of the current window (the one that
"open" is called on), not the
$target.
The $features argument is ignored.
- close
- In multiple-window mode, this detaches this window from its window group.
In single-window mode (when there is no window group) it goes back to the
previous entry in history (so that it is the opposite of
"open").
- focus
- In multiple-window mode, this brings this window to the front. In
single-window mode (when there is no window group) it does nothing.
- blur
- In multiple-window mode, this sends this window back one, if it is the
frontmost window. In single-window mode (when there is no window group) it
does nothing.
- history
- Returns the history object. See WWW::Scripter::History.
- window
- self
- These two return the window object itself.
- frames
- Although the W3C DOM specifies that this return $w
(the window itself), for efficiency's sake this returns a separate object
which one can use as a hash or array reference to access its sub-frames.
(The window object itself cannot be used that way.) The frames object
(class WWW::Scripter::Frames) also has a
"window" method that returns
$w.
In list context a list of frames is returned.
- length
- Returns the number of frames.
"$w->length" is equivalent to
"scalar @{$w->frames}".
- top
- Returns the 'top' window, which is the window itself if there are no
frames.
- parent
- Returns the parent frame, if there is one, or the window object itself
otherwise.
- name
- This returns the window's name, if applicable. For a frame, this comes
from the frame element to which the window belongs. For a top-level window
created by "open", this is the name that
was passed as the second argument.
- scroll
- scrollTo
- scrollBy
- These exist in case scripts try to call them. They don't do anything.
These methods are not part of the Window interface, but are closely related to
the object's window behaviour.
- set_alert_function
- set_confirm_function
- set_prompt_function
- Use these to set the functions called by the above methods. There are no
default "confirm" and
"prompt" functions. The default
"alert" prints to the currently selected
file handle, with a line break tacked on the end.
- check_timers
- This evaluates the code associated with each timeout registered with the
"setTimeout" method, if the appropriate
interval has elapsed.
- count_timers
- This returns the number of timers currently registered.
- wait_for_timers ( %args )
- This method waits for any registered timers to finish (calling
"check_timers" repeatedly in a loop).
Its %args are as follows:
  max_wait    Number indicating for how many seconds the loop
              should run before giving up and returning.
  min_timers  Only run until this many timers are left, not until
              they have all finished.
  interval    Number of seconds to wait before each iteration of
              the loop. The default is .1.
Some websites have timers that run constantly and are never
cleared. For these, you will usually need to set a value for
"min_timers" (or
"max_wait") to avoid an infinite
loop.
- window_group
- This returns the window group that owns this window. See "SINGLE VS
MULTIPLE WINDOWS", above.
You can also pass an argument to set it, but you should only
do so if you know what you are doing, as it does not update the window
group's list. Consider using WWW::Scripter::WindowGroup's
"attach" method (which itself uses
this method).
- find_target ( $name )
- This finds the WWW::Scripter object (window or frame) in which a link will
be opened.
If $name is not an empty string, it
returns the window corresponding to $name.
If $name is the empty string or
undefined, it returns the default target for this window, based on the
first "<base target>"
element.
If a named window cannot be found: in multiple-window mode, a
new window is opened and returned; in single-window mode,
"undef" is returned.
- fetch_images ( $new_val )
- A boolean indicating whether images should be fetched. Some sites use
images with special URLs as cookies and refuse to work if those images are
not fetched. Most of the time, however, you probably want to leave this
off, for speed's sake.
Setting this does not affect any pages that are already
loaded.
- image_handler ( $coderef )
- A subroutine for handling any images that are fetched. The subroutine will
be passed three arguments: 0) the WWW::Scripter object, 1) the image or
input element and 2) the response object.
- eval ( $code [, $scripting_language] )
- Evaluates the $code passed to it. This method dies
if there is no script handler registered for the
$scripting_language.
- use_plugin ( $plugin_name [, @options] )
- This will automatically "require()" the
plugin for you, and then initialise it. To pass extra options to the
plugin after loading it, just use the same syntax again. This will return
the plugin object if the plugin has one.
- plugin ( $plugin_name )
- This will return the plugin object, if it has one. Some plugins may
provide this as a way to communicate directly with the plugin.
You can also use the return value as a boolean, to see whether
a plugin is loaded.
- dom_enabled ( $new_val )
- This returns a boolean indicating whether HTML pages are parsed and turned
into a DOM tree. It is true by default. You can disable HTML parsing by
passing a false value. If you are using WWW::Scripter to begin with,
you will rarely want to turn this off. Nevertheless, doing so is
useful for fetching files behind the scenes when just the file contents
are needed.
- scripts_enabled ( $new_val )
- This returns a boolean indicating whether scripts are enabled. It is true
by default. You can disable scripts by passing a false value. When you
disable scripts, event handlers are also disabled, as is the registration
of event handlers by HTML event attributes.
- script_handler ( $language_re, $object )
- A script handler is a special object that knows how to run scripts in a
particular language. Use this method to register such an object.
$language_re is a regular expression
that will be matched against a scripting language name (from a
'language' HTML attribute) or MIME type (<script type=...). You can
also use the special value 'default'.
$object is the script handler object.
For its interface, see "SCRIPT HANDLERS", below.
- class_info ( \%interfaces )
- With this you can provide information for binding Perl classes to
scripting languages, so that scripts can handle objects of those classes.
You should pass a hash ref that has the structure described in
HTML::DOM::Interface, except that this method also accepts a
"_constructor" hash element, which
should be set to the name of the method to be called when the
constructor function is called from the scripting language (e.g.,
"_constructor => 'new'") or a
subroutine reference.
The return value is a list of all hashrefs passed to
"class_info" so far plus a few that
WWW::Scripter has by default (to support the DOM). You can call it
without any arguments just to get that list.
- forward
- The equivalent of hitting the 'forward' button in a browser. This, of
course, only works after "back".
- clear_history ( $including_current_page )
- This clears the history, preventing
"back" from working until after the next
request, and freeing up some memory. If supplied with a true argument, it
also clears the current page. It returns $w.
- max_history
- max_history ( $new_value )
- max_docs
- max_docs ( $new_value )
- These two return what was passed to the constructor, optionally setting
it.
- links
- WWW::Scripter overrides the
"_extract_links" method that
"links",
"find_link" and
"follow_link" use behind the scenes, to
make it use the HTML DOM tree instead of the source code of the page.
This overridden method tries hard to match WWW::Mechanize as
closely as possible, which means it includes link tags, (i)frames, and
meta tags with http-equiv set to 'refresh'.
This is significantly different from
"$w->document->links", an
HTML::DOM method that follows the W3C DOM spec and returns only 'a' and
'area' elements.
To trigger events (and event handlers), use the
"trigger_event" method of the object on
which you want to trigger it. For instance:
  $w->trigger_event('resize');  # runs onresize handlers
  $w->document->links->[0]->trigger_event('mouseover');
  $w->current_form->trigger_event('submit');  # same as $w->submit
"trigger_event" accepts more
arguments. See HTML::DOM and HTML::DOM::EventTarget for details.
WWW::Scripter does not implement any event loop, so you have to call
"check_timers" or
"wait_for_timers" yourself to trigger any
timeouts. If you set up a loop like this,
  sleep 1, $w->check_timers while $w->count_timers;
or if you use "wait_for_timers",
beware that these may cause an infinite loop if a timeout sets another
timeout, or if a timer is registered with
"setInterval". You basically have to know
what works with the pages you are browsing.
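One way to keep such a loop bounded is sketched below, using only "setTimeout", "setInterval", "wait_for_timers" and "count_timers" as described in this document. The delays and limits are arbitrary, and the sketch assumes that timers can be registered on a window before any page has been loaded.

```perl
use strict;
use warnings;
use WWW::Scripter;

my $w = WWW::Scripter->new;

my $ticks = 0;
my $done  = 0;
$w->setInterval(sub { $ticks++ }, 50);    # never cleared
$w->setTimeout(sub { $done = 1 }, 100);   # one-shot

# Stop once only the interval timer is left, or after 2 seconds,
# whichever comes first. Without these limits the interval timer
# would keep the loop running indefinitely.
$w->wait_for_timers(
    min_timers => 1,
    max_wait   => 2,
    interval   => 0.05,
);

print "one-shot ran: $done; timers still registered: ",
      $w->count_timers, "\n";
```

Passing both "min_timers" and "max_wait" is a belt-and-braces choice: the first covers pages with a known number of permanent timers, the second covers pages that keep registering new ones.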
The hash named %WWW::Scripter::WindowInterface lists the
interface members for the window object. It follows the same format as hashes
within %HTML::DOM::Interface, like this:
  (
      alert   => VOID|METHOD,
      confirm => BOOL|METHOD,
      ...
  )
It only includes those methods listed above under "The Window
Interface".
This section is only of interest to those implementing scripting engines. If you
are not writing one, skip this section (or just read it anyway).
A script handler object must provide the following methods:
- eval ( $w, $code, $url, $line, $is_inline )
- (where $w is the WWW::Scripter object)
This is supposed to run the $code
passed to it. It must set $@ to a true value if
there is an error.
- event2sub ( $w, $elem, $event_name, $code, $url, $line )
- This is called for each HTML event attribute (onclick, etc.). It should
return a coderef that runs the $code.
If it could not create a code ref, it should return
"undef" and put the error message, if
any, in $@.
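A minimal sketch of an object satisfying this interface follows. The toy "language" (whitespace-separated integers that are summed) and the package name My::ToyHandler are invented for illustration; only the method names, argument order and use of $@ come from the description above.

```perl
use strict;
use warnings;

package My::ToyHandler;   # hypothetical name

sub new { return bless {}, shift }

# Run $code: sum its integers. On error, set $@ and return nothing.
sub eval {
    my ($self, $w, $code, $url, $line, $is_inline) = @_;
    $@ = '';
    my $sum = 0;
    for my $token (split ' ', $code) {
        if ($token !~ /^-?\d+$/) {
            $@ = "bad token '$token' at $url line $line\n";
            return;
        }
        $sum += $token;
    }
    return $sum;
}

# Turn an event attribute into a coderef, or return undef with $@ set.
sub event2sub {
    my ($self, $w, $elem, $event_name, $code, $url, $line) = @_;
    if ($code !~ /^[-\d\s]*$/) {
        $@ = "cannot compile $event_name handler at $url line $line\n";
        return undef;
    }
    $@ = '';
    return sub { $self->eval($w, $code, $url, $line, 1) };
}

package main;

my $handler = My::ToyHandler->new;
# With a real window it would be registered along these lines:
#   $w->script_handler( qr/^toy$/i => $handler );
my $w = undef;   # stand-in for the WWW::Scripter object

my $sum = $handler->eval($w, '1 2 3', 'http://example.invalid/', 1, 1);
print "sum: $sum\n";

$handler->eval($w, '1 two', 'http://example.invalid/', 7, 1);
print "error: $@";
```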
Plugins are usually under the WWW::Scripter::Plugin:: namespace. If a plugin
name has a hyphen (-) in it, the module name will contain a double colon (::).
If, when you pass a plugin name to
"use_plugin" or
"plugin", it has a double colon in its name,
it will be treated as a fully-qualified module name (possibly) outside the
usual plugin namespace. Here are some examples:
  Plugin Name        Module Name
  -----------        -----------
  Chef               WWW::Scripter::Plugin::Chef
  Man-Page           WWW::Scripter::Plugin::Man::Page
  My::Odd::Plugin    My::Odd::Plugin
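The naming rule can be expressed in a few lines of Perl. This is only an illustration of the rule as stated here, not WWW::Scripter's own code, and "plugin_module_name" is a hypothetical helper.

```perl
use strict;
use warnings;

# Map a plugin name to a module name according to the rule above.
sub plugin_module_name {
    my ($name) = @_;
    return $name if $name =~ /::/;        # already fully qualified
    (my $suffix = $name) =~ s/-/::/g;     # each hyphen becomes ::
    return "WWW::Scripter::Plugin::$suffix";
}

printf "%-16s %s\n", $_, plugin_module_name($_)
    for qw( Chef Man-Page My::Odd::Plugin );
```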
This module will need to have an
"init" method, and possibly two more named
"options" and
"clone", respectively:
- init
- "init" will be called as a class method
the first time "use_plugin" is called
for a particular plugin. The second argument
($_[1]) will be the WWW::Scripter object. The
third argument will be an array ref of options (see "options",
below).
It may return an object if the plugin has one.
- options
- When "$w->use_plugin" is called, if
there are any arguments after the plugin name, then the plugin object's
"options" method will be called with the
options themselves as the arguments.
If a plugin does not provide an object, an error will be
thrown if options are passed to
"use_plugin".
The "init" method can
override this, however. When it is called, its third argument is
a reference to an array containing the options passed to
"use_plugin". The contents of that
same array will be used when "options"
is called, so "init" can modify it and
even prevent "options" from being
called altogether, by emptying the array.
- clone
- When the WWW::Scripter object is cloned (via the
"clone" method), every plugin that has a
clone method (as determined by
"->can('clone')"), will also be
cloned. The new clone of the WWW::Scripter object is passed as its
argument.
If the plugin needs to record data pertinent to the current page,
it can do so by associating them with the document or the request via a
field hash. See Hash::Util::FieldHash and Hash::Util::FieldHash::Compat.
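A skeleton plugin following the three-method contract might look like the sketch below. The package name and the "verbose" option are invented, and returning a copy from "clone" is an assumption, since the return value is not specified here. The last few lines merely simulate the calls that "use_plugin" is described as making; they do not involve WWW::Scripter itself.

```perl
use strict;
use warnings;

package WWW::Scripter::Plugin::Example;   # hypothetical plugin

# Called as a class method the first time use_plugin is called.
# $w is the WWW::Scripter object, $opts an array ref of options.
sub init {
    my ($class, $w, $opts) = @_;
    my $self = bless { verbose => 0 }, $class;
    # @$opts is left untouched, so options() will receive its contents.
    return $self;
}

# Receives the options themselves as a flat list.
sub options {
    my ($self, %opts) = @_;
    $self->{$_} = $opts{$_} for keys %opts;
    return;
}

# Called with the new clone of the WWW::Scripter object.
# Assumption: the plugin returns its own copy.
sub clone {
    my ($self, $new_w) = @_;
    return bless { %$self }, ref $self;
}

package main;

# Simulate what use_plugin('Example', verbose => 1) is described to do.
my @opts   = (verbose => 1);
my $plugin = WWW::Scripter::Plugin::Example->init(undef, \@opts);
$plugin->options(@opts) if @opts;

print "verbose: $plugin->{verbose}\n";
```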
See LWP's Handlers feature.
From within LWP's "request_*"
and "response_*" handlers, you can call
"WWW::Scripter::abort" to abort the
request and prevent a new entry from being created in browser history. (The
JavaScript plugin does this with javascript: URLs.)
WWW::Scripter will export this function upon request:
use WWW::Scripter qw[ abort ];
or you can call it with a fully qualified name:
WWW::Scripter::abort();
This is still an unfinished work. There are probably scores of bugs crawling all
over the place. Here are some that are known (apart from the fact that so many
features are still missing):
- There is no support for XHTML, but HTML::Parser can handle most XHTML
pages anyway, so maybe this is not a problem.
- There is nothing to prevent infinite recursion when frames have circular
references.
To report a bug, please send an e-mail to
bug-WWW-Scripter@rt.cpan.org or
use the web interface at <http://rt.cpan.org/>.
perl 5.8.3 or higher (5.8.4 or higher recommended)
HTML::DOM 0.045 or higher
LWP 5.77 or higher
URI
WWW::Mechanize 1.2 or higher
Tie::RefHash::Weak 0.08 or higher for perl 5.8.x.
Copyright (C) 2009-16, Father Chrysostomos (sprout at, um, cpan dot org)
This program is free software; you may redistribute or modify it
(or both) under the same terms as perl.
Some of the code in here was stolen from the immediate superclass,
WWW::Mechanize, as were some of the tests and test data.
WWW::Scripter sub-modules: ::Location, ::History and ::Navigator.
See WWW::Mechanize, of which this is a subclass.
See also the following plugins:
- WWW::Scripter::Plugin::JavaScript
- WWW::Scripter::Plugin::Ajax
And, if you are curious, have a look at the plugin version of
WWW::Mechanize and WWW::Mechanize::Plugin::DOM (experimental and now
deprecated) that this was originally based on:
http://www-mechanize.googlecode.com/svn/wm/branches/plugins/