CHECKLINK(1) User Contributed Perl Documentation CHECKLINK(1)
checklink - check the validity of links in an HTML or XHTML document
checklink [ options ] uri ...
This manual page documents briefly the checklink command, a.k.a. the
W3C® Link Checker.
checklink is a program that reads an HTML or XHTML
document, extracts a list of anchors and links, and checks that no anchor is
defined twice and that all the links are dereferenceable, including the
fragments. It warns about HTTP redirects, including directory redirects, and
can recursively check part of a web site.
The program can be used either as a command line tool or as a CGI
script.
This program follows the usual GNU command line syntax, with long options
starting with two dashes (`--'). A summary of options is included below.
- -?, -h, --help
- Show summary of options.
- -V, --version
- Output version information.
- -s, --summary
- Show result summary only.
- -b, --broken
- Show only the broken links, not the redirects.
- -e, --directory
- Hide directory redirects - e.g. <http://www.w3.org/TR> ->
<http://www.w3.org/TR/>.
- -r, --recursive
- Check the documents linked from the first one.
- -D, --depth n
- Check the documents linked from the first one to depth n (implies
--recursive).
- -l, --location uri
- Scope of the documents checked (implies --recursive). Can be
specified multiple times in order to specify multiple recursion bases. If
the URI of a candidate document is downwards relative to any of the bases,
it is considered to be within the scope. If not specified, the default is
the base URI of the initial document, for example for
<http://www.w3.org/TR/html4/Overview.html> it would be
<http://www.w3.org/TR/html4/>.
- -X, --exclude regexp
- Do not check links whose full, canonical URIs match regexp. Note
that this option limits recursion the same way as --exclude-docs
with the same regular expression would.
- --exclude-docs regexp
- In recursive mode, do not check links in documents whose full, canonical
URIs match regexp. This option may be specified multiple
times.
- --suppress-redirect URI->URI
- Do not report a redirect from the first to the second URI. The
"->" is literal text. This option may be specified multiple
times. Whitespace may be used instead of "->" to separate the
URIs.
- --suppress-redirect-prefix URI->URI
- Do not report a redirect from a child of the first URI to the same child
of the second URI. The "->" is literal text. This option
may be specified multiple times. Whitespace may be used instead of
"->" to separate the URIs.
- --suppress-temp-redirects
- Do not report warnings about temporary redirects.
- --suppress-broken CODE:URI
- Do not report a broken link with the given CODE. CODE is the HTTP
response, or -1 for robots exclusion. The ":" is literal text.
This option may be specified multiple times. Whitespace may be used
instead of ":" to separate the CODE and the URI.
- --suppress-fragment URI
- Do not report the given broken fragment URI. A fragment URI contains
"#". This option may be specified multiple times.
- -L, --languages accept-language
- The "Accept-Language" HTTP header to
send. In command line mode, this header is not sent by default. The
special value "auto" causes a value to
be detected from the "LANG" environment
variable, and sent if found. In CGI mode, the default is to send the value
received from the client as is.
- -c, --cookies cookie-file
- Use cookies, load/save them in cookie-file. The special value
"tmp" causes non-persistent use of
cookies, i.e. they are used but only stored in memory for the duration of
this link checker run.
- -R, --no-referer
- Do not send the "Referer" HTTP
header.
- -q, --quiet
- No output if no errors are found. Implies --summary.
- -v, --verbose
- Verbose mode.
- -i, --indicator
- Show progress while parsing as percentage of lines processed. No indicator
is shown for documents containing no linefeeds.
- -u, --user username
- Specify a username for authentication.
- -p, --password password
- Specify a password for authentication.
- --hide-same-realm
- Hide 401 responses that are in the same realm as the document checked.
- -S, --sleep secs
- Sleep the specified number of seconds between requests to each server.
Defaults to 1 second, which is also the minimum allowed.
- -t, --timeout secs
- Timeout for requests, in seconds. The default is 30.
- -C, --connection-cache number
- Maximum number of cached connections. Using this option overrides the
"Connection_Cache_Size" configuration
file parameter, see its documentation below for the default value and more
information.
- -d, --domain domain
- Perl regular expression describing the domain to which the authentication
information (if present) will be sent. The default value can be specified
in the configuration file. See the
"Trusted" entry in the configuration
file description below for more information.
- --masquerade "real-prefix surrogate-prefix"
- Perform a simple string substitution: URIs which begin with the string
"real-prefix" are rewritten using the
"surrogate-prefix" before being
dereferenced. Useful for making a local directory masquerade as a remote
one. For example:
--masquerade "http://example.com/x/y/z/ file:///my/local/dir/"
If the document being checked contains a link to
http://example.com/x/y/z/foo.html, then the local file system will be
checked for file:///my/local/dir/foo.html.
--masquerade takes a single argument consisting of two
URIs, separated by whitespace. The quote marks are not part of the
argument, but one usual way of providing a value with embedded
whitespace is to enclose it in quotes.
- -H, --html
- HTML output.
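The prefix handling described above for --location and --masquerade can be illustrated with a short Python sketch. This is not checklink's own code (the program is written in Perl). The function names default_base, in_scope and masquerade are hypothetical, and plain string-prefix matching is an assumption standing in for the tool's own test of "downwards relative".

```python
from urllib.parse import urlsplit, urlunsplit


def default_base(uri):
    """Derive the default recursion base: the URI with its last path
    segment dropped and the trailing slash kept (assumed behaviour)."""
    parts = urlsplit(uri)
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def in_scope(candidate, bases):
    """Return True if the candidate URI starts with any recursion base.
    Prefix matching is a simplification of "downwards relative"."""
    return any(candidate.startswith(base) for base in bases)


def masquerade(uri, real_prefix, surrogate_prefix):
    """Rewrite a URI that begins with real_prefix so that it begins
    with surrogate_prefix instead; other URIs are returned unchanged."""
    if uri.startswith(real_prefix):
        return surrogate_prefix + uri[len(real_prefix):]
    return uri


base = default_base("http://www.w3.org/TR/html4/Overview.html")
print(base)  # http://www.w3.org/TR/html4/
print(in_scope("http://www.w3.org/TR/html4/struct/links.html", [base]))  # True
print(in_scope("http://www.w3.org/TR/", [base]))  # False
print(masquerade("http://example.com/x/y/z/foo.html",
                 "http://example.com/x/y/z/",
                 "file:///my/local/dir/"))  # file:///my/local/dir/foo.html
```

The values used are the ones from the --location and --masquerade descriptions above. Passing several bases to in_scope corresponds to specifying --location more than once.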
- /etc/w3c/checklink.conf
- The main configuration file. You can use the W3C_CHECKLINK_CFG environment
variable to override the default location.
"Trusted" specifies a
regular expression for matching trusted domains (i.e. domains where HTTP
basic authentication, if any, will be sent). The regular expression will
be matched case insensitively against host names. The default behavior
(when unset, that is) is to send the authentication information only to
the host which requests it; usually you don't want to change this. For
example, the following configures only the w3.org domain as
trusted:
Trusted = \.w3\.org$
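To see which host names the example expression covers, the following Python sketch applies it case insensitively, as described above. The helper name is_trusted is hypothetical. It also assumes that Python's re module treats this particular pattern the same way Perl does, which holds for the simple constructs used here.

```python
import re

# The example value of "Trusted" from the text above.
TRUSTED = r"\.w3\.org$"


def is_trusted(host):
    """Return True if the host name matches the Trusted expression,
    ignoring case (illustrative helper, not part of checklink)."""
    return re.search(TRUSTED, host, re.IGNORECASE) is not None


for host in ("validator.w3.org", "WWW.W3.ORG", "w3.org", "www.w3.org.example.com"):
    print(host, is_trusted(host))
```

Note that because the expression begins with an escaped dot, the bare host name w3.org does not match; only names ending in ".w3.org" do.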
"Allow_Private_IPs" is a
boolean flag indicating whether checking links on non-public IP
addresses is allowed. The default is true in command line mode and false
when run as a CGI script. For example, to disallow checking non-public
IP addresses, regardless of the mode, use:
Allow_Private_IPs = 0
"Forbidden_Protocols" is a
comma separated list of additional protocols/URI schemes that the link
checker is not allowed to use. The
"javascript" and
"mailto" schemes are always forbidden,
and so is the "file" scheme when
running as a CGI script.
Forbidden_Protocols = javascript,mailto
"Markup_Validator_URI" and
"CSS_Validator_URI" are formatted URIs
to the respective validators. The %s in these
will be replaced with the full "URI encoded" URI to the
document being checked, and shown in the link checker results view in
the online/CGI version. The defaults are:
Markup_Validator_URI = http://validator.w3.org/check?uri=%s
CSS_Validator_URI = http://jigsaw.w3.org/css-validator/validator?uri=%s
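The %s substitution can be sketched in Python as follows. The helper name validator_link is hypothetical, and percent-encoding every reserved character with urllib.parse.quote is an assumption about what "URI encoded" means here; checklink's exact encoding may differ.

```python
from urllib.parse import quote

MARKUP_VALIDATOR_URI = "http://validator.w3.org/check?uri=%s"


def validator_link(template, document_uri):
    """Replace %s in the template with the percent-encoded document URI."""
    return template.replace("%s", quote(document_uri, safe=""))


print(validator_link(MARKUP_VALIDATOR_URI, "http://www.example.org/a b.html"))
# http://validator.w3.org/check?uri=http%3A%2F%2Fwww.example.org%2Fa%20b.html
```

str.replace is used rather than the % operator so that percent signs in the encoded URI are never reinterpreted as format directives.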
"Doc_URI" is a URI used for
linking to the documentation, and CSS and JavaScript files in the
dynamically generated content of the link checker. The default is:
Doc_URI = http://validator.w3.org/docs/checklink.html
"Connection_Cache_Size" is
an integer denoting the maximum number of connections the link checker
will keep open at any given time. The default is:
Connection_Cache_Size = 2
checklink uses the libwww-perl library which has a number of environment
variables affecting its behaviour. See "SEE ALSO" for some pointers.
- W3C_CHECKLINK_CFG
- If set, overrides the path to the configuration file.
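A wrapper script that needs to know which configuration file checklink will read could resolve the path along these lines. This is a minimal sketch: the function name config_path is hypothetical, and treating any set value (even an empty one) as an override is an assumption.

```python
import os

DEFAULT_CFG = "/etc/w3c/checklink.conf"


def config_path(environ=None):
    """Return the configuration file path, honouring W3C_CHECKLINK_CFG
    when it is present in the environment."""
    env = os.environ if environ is None else environ
    return env.get("W3C_CHECKLINK_CFG", DEFAULT_CFG)


print(config_path({}))  # /etc/w3c/checklink.conf
print(config_path({"W3C_CHECKLINK_CFG": "/tmp/checklink.conf"}))  # /tmp/checklink.conf
```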
The documentation for this program is available on the web at
<http://validator.w3.org/docs/checklink.html>.
SEE ALSO: LWP, Net::FTP, Net::NNTP, Net::IP, perlre.
This program was originally written by Hugo Haas <hugo@w3.org>, based on
Renaud Bruyeron's checklink.pl. It has been enhanced by Ville
Skyttä and many other volunteers since. Use the
<www-validator@w3.org> mailing list for feedback, and see
<http://validator.w3.org/docs/checklink.html#csb> for more information.
This manual page was originally written by Frédéric
Schütz <schutz@mathgen.ch> for the Debian GNU/Linux system (but
may be used by others).
This program is licensed under the W3C® Software License,
<http://www.w3.org/Consortium/Legal/copyright-software>.