W3C::LogValidator - The W3C Log Validator - Quality-focused Web Server log
processing engine
Checks quality/validity of most popular content on a Web
server
"W3C::LogValidator" is the main module for the
W3C Log Validator, a combination of Web Server log analysis and statistics
tool and Web Content quality checker.
"W3C::LogValidator" can
batch-process a number of documents through a number of quality-focused
checks, such as HTML or CSS validation, or checking for broken links. It can
take a number of different inputs, ranging from a simple list of URIs to log
files from various Web servers. Because it orders the results by
the number of times a document appears in the file or logs, it is, in
practice, a useful way to spot the most popular documents that need
work.
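The ordering described above can be illustrated with a minimal, self-contained sketch. It does not use the module itself and is not its actual implementation: the request list and the variable names are assumptions made for illustration. It only shows the general idea of counting how often each URI appears and listing the most requested ones first.

```perl
use strict;
use warnings;

# Hypothetical input: a flat list of URIs, one entry per request.
my @requests = qw(
    http://www.example.org/
    http://www.example.org/about.html
    http://www.example.org/
    http://www.example.org/news.html
    http://www.example.org/
    http://www.example.org/about.html
);

# Count how many times each URI appears.
my %hits;
$hits{$_}++ for @requests;

# Order by descending hit count; ties are broken alphabetically
# so that the result is deterministic.
my @by_popularity =
    sort { $hits{$b} <=> $hits{$a} or $a cmp $b } keys %hits;

# Print one line per URI: hit count, then the URI.
printf "%3d  %s\n", $hits{$_}, $_ for @by_popularity;
```

Running the quality checks in this order means that the documents fixed first are the ones visitors request most often.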
The Perl script logprocess.pl, bundled in the W3C::LogValidator
distribution, is a simple way to use the features of
"W3C::LogValidator". Developers can also
use "W3C::LogValidator" as a
Perl module to build applications.
The homepage for the Log Validator is at:
http://www.w3.org/QA/Tools/LogValidator/
The simplest way to use it is to edit the sample configuration file
(samples/logprocess.conf) and run the bundled logprocess.pl script with
this configuration file, for example:
logprocess.pl -f /path/to/logprocess.conf
The basic task of the
"W3C::LogValidator" module is to parse a
configuration file and process relevant logs, passed through a configuration
file argument:
use W3C::LogValidator;
my $logprocessor = W3C::LogValidator->new("sample.conf");
$logprocessor->process;
Alternatively, it will use a default configuration and try to
process Web server logs in "well known locations":
my $logprocessor = W3C::LogValidator->new;
$logprocessor->process;
- $processor = W3C::LogValidator->new
- Constructs a new "W3C::LogValidator"
processor. You may pass a configuration file name, as well as a reference
to a hash of attribute-value pairs, as parameters to the constructor.
For example, for mail output:
my %conf = (
"UseOutputModule" => "W3C::LogValidator::Output::Mail",
"ServerAdmin" => 'webmaster@example.com',
"verbose" => "3"
);
my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);
Or, for HTML output:
my %conf = (
"UseOutputModule" => "W3C::LogValidator::Output::HTML",
"OutputTo" => 'path/to/file.html',
"verbose" => "0"
);
my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);
If given the path to a configuration file,
"new()" will call the
W3C::LogValidator::Config module to get its configuration variables.
Otherwise, a default set of values is used.
- $processor->process
- Do-it-all method: reads the configuration file (if any), parses the log
files, runs them through the processing modules, and sends the results to
the output module.
- $processor->find_remote_addr
- Given a log record and the type of the log (common log format, flat list
of URIs, etc.), extracts the remote host or IP address
- $processor->config_module
- Creates a configuration hash for a specific module, adding module-specific
configuration variables, overriding if necessary
- $processor->use_modules
- Runs the data parsed from the log files through the various processing
(validation) modules specified by UseValidationModule in the
configuration.
- $processor->read_logfiles
- Loops through and parses all log files specified in the configuration
- $processor->read_logfile('path/to.file')
- Extracts URIs and number of hits from a given log file, and feeds it to
the processor's URI/Hits table
- $processor->find_uri
- Given a log record and the type of the log (common log format, flat list
of URIs, etc.), extracts the URI
- $processor->remove_duplicates
- Given a URI, removes "directory index" suffixes such as
index.html, etc., so that http://foobar/ and http://foobar/index.html are
counted as one resource
- $processor->add_uri
- Adds a URI to the processor's URI/Hits table
- $processor->sorted_uris
- Returns the list of URIs in the processor's table, sorted by popularity
(hits)
- $processor->no_cgi
- Tests whether a given URI contains a CGI query string
- $processor->hit
- Returns the number of hits for a given URI. Basically a "public"
method accessing $hits{$uri};
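The log-handling methods listed above (extracting the remote host and the URI from a record, merging "directory index" URIs, skipping URIs with a query string, and counting hits) can be sketched as a small stand-alone script. This is only an illustration of the techniques, not the module's code: the helper names parse_clf_record, strip_directory_index and has_query_string are hypothetical, the sample records are made up, and the list of index file names is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helpers, named for illustration only; they are not
# the module's own methods.

# Split a Common Log Format record into remote host and requested path.
sub parse_clf_record {
    my ($record) = @_;
    # host ident user [date] "METHOD path PROTOCOL" status bytes
    if ($record =~ m/^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"/) {
        return ($1, $2);
    }
    return;    # empty list when the record does not match
}

# Treat "/dir/" and "/dir/index.html" as one resource.
sub strip_directory_index {
    my ($uri) = @_;
    $uri =~ s{/index\.(?:html?|php)$}{/};    # assumed index file names
    return $uri;
}

# Return 1 when the URI carries a query string, 0 otherwise.
sub has_query_string {
    my ($uri) = @_;
    return $uri =~ /\?/ ? 1 : 0;
}

# Made-up sample records in Common Log Format.
my @records = (
    '192.0.2.1 - - [10/Oct/2004:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2004:13:56:01 +0000] "GET / HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2004:13:56:09 +0000] "GET /search?q=css HTTP/1.0" 200 512',
    '192.0.2.9 - - [10/Oct/2004:13:57:40 +0000] "GET /docs/guide.html HTTP/1.1" 200 9120',
);

# Build the URI/hits table, skipping unparsable records and query URIs.
my %hits;
for my $record (@records) {
    my ($host, $path) = parse_clf_record($record);
    next unless defined $path;
    next if has_query_string($path);
    $hits{ strip_directory_index($path) }++;
}

# List resources by descending hit count, ties broken alphabetically.
my @sorted = sort { $hits{$b} <=> $hits{$a} or $a cmp $b } keys %hits;
printf "%3d  %s\n", $hits{$_}, $_ for @sorted;
```

In this sketch the requests for /index.html and / are counted together as two hits on /, and the /search?q=css request is left out of the table entirely.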
Public bug-tracking interface at http://www.w3.org/Bugs/Public/
Olivier Thereaux <ot@w3.org> for The World Wide Web Consortium
Up-to-date information on the Log Validator at:
http://www.w3.org/QA/Tools/LogValidator/
Several articles have been written within the W3C Quality Assurance Interest
Group on the topic of improving the quality of Web sites, notably by using a
step-by-step approach and relying upon the Log Validator to help find the
areas to fix in priority.
- My Web site is standard! And yours?
- Available at http://www.w3.org/QA/2002/04/Web-Quality
- Web Standards Switch
- or how to improve your Web site easily.
Available in several languages at:
http://www.w3.org/QA/2003/03/web-kit
- Making your website valid: a step by step guide.
- Available at http://www.w3.org/QA/2002/09/Step-by-step