HTTPD::Log::Filter - a module to filter entries out of an httpd log.
my $hlf = HTTPD::Log::Filter->new(
    exclusions_file => $exclusions_file,
    agent_re        => '.*Mozilla.*',
    format          => 'ELF',
);
while( <> )
{
    my $ret = $hlf->filter( $_ );
    die "Error at line $.: invalid log format\n" unless defined $ret;
    print $_ if $ret;
}
print grep { $hlf->filter( $_ ) } <>;
$hlf = HTTPD::Log::Filter->new(
    capture => [ qw(
        host
        ident
        authexclude
        date
        request
        status
        bytes
    ) ],
);
while( <> )
{
    next unless $hlf->filter( $_ );
    print $hlf->host, "\n";
}
print grep { $hlf->filter( $_ ) } <>;
This module provides a simple interface to filter entries out of an httpd
logfile. The constructor can be passed regular expressions to match against
particular fields in the logfile. It does its filtering line by line, using a
filter method that takes a line of a logfile as input, and returns true if it
matches, and false if it doesn't.
There are two possible non-matching (false) conditions. One is
where the line is a valid httpd logfile entry, but just doesn't happen to
match the filter (where "" is returned). The other is where it is
an invalid entry according to the format specified in the constructor
(where undef is returned).
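A caller usually needs to tell the two false values apart. The following is a minimal sketch, assuming only the return values described here (the line itself on a match, "" for a valid entry that does not match, undef for an invalid entry). classify_result is a hypothetical helper written for this illustration; it is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: map the documented return values of filter()
# onto a label. Assumes undef = invalid entry, "" = valid but no match.
sub classify_result {
    my ($ret) = @_;
    return 'invalid'  unless defined $ret;   # not a well-formed entry
    return 'no match' if $ret eq '';         # well-formed, but filtered out
    return 'match';                          # filter() handed back the line
}

# Sample values standing in for what filter() might return.
my $line = '127.0.0.1 - - [01/Jan/2001:00:00:00 +0000] "GET / HTTP/1.0" 200 512';
print classify_result(undef), "\n";
print classify_result(''),    "\n";
print classify_result($line), "\n";
```

Testing with defined first matters: a plain boolean test cannot distinguish "" from undef.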
The constructor is passed a number of options as a hash. These are:
- exclusions_file
- This option specifies the name of a file to which entries that don't match
the filter are written.
- invert
- This option, if set to true, will invert the logic of the filter; i.e. it
will return only non-matching lines.
- format
- This should be one of:
- CLF
- Common Log Format (CLF):
"%h %l %u %t \"%r\" %>s %b"
- ELF
- NCSA Extended/combined Log format:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
- XLF
- Some bespoke format based on extended log format + some junk at the end:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" %j
where %j is .* in regex-speak.
See
<http://httpd.apache.org/docs/mod/mod_log_config.html> for more
information on log file formats.
- SQUID
- Logging format for Squid (v1.1+) caching / proxy servers. This is of the
form:
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
where the fields are:
time
elapsed
remotehost
code_status
bytes
method
url
rfc931
peerstatus_peerhost
type
(see <http://www.squid-cache.org/Doc/FAQ/FAQ-6.html> for
more info).
- (host|ident|authexclude|date|request|status|bytes|referer|agent)_re
- This class of options specifies the regular expression or expressions
which are used to filter the logfile for httpd logs.
- (time|elapsed|remotehost|code_status|method|url|rfc931|peerstatus_peerhost|type)_re
- Ditto for Squid logs.
- capture [ <fieldname1>, <fieldname2>, ... ]
- This option requests the filter to capture the contents of given named
fields so that they can be examined if the filtering is successful. This
is done by simply putting capturing parentheses around the appropriate
segment of the filtering regex. Fields to be captured are passed as an
array reference. WARNING: do not try to insert your own capturing
parentheses in the custom field regexes, as this will have unpredictable
results when combined with the capture option.
Captured fields can be accessed after each call to filter
using a method call with the same name as the captured field; e.g.
my $filter = HTTPD::Log::Filter->new(
    capture => [ 'host', 'request' ]
);
while ( <> )
{
    next unless $filter->filter( $_ );
    print $filter->host, " requested ", $filter->request, "\n";
}
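The capture mechanism described above (capturing parentheses placed around the relevant segment of the filtering regex) can be sketched in plain Perl. This is an illustration under stated assumptions, not the module's actual code: the per-field patterns, the build_re function and the sample log line are all made up for this example, and only the CLF field names come from the documentation.

```perl
use strict;
use warnings;

# Illustrative per-field patterns for CLF ("%h %l %u %t \"%r\" %>s %b").
# These patterns are assumptions for this sketch, not the module's own.
my @fields  = qw( host ident authexclude date request status bytes );
my %pattern = (
    host        => '\S+',
    ident       => '\S+',
    authexclude => '\S+',
    date        => '\[[^\]]+\]',
    request     => '"[^"]*"',
    status      => '\d{3}',
    bytes       => '(?:\d+|-)',    # non-capturing group on purpose
);

# Build one regex for the whole line, wrapping only the requested fields
# in capturing parentheses. Returns the regex and the capture order.
sub build_re {
    my %capture  = map { $_ => 1 } @_;
    my @captured = grep { $capture{$_} } @fields;
    my $re = join ' ',
        map { $capture{$_} ? "($pattern{$_})" : $pattern{$_} } @fields;
    return ( qr/^$re$/, \@captured );
}

my ( $re, $names ) = build_re( 'host', 'request' );
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /index.html HTTP/1.0" 200 2326';

my %got;
if ( my @values = $line =~ $re ) {
    @got{ @$names } = @values;    # pair captures with field names, in order
    print "$got{host} requested $got{request}\n";
}
```

The sketch also shows why extra capturing parentheses in a custom field regex cause trouble: captures are matched to field names purely by position, so an additional group would shift every later value. A non-capturing group, (?:...), avoids this.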
Filters a line of an httpd logfile. Returns true (the line) if it matches, and
false ("" or undef) if it doesn't.
There are two possible non-matching (false) conditions. One is
where the line is a valid httpd logfile entry, but just doesn't happen to
match the filter (where "" is returned). The other is where it is
an invalid entry according to the format specified in the constructor
(where undef is returned).
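To show how these return values interact with the invert and exclusions_file options, here is a rough, self-contained sketch. filter_line is a hypothetical stand-in that mimics the documented return values for an ELF-style line; it is not the module's implementation. Writing only valid non-matching entries to the exclusions output is an assumption of this sketch, and an in-memory filehandle is used in place of a real file.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns the line on a match, "" for a well-formed
# line that does not match, and undef for a malformed line.
sub filter_line {
    my ( $line, %opt ) = @_;
    my ($agent) = $line =~
        /^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$/
        or return undef;
    my $matched = $agent =~ $opt{agent_re} ? 1 : 0;
    $matched = !$matched if $opt{invert};    # invert flips the decision
    return $matched ? $line : '';
}

my @log = (
    '10.0.0.1 - - [01/Jan/2001:00:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
    '10.0.0.2 - - [01/Jan/2001:00:00:01 +0000] "GET / HTTP/1.0" 200 512 "-" "Wget/1.5"',
    'not a log line',
);

# In-memory filehandle standing in for the exclusions file.
my $excluded = '';
open my $ex_fh, '>', \$excluded or die "Cannot open in-memory file: $!";

my ( @kept, @invalid );
for my $line (@log) {
    my $ret = filter_line( $line, agent_re => qr/Mozilla/ );
    if    ( !defined $ret ) { push @invalid, $line }          # bad format
    elsif ( $ret ne '' )    { push @kept, $ret }              # matched
    else                    { print {$ex_fh} "$line\n" }      # excluded
}
close $ex_fh;

print scalar(@kept), " kept, ", scalar(@invalid), " invalid\n";
```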
Returns the current filter regular expression.
Returns the current format.
If the capture option has been specified, these methods return the captured
string for each field as a result of the previous call to filter.
Ave Wrigley <Ave.Wrigley@itn.co.uk>
Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free
software; you can redistribute it and/or modify it under the same terms as
Perl itself.