Sequin(3) User Contributed Perl Documentation Sequin(3)

NAME

URI::Sequin - Extract information from the URLs of Search-Engines

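SYNOPSIS

        use URI::Sequin qw/se_extract key_extract log_extract %log_types/;

        # Pull the referring URL out of a log line using a built-in type
        $url = log_extract($line_from_log_file, 'NCSA');

        # Register a custom log type; the parenthesised group is the referrer
        $log_types{'MyLogType'} = '^(.+?) -> .+$';
        $url = log_extract($line_from_log_file, 'MyLogType');

        # Keywords used in the search (undefined if none are found)
        $keyword_string = key_extract($url);

        # se_extract returns an array reference: name first, then URL
        ($search_engine_name, $search_engine_url) = @{ se_extract($url) };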

DESCRIPTION

This module provides three tools to aid people trying to analyse Search-Engine URLs. It’s meant mainly for those who want to analyse referrer logs and pick out key information about site visitors, such as which Search-Engine and keywords they used to find the site.

The functions and globals provided (and exported by default) from this module are:

log_extract($log_line, 'Type')
    Picks out the referring URL from a line of a logfile. 'Type' can be one of the built-in types or a user-created one. For more information, see %log_types below. This subroutine accepts a scalar and returns a scalar.

key_extract($url)
    Tries to determine the keywords used in $url. It accepts a scalar and returns a scalar. Should nothing be found, it returns an undefined value.

se_extract($url)
    Tries to determine the name of the Search-Engine used and its URL. It accepts a scalar and returns a reference to an array containing first the Search-Engine’s name and second the Search-Engine’s URL. Should the URL appear not to be from a search query, it returns a reference to an empty array.

%log_types
    This hash already holds five built-in logfile types:
      • IIS1 - Microsoft IIS 3.0 and 2.0
      • IIS2 - Microsoft IIS 4.0 (W3SVC format)
      • NCSA - For Apache, Netscape and any other NCSA format logs
      • ORW - O'Reilly WebSite format
      • General - A generalised one that will work with most logfiles

    It’s easy to add another one. Simply add a key to the hash, with a value that is a regex. Parenthesise the part that is the referring URL, as the script uses $1 to obtain the URL (see the example in the Synopsis section).
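A candidate pattern for a new log type can be tried on its own before it is added to the hash. The following is a minimal sketch that runs without the module: it applies the Synopsis pattern directly and returns whatever the parenthesised group captures. The helper name referrer_from and the sample log line are invented for illustration and are not part of URI::Sequin; the "referrer -> page" layout is only the format that pattern implies.

```perl
use strict;
use warnings;

# Pattern from the Synopsis: the parenthesised group is the referring URL.
my $pattern = '^(.+?) -> .+$';

# Hypothetical helper (not part of URI::Sequin): apply a log-type regex
# to one line and return the captured referrer, or undef if it does not match.
sub referrer_from {
    my ($line, $regex) = @_;
    return $line =~ /$regex/ ? $1 : undef;
}

# Invented sample line in the assumed "referrer -> page" format.
my $line     = 'http://www.example.com/search?q=perl+modules -> /index.html';
my $referrer = referrer_from($line, $pattern);

print defined $referrer ? "$referrer\n" : "no match\n";
```

If the helper returns the expected URL for a few representative lines, the same string should be suitable as a value in %log_types.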

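The return conventions described above (an undefined value from key_extract, a reference to an empty array from se_extract) both need checking before the results are used. The sketch below shows one way to do that. The helper describe_hit is hypothetical and takes the already-extracted values as arguments, so it runs without URI::Sequin installed; the engine name and URL in the usage lines are invented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::Sequin).
#   $keywords - scalar as returned by key_extract(), possibly undef
#   $engine   - array reference as returned by se_extract(), possibly empty
sub describe_hit {
    my ($keywords, $engine) = @_;

    # An empty array reference means the URL was not a search query.
    return 'not a search query' unless @$engine;

    my ($name, $url) = @$engine;
    my $terms = defined $keywords ? $keywords : '(no keywords found)';
    return "$name ($url): $terms";
}

# With the module installed, the arguments would come from:
#   describe_hit(scalar key_extract($url), se_extract($url));
# "scalar" guards against an empty list shifting the arguments.
print describe_hit('perl modules', ['ExampleSearch', 'http://search.example.com']), "\n";
print describe_hit(undef, []), "\n";
```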
I have only one request for people who use this module. *Please* tell me where and how you've used it, and if you have any thoughts or suggestions on it, tell me!

BUGS

Doesn't like the Amnesi Search Engine. But then, neither do I. Also, the 'General' log type needs to be used with discretion: be sure that none of the URLs contain a literal double quote (") if you use it.

AUTHOR

Peter Sergeant <pete@grou.ch>

COPYRIGHT

Copyright 2001 Peter Sergeant.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

2003-09-01 perl v5.32.1