HTML::Highlight - A module to highlight words or patterns in HTML documents
    use HTML::Highlight;

    # create the highlighter object
    my $hl = HTML::Highlight->new(
        words => [
            'word',
            'any',
            'car',
            'some phrase'
        ],
        wildcards => [
            undef,
            '%',
            '*',
            undef
        ],
        colors => [
            '#FF0000',
            'red',
            'green',
            'rgb(255, 0, 0)'
        ],
        czech_language => 0,
        debug => 0
    );

    # Remember that you don't need to specify your own colors.
    # The default colors should be optimal.

    # Now you can use the object to highlight patterns in a document
    # by passing the content of the document to its highlight() method.
    # The highlighter object "remembers" its configuration.
    my $highlighted_document = $hl->highlight($document);
This module was originally created to work together with the full-text indexing
module DBIx::TextIndex to highlight search results.
The motivation for creating it was the need for a highlighter that takes
wildcard matches and HTML tags into account and supports the Czech language
(or other Slavic languages).
This module provides Google-like highlighting of words or patterns in HTML
documents. This feature is typically used to highlight search results.
- The constructor:
-
    my $hl = HTML::Highlight->new(
        words => [],
        wildcards => [],
        colors => [],
        czech_language => 0,
        debug => 0
    );
This is the constructor of the highlighter object. It takes a
list of key/value pairs, that is, an even number of parameters.
The words parameter is a reference to an array of words
to highlight.
The wildcards parameter is a reference to an array of
wildcards that are applied to the corresponding words in the words
array.
A wildcard can be either undef or one of '%' or '*'.
The "%" character means "match the word followed by any
characters":
"%" applied to 'car' ==> matches "car", "cars", "careful", ...
The "*" character means "also match the
plural form of the word":
"*" applied to 'car' ==> matches only "car" or "cars"
An undefined wildcard means "match exactly the
corresponding word":
undefined wildcard applied to 'car' ==> matches only "car"
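As an illustration of the three wildcard rules, here is a minimal sketch in plain Perl. It does not use HTML::Highlight and is not the module's implementation. The helper name wildcard_regex and the translation of each wildcard into a regular expression (in particular, treating "plural" as an optional trailing "s") are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Highlight: build a regex that
# approximates the wildcard rules described in the text.
sub wildcard_regex {
    my ($word, $wildcard) = @_;
    my $quoted = quotemeta $word;
    my $suffix = !defined $wildcard ? ''      # exact match only
               : $wildcard eq '%'   ? '\w*'   # word plus any word characters
               : $wildcard eq '*'   ? 's?'    # assumption: plural = optional "s"
               :                      die "unknown wildcard '$wildcard'";
    return qr/\b$quoted$suffix\b/;
}

for my $case ([undef, 'exact'], ['%', 'any suffix'], ['*', 'plural']) {
    my ($wildcard, $label) = @{$case};
    my $re   = wildcard_regex('car', $wildcard);
    my @hits = grep { /\A$re\z/ } qw(car cars careful);
    print "$label: @hits\n";
}
# prints:
#   exact: car
#   any suffix: car cars careful
#   plural: car cars
```

The output lines correspond to the three "applied to 'car'" examples given for the wildcards.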
The colors parameter is a reference to an array of CSS
color identifiers that are used to highlight the corresponding words
in the words array.
Default Google-like colors are used if you don't specify your
own colors. The number of colors can be lower than the number of words. In
this case the colors are rotated, so some of the words are
highlighted using the same color.
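The rotation of colors can be pictured with a short sketch in plain Perl. It assumes that "rotated" means cycling through the color list with a modulo index; the helper name assign_colors is hypothetical and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Highlight: pair each word with a
# color, starting over at the first color when the color list runs out.
sub assign_colors {
    my ($words, $colors) = @_;
    my %color_for;
    for my $i (0 .. $#{$words}) {
        # assumption: rotation is a simple modulo over the color list
        $color_for{ $words->[$i] } = $colors->[ $i % @{$colors} ];
    }
    return \%color_for;
}

my $map = assign_colors([qw(word any car)], ['red', 'green']);
print "$_ => $map->{$_}\n" for qw(word any car);
# prints:
#   word => red
#   any => green
#   car => red
```

With three words and two colors, the first and third word end up sharing a color.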
The highlighter takes HTML tags into account and therefore
does not "highlight" a word or a pattern inside a tag.
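One way to leave tags alone is to split the document into tags and text and substitute only in the text parts. The following plain Perl sketch shows that idea under stated assumptions: the helper name highlight_text_only and the span markup are invented for this example, and the module's real parsing and output markup may differ. The naive tag pattern does not handle comments, scripts, or ">" inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Highlight: wrap matches of $word
# in a colored <span>, but only in text between tags.
sub highlight_text_only {
    my ($html, $word, $color) = @_;
    my $re = qr/\b\Q$word\E\b/;
    # The capturing group makes split() return the tags as list elements too.
    my @parts = split /(<[^>]*>)/, $html;
    for my $part (@parts) {
        next if $part =~ /\A</;    # a tag: leave it untouched
        $part =~ s/($re)/<span style="background-color: $color">$1<\/span>/g;
    }
    return join '', @parts;
}

print highlight_text_only('<a href="car.html">my car</a>', 'car', 'yellow'), "\n";
# prints:
#   <a href="car.html">my <span style="background-color: yellow">car</span></a>
```

The "car" inside the href attribute is not changed, while the "car" in the link text is wrapped.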
Support for diacritics-insensitive matching for ISO-8859-2
languages (for example, the Czech language) can be activated using
the czech_language option. This feature requires the CzFast module,
which is available on CPAN in the directory of author TRIPIE
or at http://geocities.com/tripiecz/.
Your system's locales must be set correctly to use the
czech_language feature.
- highlight
-
my $hl_document = $hl->highlight($document);
The only parameter is the document in which you want to highlight
the words that were passed to the constructor of the highlighter object.
The method returns a version of the document in which the words are
highlighted.
- preview_context
-
my $sections = $hl->preview_context($document, $num);
This method takes two parameters. The first one is the
document you want to scan for the words that were passed to the
constructor of the highlighter object. The second parameter is an
optional integer that specifies the maximum number of characters in each of
the context sections (see below). It defaults to 80
characters if it is not specified. The minimum allowed value of this
parameter is 60.
The method returns a reference to an array of sections of the
document in which the words that were passed to the constructor appear.
HTML tags are removed before the document is processed and are not
present in the output. This feature is typically used in search engines
to preview the context in which words from a search query appear in the
resulting documents. The words are always in the middle of each of the
sections. The number of sections this method returns is equal to the
number of words passed to the constructor of the highlighter object.
That means only the first occurrence of each of the words is taken into
account.
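The behaviour described for preview_context can be approximated in plain Perl as follows. This is only a sketch and not the module's code: the helper name context_sections is hypothetical, values below 60 are clamped to 60 as an assumption (the text only says 60 is the minimum), words that do not occur are skipped as an assumption, and words are located by a plain substring search with no wildcard support.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Highlight: return one context
# section per word, built around the word's first occurrence.
sub context_sections {
    my ($html, $words, $max) = @_;
    $max = 80 unless defined $max;    # default stated in the text
    $max = 60 if $max < 60;           # assumption: clamp to the minimum
    (my $text = $html) =~ s/<[^>]*>//g;    # naive tag removal
    my @sections;
    for my $word (@{$words}) {
        my $pos = index($text, $word);     # first occurrence only
        next if $pos < 0;                  # assumption: skip missing words
        # Start early enough that the word sits near the middle.
        my $start = $pos - int(($max - length($word)) / 2);
        $start = $pos if $start > $pos;    # word longer than the section
        $start = 0    if $start < 0;       # word near the document start
        push @sections, substr($text, $start, $max);
    }
    return \@sections;
}

my $sections = context_sections('<p>I sold my <b>car</b> today.</p>', ['car']);
print "$_\n" for @{$sections};
# prints:
#   I sold my car today.
```

For a short document the whole text fits in one section; for longer documents each section is cut to at most the requested number of characters.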
No official support is provided, but I welcome any comments, patches and
suggestions by email.
http://geocities.com/tripiecz/
Tomas Styblo, tripie@cpan.org, CPAN-ID TRIPIE
Prague, Czech Republic
HTML::Highlight - A module to highlight words or patterns in HTML documents
Copyright (C) 2000 Tomas Styblo (tripie@cpan.org)
This module is free software; you can redistribute it and/or
modify it under the terms of either:
a) the GNU General Public License as published by the Free
Software Foundation; either version 1, or (at your option) any later
version, or
b) the "Artistic License" which comes with this
module.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the GNU
General Public License or the Artistic License for more details.
You should have received a copy of the Artistic License with this
module, in the file Artistic. If not, I'll be glad to provide one.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA