NAME
Text::Highlight - Syntax highlighting framework

SYNOPSIS
    use Text::Highlight 'preload';
    my $th = new Text::Highlight(wrapper => "<pre>%s</pre>\n");
    print $th->highlight('Perl', $code);

DESCRIPTION
Text::Highlight is a flexible and extensible tool for highlighting the syntax of programming code. Both the markup used and the languages supported are completely customizable. It can output highlighted code for embedding in HTML, as escape sequences for an ANSI-capable terminal, or even for posting on an online forum. Bundled support includes C/C++, CSS, HTML, Java, Perl, PHP and SQL.

INSTALLATION
In order to install and use this package you will need Perl version 5.005 or better. Installation is as usual:

    % perl Makefile.PL
    % make
    % make test
    % su
    Password: *******
    % make install

DEPENDENCIES
No third-party modules are required. The following modules are optional:
API OVERVIEW
[Todo]

METHODS
Text::Highlight provides the object-oriented interface described in this section. Optionally, new can take the same parameters as the "configure" method described below.

"my $th = new Text::Highlight( %args )"
Public Methods:

"$th->configure( %args )"
Sets the options used to output highlighted code. Any combination of the following properties can be passed at once. If an option is invalid (such as a wrapper containing no %s), a warning is "cluck"ed to STDERR and the option is otherwise silently ignored.
"wrapper => '<pre>%s</pre>'"
An sprintf-style format string that the entire highlighted code is passed through once it is complete. It must include exactly one %s, for the highlighted code, plus any other formatting you like. If you want no wrapper at all, just the highlighted code, set this to a plain '%s'. Because this is an sprintf format, any other % character in it must be written as %%. Refer to "sprintf" in perlfunc.
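The following standalone Perl sketch shows how a wrapper format behaves under sprintf. apply_wrapper is a hypothetical helper for illustration only, not part of Text::Highlight; it simply assumes the wrapper is applied with a plain sprintf call as described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Highlight: pass finished,
# highlighted code through a wrapper format exactly as sprintf would.
sub apply_wrapper {
    my ($wrapper, $highlighted) = @_;
    return sprintf($wrapper, $highlighted);
}

my $code = '<span class="comment"># 100% done</span>';

# One %s for the highlighted code; a % inside $code is safe because it is an argument.
print apply_wrapper('<pre>%s</pre>', $code), "\n";

# A literal percent sign in the wrapper itself must be written as %%.
print apply_wrapper('<div style="width: 100%%"><pre>%s</pre></div>', $code), "\n";
```

Note that only the wrapper is a format string: percent signs inside the highlighted code need no special treatment.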
"markup => '<span class="%s">%s</span>'"
Another sprintf format string, this one for the markup of individual semantic pieces of the highlighted code. In short, it's what makes a comment turn green. The format contains two '%s' strings. The first is the markup identifier from the "colors" hash for the type of snippet being marked up; the second is the snippet itself. A comment may look like "<span class="comment">#my comment</span>" in the final output.
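As a rough sketch of how the two %s slots are filled, assuming the default markup format and the default class-name colors: mark_up is a hypothetical helper written for illustration, and the module's internals may differ.

```perl
use strict;
use warnings;

# Default-style markup format and a few entries from the default colors hash.
my $markup = '<span class="%s">%s</span>';
my %colors = (comment => 'comment', string => 'string', key1 => 'key1');

# Hypothetical helper, not part of Text::Highlight: mark up one already-escaped
# snippet. The identifier looked up in %colors fills the first %s and the
# snippet fills the second.
sub mark_up {
    my ($type, $snippet) = @_;
    return sprintf($markup, $colors{$type}, $snippet);
}

print mark_up(comment => '#my comment'), "\n";
# prints: <span class="comment">#my comment</span>
```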
The limitation of this is that the type identifier must come before the code itself. This is normally how markup works, but if your output format needs a different order, it is not supported for the time being. Future versions may support setting a coderef to get around this.

"colors => \%hash"
The default colors hash is:
    {
        comment => 'comment',
        string  => 'string',
        number  => 'number',
        key1    => 'key1',
        key2    => 'key2',
        key3    => 'key3',
        key4    => 'key4',
        key5    => 'key5',
        key6    => 'key6',
        key7    => 'key7',
        key8    => 'key8',
    };

This hash maps each semantic token name to its markup. The parser breaks code up into semantic chunks identified by the keys; the value stored at each key is what gets passed through the "markup" format above. Values can be raw color values, ANSI terminal escapes, or, by default, CSS class names.

"escape => \&escape_sub | 'default' | undef"
Every piece of displayed code is passed through an escape function that can be customized for the output medium. Its calling convention is:

    $escaped_string = escapeHTML("unescaped string")

If set to a code reference, it will be called for every piece of code. Because it is called very often, keep the function lightweight if performance matters.
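A minimal custom escape routine might look like the sketch below. It assumes only the one-string-in, one-string-out convention described above; escaping double quotes as well is an illustrative choice, not something the module requires.

```perl
use strict;
use warnings;

# Custom escape coderef following the documented convention: one unescaped
# string in, one escaped string out. Unlike a three-character escape, it
# also escapes double quotes.
my $escape = sub {
    my $text = shift;
    $text =~ s/&/&amp;/g;     # first, so the entities added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
};

print $escape->('if (a < b && c > "d")'), "\n";
# prints: if (a &lt; b &amp;&amp; c &gt; &quot;d&quot;)
```

Such a coderef could then be installed with $th->configure(escape => $escape). It performs four substitutions per call, so it stays cheap enough to run on every snippet.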
The default function does a minimal HTML escape: only the three characters &, < and > are escaped. If you want a more robust HTML escape, note that the default has the same calling convention as HTML::Entities' "encode_entities()" and CGI's "escapeHTML()", so either can be used in its place. To switch back to the default after changing the escape routine, set it to the literal string 'default'. A third option is no escaping at all, set by passing "undef".

"vb => 1", "tgml => 1", "ansi => 1"
When true, sets the markup format, wrapper, escape, and colors to those stored for the specified markup: "vb" for posting in vBulletin, "tgml" for Tek-Tips, and "ansi" for display in a terminal that accepts ANSI color escapes.
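The values the module actually stores for these presets are not listed here, so the following is only a hypothetical sketch of how a terminal setup could be expressed with the same building blocks: ANSI SGR sequences as the colors values, a markup format that emits the sequence before the snippet and a reset (ESC[0m) after it, and a bare '%s' wrapper.

```perl
use strict;
use warnings;

# Hypothetical terminal settings, NOT the values Text::Highlight stores for ansi => 1.
# Each colors value is a raw ANSI SGR sequence.
my %ansi_colors = (
    comment => "\e[32m",      # green
    string  => "\e[31m",      # red
    number  => "\e[35m",      # magenta
    key1    => "\e[1;34m",    # bold blue
);

# The identifier (escape sequence) comes first, then the snippet, then a reset.
my $ansi_markup  = "%s%s\e[0m";
my $ansi_wrapper = '%s';      # no wrapper around the whole block

# Mark up a keyword, a string and a comment, then wrap the whole line.
my $line = join ' ',
    sprintf($ansi_markup, $ansi_colors{key1},    'print'),
    sprintf($ansi_markup, $ansi_colors{string},  '"hi";'),
    sprintf($ansi_markup, $ansi_colors{comment}, '# greet');

print sprintf($ansi_wrapper, $line), "\n";
```

Assuming these values, they could be passed to configure as markup => $ansi_markup, colors => \%ansi_colors, wrapper => $ansi_wrapper and escape => undef, since a terminal needs no HTML escaping. This works only because the escape sequence (the identifier) belongs before the snippet, which matches the ordering limitation of the markup format.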
Note that if more than one of these is present in a given call to "configure", which one takes effect is indeterminate. Also, a wrapper, markup, colors, or escape passed along with vb, tgml, or ansi is not overwritten. Hence, "$th->configure(wrapper => '[tt]%s[/tt]', tgml => 1)" will use the stored TGML settings for markup, colors, and escape, but the custom wrapper passed in instead of the one stored for TGML.

"$code = $th->highlight($type, $code, $options)"
"$code = $th->highlight(type => $type, code => $code, options => $options)"
The "highlight" method does all the work. Given at least the "type" and the original "code", it marks up the code and returns a string with the highlighted result. It takes the named parameters listed below, or just their values as a flat list in the order listed. The order is subject to change, so the named (hash) syntax is safer.
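How the module tells the two calling styles apart is not documented, so the following is only a hypothetical sketch of one way to accept both. normalize_highlight_args is an illustrative name, not part of Text::Highlight.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Text::Highlight's own code: accept either
#   highlight($type, $code, $options)                          # positional
#   highlight(type => $type, code => $code, options => $opts)  # named
# and return the arguments as one hashref.
my %known = map { $_ => 1 } qw(type code options);

sub normalize_highlight_args {
    my @args = @_;

    # Treat the call as named only if it has an even number of elements and
    # every even-indexed element is a known parameter name.
    my @names = @args[ grep { $_ % 2 == 0 } 0 .. $#args ];
    my $named = @args && @args % 2 == 0
        && !grep { !defined($_) || !$known{$_} } @names;

    my %arg;
    if ($named) {
        %arg = @args;
    }
    else {
        # Positional values, in the documented order.
        @arg{qw(type code options)} = @args;
    }
    return \%arg;
}

my $p = normalize_highlight_args('Perl', 'print 1;');
my $n = normalize_highlight_args(code => 'print 1;', type => 'Perl');
print "$p->{type} / $n->{type}\n";   # Perl / Perl
```

A heuristic like this can misread an unusual positional call (for example, a type literally named 'type'), which is one more reason to prefer the named form.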
"type => $type"
The "type" passed in is the name of the kind of code. It can be either a type loaded with "get_syntax" or the name of a sub-module that has a syntax or highlight method, i.e. "Text::Highlight::$type".
"code => $code"
"code" is the unmarked-up, unescaped, plain-text code to be highlighted.
"options => $options"
"options" is optional and mostly not needed. Some parsing modules take extra configuration options, so what "options" holds can vary greatly: a string, a number, or a hashref of many options. The only standard value is the string 'simple', in which case the "highlight" method of the syntax module is not called; instead, Text::Highlight's own parsing method is used with the syntax module's "syntax" hash.
"$code = $th->output"
Returns the highlighted code from the last call to the "highlight" method.
"$th->get_syntax($type, $grammar, $format, $force)"
"$th->get_syntax(type => $type, grammar => $grammar, format => $format, force => $force)"
In addition to the existing T::H:: sub-modules, you can specify new ones at runtime via text editor syntax files. Current support is for EditPlus and UltraEdit (both very good text/code editors). Many users make these files available on the web, so they shouldn't be difficult to find. This method can also be used to load an already parsed language syntax hash if, for whatever reason, you don't want to make it into a module.
This method returns a hashref to the parsed syntax if successful, or undef and a clucked error message if not. You can use the returned value as a simple truth test, or you can make your own static sub-module out of it to save reparsing time if you use the same additional types often. See <a doc that doesn't yet exist> for details on creating a sub-module. The object keeps a copy of the new type, which can be referenced in the highlight method for the object's life.

"type => $type"
The "type" is the same one that gets passed to "highlight", so whatever is specified here must match the call there. Also, if the type specified is one that already exists as a sub-module (visible in @INC as Text::Highlight::$type), the syntax loaded via "get_syntax" takes precedence.
"grammar => $filename | \%syntax"
"grammar" can be one of two things: the name of the file containing the syntax, or a hashref to an already parsed language syntax. If a filename, the file must contain only a single language syntax definition; although some editors allow multiple languages to be defined in the same file, a file loaded here may contain only one. If a hashref, it is assumed to be valid and no further checking is done.
"format => 'editplus' | 'ultraedit'"
"format" is a string specifying which format the syntax definition in the file uses. It is not used if "grammar" is a hashref, but is required if it is a filename. Currently, it must be set to one of the following strings:

    'editplus'
    'ultraedit'
The syntax for a language is set to the following default hash before the file is parsed, so any option not set in the syntax file keeps the default specified here. If "format" is not set to a valid string, this default hash is also set and passed back instead of an error being thrown. Parsing will then proceed without error, but will not do anything to the code.

    {
        name           => 'Unknown-type',
        escape         => '\\',
        case           => 1,
        continueQuote  => 0,
        blockCommentOn => [],
        lineComment    => [],
        quot           => [],
    };

"force => 1"
If "force" is set to a true value, the grammar specified will always be reparsed, reset, and reloaded. By default, if a grammar is loaded for a "type" that has already been loaded, the existing copy is used and no reparsing is done. This works as a very simple caching mechanism, so you don't have to worry about unnecessary processing unless you want to.
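The loading rules described for get_syntax (a hashref grammar used as-is, file settings merged over the default hash, an unknown format yielding just the defaults, and a per-type cache bypassed only by force) can be sketched as follows. This is a hypothetical illustration: load_syntax, %cache and the stub parse_syntax_file are assumed names, not the module's internals, and the stub does not read any real EditPlus or UltraEdit file.

```perl
use strict;
use warnings;

# Hypothetical sketch of the get_syntax caching described above. load_syntax,
# %cache and parse_syntax_file are illustrative names, not module internals.
my %cache;    # type => parsed syntax hashref

# Stand-in for a real EditPlus/UltraEdit parser: it returns only a couple of
# settings so that the merge with the defaults is visible.
sub parse_syntax_file {
    my ($filename, $format) = @_;
    return { name => "From $filename", lineComment => ['#'] };
}

sub load_syntax {
    my %arg  = @_;
    my $type = $arg{type};

    # Reuse the existing copy unless a reparse is forced.
    return $cache{$type} if $cache{$type} && !$arg{force};

    my $syntax;
    if (ref $arg{grammar} eq 'HASH') {
        # An already parsed syntax is trusted as-is.
        $syntax = $arg{grammar};
    }
    else {
        # Defaults first, so the file's settings override them.
        my %default = (
            name           => 'Unknown-type',
            escape         => '\\',
            case           => 1,
            continueQuote  => 0,
            blockCommentOn => [],
            lineComment    => [],
            quot           => [],
        );
        my $format = defined $arg{format} ? $arg{format} : '';
        my $parsed = $format =~ /^(?:editplus|ultraedit)$/
            ? parse_syntax_file($arg{grammar}, $format)
            : +{};    # unknown format: defaults only, no error
        $syntax = { %default, %$parsed };
    }
    return $cache{$type} = $syntax;
}

my $first  = load_syntax(type => 'X', grammar => 'x.stx', format => 'editplus');
my $cached = load_syntax(type => 'X', grammar => 'y.stx', format => 'editplus');
my $forced = load_syntax(type => 'X', grammar => 'y.stx', format => 'editplus', force => 1);
print "$first->{name} | $cached->{name} | $forced->{name}\n";
# prints: From x.stx | From x.stx | From y.stx
```

The second call returns the cached copy even though it names a different file, which is exactly the situation where force => 1 is needed.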
Examples:
Until I come up with some better examples, here are the defaults the module uses.

    $DEF_FORMAT  = '<span class="%s">%s</span>';
    $DEF_ESCAPE  = \&_simple_html_escape;
    $DEF_WRAPPER = '<pre>%s</pre>';
    $DEF_COLORS  = {
        comment => 'comment',
        string  => 'string',
        number  => 'number',
        key1    => 'key1',
        key2    => 'key2',
        key3    => 'key3',
        key4    => 'key4',
        key5    => 'key5',
        key6    => 'key6',
        key7    => 'key7',
        key8    => 'key8',
    };

    # sub has the same calling convention as CGI.pm's escapeHTML()
    # and HTML::Entities' encode_entities()
    sub _simple_html_escape {
        my $code = shift;

        # escape the only three characters that "really" matter for displaying HTML;
        # & goes first so the entities added below are not escaped again
        $code =~ s/&/&amp;/g;
        $code =~ s/</&lt;/g;
        $code =~ s/>/&gt;/g;

        return $code;
    }

API SYNTAX EXTENSIONS
[Todo]

EXAMPLES
[Todo]

TODO
AUTHORS
Andrew Flerchinger <icrf [at] wdinc.org>
Enrico Sorcinelli <enrico [at] sorcinelli.it> (main contributors)

BUGS
Please submit bugs to the CPAN RT system at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Highlight> or by email at bug-text-highlight@rt.cpan.org

Patches are welcome, and we'll update the module if any problems are found.

VERSION
Version 0.04

SEE ALSO
HTML::SyntaxHighlighter, perl(1)

COPYRIGHT AND LICENSE
Copyright (C) 2001-2005. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.