DelimMatch(3) User Contributed Perl Documentation DelimMatch(3)
Text::DelimMatch - Perl extension to find regexp delimited strings with proper
nesting
use Text::DelimMatch;
$mc = new Text::DelimMatch $startdelim, $enddelim;
$mc->quote('"');
$mc->escape("\\");
$mc->double_escape('"');
$mc->case_sensitive(1);
($prefix, $match, $remainder) = $mc->match($string);
($prefix, $nextmatch, $remainder) = $mc->match();
$middle = $mc->strip_delim($match); # returns $match w/o start and end delim
These routines allow you to match delimited substrings in a buffer. The
delimiters can be specified with any regular expression and the start and end
delimiters need not be the same. If the delimited text is properly nested,
entire nested groups are returned.
In addition, you may specify quoting and escaping characters that
contribute to the recognition of start and end delimiters.
For example, if you specify the start and end delimiters as '\('
and '\)', respectively, and the double quote character as a quoting
character, and the backslash as an escaping character, then the delimited
substring in this buffer is "(ma(t)c\)h)":
'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text'
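To make the rules in this example concrete, here is a minimal pure-Perl sketch of the same idea. It is not the module's implementation: find_delimited is a hypothetical helper that only handles single-character delimiters, one quote character and one escape character, whereas Text::DelimMatch accepts regular expressions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::DelimMatch: find the first properly
# nested group in $text, skipping quoted text and escaped characters.
sub find_delimited {
    my ($text, $open, $close, $quote, $esc) = @_;
    my ($depth, $start, $in_quote) = (0, undef, 0);
    my $len = length $text;
    for (my $i = 0; $i < $len; $i++) {
        my $c = substr($text, $i, 1);
        if ($c eq $esc)   { $i++; next; }                   # skip the escaped character
        if ($c eq $quote) { $in_quote = !$in_quote; next; } # toggle quoting
        next if $in_quote;                                  # ignore delimiters in quotes
        if ($c eq $open) {
            $start = $i if $depth == 0;                     # outermost start delimiter
            $depth++;
        }
        elsif ($c eq $close && $depth > 0) {
            $depth--;
            if ($depth == 0) {                              # outermost group is closed
                return (substr($text, 0, $start),
                        substr($text, $start, $i - $start + 1),
                        substr($text, $i + 1));
            }
        }
    }
    return;  # no complete group found
}

my $buffer = 'prefix text "(quoted text)" \(escaped \" text) (ma(t)c\)h) postfix text';
my ($pre, $match, $post) = find_delimited($buffer, '(', ')', '"', '\\');
print "$match\n";   # prints (ma(t)c\)h)
```

The quoted group and the escaped parenthesis are skipped, and the unmatched ")" after "text" is ignored because no group is open at that point.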
In order to support this rather complex interface, the matching
context is encapsulated in an object. The object, Text::DelimMatch, has the
following public methods:
- new $start, $end, $escape, $dblesc, $qs1, $qe1, ... $qsn, $qen
- Creates a new object. All of the arguments are optional, and can be set
with other methods, but they must be passed in the specified order: start
delimiter, end delimiter, escape characters, double escape characters, and
a set of quote characters.
- match $string
- In an array context, returns ($pre, $match, $post) where $pre is the text
preceding the first match, $match is the matched text (including the
delimiters), and $post is the rest of the text in the buffer. In a scalar
context, returns $match.
If $string is not provided on
subsequent calls, the $post from the previous
match is used, unless keep is false. If keep is false, the match always
fails.
- strip_delim $string
- Returns $string with the start and end delimiters
removed.
- delim $start, $end
- Set the start and end delimiters. Only one set of delimiters can be in use
at any one time.
Returns the delimiters in use before this call.
- quote $startq, $endq
- Specifies the start and end quote characters. Multiple quote character
pairs are supported, so this function is additive. To clear the current
settings, pass no arguments, e.g.,
$mc->quote().
If only $startq is passed, $endq is assumed to be the same.
In matching, quotes occur in pairs. In other words, if
(",") and (',') are both specified as quote pairs and a string
beginning with " is found, it is ended only by another ", not
by '.
Returns the quote hash in use before this call.
- escape $esc
- Specifies a set of escaping characters. $esc can be a regexp set or a
simple string of characters. If it is a simple string, it will be translated
into the regexp set "[ quotemeta($esc) ]".
Returns the escape characters in use before this call.
- double_escape $esc
- Specifies a set of double-escaping characters, i.e., characters that are
considered escaped if they occur in pairs. For example, in some languages,
'Don''t you see?'
defines a string containing a single apostrophe.
$esc can be a regexp set or a simple string of characters. If it is a simple
string, it will be translated into the regexp set "[ quotemeta($esc) ]".
Returns the double-escaping characters in use before this
call.
- case_sensitive $bool
- Sets case sensitivity to $bool or true if
$bool is not specified.
Returns the case sensitivity in use before this call.
- keep $bool
- Sets keep to $bool or true if
$bool is not specified.
Keep, which is true by default, specifies whether or not the
matching context object keeps a local copy of the buffer used in
matching. Keeping a local copy allows repeated matching on the same
buffer, but might be a bad idea if the buffer is a terabyte long.
;-)
Returns the keep setting in use before this call.
- returndelim $bool
- Sets returndelim to $bool or true if
$bool is not specified.
Returndelim, which is true by default, specifies whether or
not the start and end delimiters are returned with the matching
string.
Returns the returndelim setting in use before this call.
- error $seterr
- Returns the last error that occurred. If $seterr is passed, the error is
set to that value. Some common kinds of bad input are detected and raise an
error condition. Once an error condition is raised, all matching fails until
the error is cleared.
The most common error is a bad regular expression, for example
specifying the start delimiter as "(" instead of
"\\(". Remember, these are regexps!
- pre_matched
- Returns the prefix text from the last match if keep is true. Sets an error
and returns an empty string if keep is false.
- matched
- Returns the matched text from the last match if keep is true. Sets an
error and returns an empty string if keep is false.
- post_matched
- Returns the postfix text from the last match if keep is true. Sets an
error and returns an empty string if keep is false.
- debug $bool
- Sets debug to $bool or true if
$bool is not specified.
If debug is true, informative and progress messages are
printed to STDOUT by some methods.
Returns the debugging setting in use before this call.
- dump
- For debugging, prints all of the instance variables for a particular
object.
- slow $bool
- For debugging. Some classes of delimited strings can be located with much
faster algorithms than can be used in the most general case. If slow is
true, the slower, general algorithm is always used.
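The escape and double_escape entries above describe two small mechanisms: a simple string is turned into a regexp set, and a doubled character stands for one literal character. A rough pure-Perl sketch of both follows. The helpers to_regexp_set and undouble are hypothetical, and the bracket test used to recognise an existing regexp set is an assumption; the module may decide differently.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a simple string of characters in a regexp set.
# Assumption: an argument already written as [...] is passed through as-is.
sub to_regexp_set {
    my ($chars) = @_;
    return $chars if $chars =~ /^\[.*\]$/s;
    return '[' . quotemeta($chars) . ']';
}

my $esc_set = to_regexp_set('\\');    # a single backslash
my $dbl_set = to_regexp_set(q{'"});   # apostrophe and double quote

# Hypothetical helper: collapse each doubled character from $set into one,
# which is how a double-escaped string body would be read.
sub undouble {
    my ($body, $set) = @_;
    (my $out = $body) =~ s/($set)\1/$1/g;
    return $out;
}

print undouble(q{Don''t you see?}, $dbl_set), "\n";   # prints Don't you see?
```

Because quotemeta backslashes every non-word character, a simple string such as a lone backslash is safe to place inside the brackets.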
For simplicity, and backward compatibility with the previous
(limited release) incarnation of this module, the following functions are
also available directly:
- nested_match ($string, $start, $end, $three)
- If $three is true, returns ($pre, $match, $post) in an array context;
otherwise returns ("$pre$match", $post). In a scalar context, returns
"$pre$match".
- skip_nested_match ($string, $start, $end, $three)
- If $three is true, returns ($pre, $match, $post) in an array context;
otherwise returns ("$pre$match", $post). In a scalar context, returns
$post.
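The return conventions of these two functions depend on both $three and the calling context. The sketch below shows that shaping for nested_match using Perl's wantarray. shape_nested_match is a hypothetical helper written for illustration, not the module's code, and it takes the three pieces of an already completed match as input.

```perl
use strict;
use warnings;

# Hypothetical helper: given the pieces of a match, return them in the shape
# documented for nested_match.
sub shape_nested_match {
    my ($pre, $match, $post, $three) = @_;
    return "$pre$match" unless wantarray;      # scalar context
    return $three ? ($pre, $match, $post)      # three-element list
                  : ("$pre$match", $post);     # two-element list
}

my @three  = shape_nested_match('pre ', '(match)', ' post', 1);
my @two    = shape_nested_match('pre ', '(match)', ' post', 0);
my $scalar = shape_nested_match('pre ', '(match)', ' post', 0);

print scalar(@three), " ", scalar(@two), " $scalar\n";   # prints 3 2 pre (match)
```

Under the same assumptions, skip_nested_match would differ only in the scalar branch, which returns $post instead of "$pre$match".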
$mc = new Text::DelimMatch '"';
$mc->match('pre "match" post') == '"match"';
$mc->delim("\\(", "\\)");
$mc->match('pre (match) post') == ('pre ', '(match)', ' post');
$mc->match('pre (ma(t)ch) post') == ('pre ', '(ma(t)ch)', ' post');
$mc->quote('"');
$mc->escape("\\");
$mc->match('pre (ma")"tch) post') == ('pre ', '(ma")"tch)', ' post');
$mc->match('pre (ma(t)c\)h\") post') == ('pre ', '(ma(t)c\)h\")', ' post');
See also test.pl in the distribution.
Norman Walsh, ndw@nwalsh.com
Copyright (C) 1997-2002 Norman Walsh. All rights reserved. This program is free
software; you can redistribute it and/or modify it under the same terms as
Perl itself.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.