use regex
const compile    : (re : byte[:] -> std.result(regex#, status))
const dbgcompile : (re : byte[:] -> std.result(regex#, status))
const free       : (re : regex# -> void)
const exec       : (re : regex#, str : byte[:] -> std.option(byte[:][:]))
const search     : (re : regex#, str : byte[:] -> std.option(byte[:][:]))
The regex library provides functions for compiling and evaluating
regular expressions, as described later in this document, or in
myr-regex(7).
regex.compile takes a string describing a regex and attempts to
compile it. It returns `std.Ok regex# if the regex is valid and no
error conditions were encountered during compilation. If compilation
fails, it returns `std.Err regex.status, where regex.status is a
failure code.
regex.dbgcompile is identical to regex.compile, except that it
prints debugging information as it compiles the regex and each time
the regex is evaluated.
regex.exec takes the regex passed to it and evaluates it over the
text provided, returning `std.Some matches, or `std.None if no match
was found. A match must span the whole string.
regex.search is similar to regex.exec, but it attempts to find a
match anywhere within the string instead of requiring the match to
span the whole string.
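The difference between regex.exec and regex.search can be sketched as
follows. This is a minimal illustration rather than a definitive
program: it assumes the `std.Ok and `std.Err constructors used in this
page's example, and the pattern and input text are made up for the
demonstration.

```myrddin
use std
use regex

const main = {
	match regex.compile("b+")
	| `std.Ok re:
		/* exec requires the match to span the whole string, so "abbbc" is rejected */
		match regex.exec(re, "abbbc")
		| `std.Some _:	std.put("exec: match\n")
		| `std.None:	std.put("exec: no match\n")
		;;
		/* search accepts a match anywhere inside the string */
		match regex.search(re, "abbbc")
		| `std.Some _:	std.put("search: match\n")
		| `std.None:	std.put("search: no match\n")
		;;
		regex.free(re)
	| `std.Err _:
		std.put("failed to compile regex\n")
	;;
}
```

The same compiled regex is passed to both calls, so it is compiled
once and freed once after the last use.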
The grammar used by libregex is below:
regex     : altexpr
altexpr   : catexpr ('|' altexpr)?
catexpr   : repexpr (catexpr)?
repexpr   : baseexpr [*+?]?
baseexpr  : literal
          | charclass
          | charrange
          | escaped
          | '.'
          | '^'
          | '$'
          | '(' regex ')'
charclass : see below
charrange : '[' (literal ('-' literal)?)+ ']'
The following metacharacters have the meanings listed below:
- .
- Matches a single unicode character.
- ^
- Matches the beginning of a line. Does not consume any characters.
- $
- Matches the end of a line. Does not consume any characters.
- *
- Matches any number of repetitions of the preceding regex fragment.
- *?
- Reluctantly matches any number of repetitions of the preceding regex
fragment.
- +
- Matches one or more repetitions of the preceding regex fragment.
- +?
- Reluctantly matches one or more repetitions of the preceding regex
fragment.
- ?
- Matches zero or one of the preceding regex fragment.
To match a metacharacter literally, precede it with a '\' character.
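As a hedged illustration of escaping, the sketch below compares an
unescaped '+' with an escaped one. The helper name check is
hypothetical and not part of the library, and the code assumes the
`std.Ok and `std.Err constructors used in this page's example. Note
that the backslash is doubled inside the Myrddin string literal so
that a single '\' reaches the regex compiler.

```myrddin
use std
use regex

/* check is a hypothetical helper: it reports whether pat matches all of text */
const check = {pat : byte[:], text : byte[:]
	match regex.compile(pat)
	| `std.Ok re:
		match regex.exec(re, text)
		| `std.Some _:	std.put("{} matches {}\n", pat, text)
		| `std.None:	std.put("{} does not match {}\n", pat, text)
		;;
		regex.free(re)
	| `std.Err _:
		std.put("failed to compile {}\n", pat)
	;;
}

const main = {
	/* unescaped: '+' repeats the preceding '1', so this accepts "11" but not "1+1" */
	check("1+1", "1+1")
	check("1+1", "11")
	/* escaped: '\+' stands for a literal plus sign */
	check("1\\+1", "1+1")
}
```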
The following character classes are supported:
- \d
- ASCII digits
- \D
- Negation of ASCII digits
- \x
- ASCII Hex digits
- \X
- Negation of ASCII Hex digits
- \s
- ASCII spaces
- \S
- Negation of ASCII spaces
- \w
- ASCII word characters
- \W
- Negation of ASCII word characters
- \h
- ASCII whitespace characters
- \H
- Negation of ASCII whitespace characters
- \pX, \p{X}
- Characters with unicode property 'X'
- \PX, \P{X}
- Negation of characters with unicode property 'X'
Unicode properties that are supported are listed below:
- L, Letter
- Unicode letter property
- Lu, Uppercase_Letter
- Uppercase letter unicode property
- Ll, Lowercase_Letter
- Lowercase letter unicode property
- Lt, Titlecase_Letter
- Titlecase letter unicode property
- N, Number
- Number unicode property
- Z, Separator
- Any separator character unicode property
- Zs, Space_Separator
- Space separator unicode property
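The character classes and unicode properties above might be combined
as in the following sketch. It is an illustration under assumptions,
not a definitive implementation: check is a hypothetical helper, the
inputs are invented, and the `std.Ok and `std.Err constructors follow
this page's example.

```myrddin
use std
use regex

/* check is a hypothetical helper: it reports whether pat matches all of text */
const check = {pat : byte[:], text : byte[:]
	match regex.compile(pat)
	| `std.Ok re:
		match regex.exec(re, text)
		| `std.Some _:	std.put("{} matches {}\n", pat, text)
		| `std.None:	std.put("{} does not match {}\n", pat, text)
		;;
		regex.free(re)
	| `std.Err _:
		std.put("failed to compile {}\n", pat)
	;;
}

const main = {
	/* \d+ : one or more ASCII digits */
	check("\\d+", "2024")
	/* \p{Lu}\p{Ll}+ : an uppercase letter followed by one or more lowercase letters */
	check("\\p{Lu}\\p{Ll}+", "Hello")
	check("\\p{Lu}\\p{Ll}+", "hello")
}
```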
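use std
use regex

/* pat and text are byte[:] values assumed to be defined elsewhere */
const main = {
	var i

	match regex.compile(pat)
	| `std.Ok re:
		match regex.exec(re, text)
		| `std.Some matches:
			for i = 0; i < matches.len; i++
				std.put("Match {}: {}\n", i, matches[i])
			;;
		| `std.None:
			std.put("Text did not match\n")
		;;
		/* release the compiled regex once it is no longer needed */
		regex.free(re)
	| `std.Err err:
		std.put("failed to compile regex\n")
	;;
}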
The source code for this library is available from
git://git.eigenstate.org/git/ori/libregex.git
This code is insufficiently tested.
This code does not support all of the regex features that one
would expect.