MANDOC_ESCAPE(3)         FreeBSD Library Functions Manual         MANDOC_ESCAPE(3)
mandoc_escape —
parse roff escape sequences
#include <sys/types.h>
#include <mandoc.h>
enum mandoc_esc
mandoc_escape(const char **end, const char **start, int *sz);
This function scans a
roff(7)
escape sequence.
An escape sequence consists of
- an initial backslash character (‘\’),
- a single ASCII character called the escape sequence identifier,
- and, with only a few exceptions, an argument.
Arguments can be given in the following forms; some escape
sequence identifiers only accept some of these forms as specified below. The
first three forms are called the standard forms.
- In brackets: [argument]
- The argument starts after the initial ‘[’, ends before the
final ‘]’, and the escape sequence ends with the final
‘]’.
- Two-character argument short form: (ar
- This form can only be used for arguments consisting of exactly two
characters. It has the same effect as [ar].
- One-character argument short form: a
- This form can only be used for arguments consisting of exactly one
character. It has the same effect as [a].
- Delimited form: CargumentC
- The argument starts after the initial delimiter character
C, ends before the next occurrence of the delimiter
character C, and the escape sequence ends with that
second C. Some escape sequences allow an arbitrary
character C as the quoting character; others restrict
the range of characters that can be used as quoting characters.
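The four argument forms above can be illustrated with a minimal sketch. This is not the code in mandoc.c: the type struct arg_span and the functions scan_standard() and scan_delimited() are hypothetical names introduced only for this illustration, and the sketch ignores nested escape sequences inside arguments as well as any per-identifier restrictions on delimiters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: where the argument is and where the sequence ends. */
struct arg_span {
	const char *start;	/* first character of the argument */
	int	    sz;		/* length of the argument */
	const char *end;	/* character after the escape sequence */
};

/*
 * Scan an argument in one of the three standard forms.
 * p points to the first character after the identifier.
 * Returns 0 on success, -1 if the input ends too early.
 */
int
scan_standard(const char *p, struct arg_span *out)
{
	const char *close;

	if (*p == '\0')
		return -1;
	if (*p == '[') {
		/* Bracketed form: [argument] */
		if ((close = strchr(p + 1, ']')) == NULL)
			return -1;
		out->start = p + 1;
		out->sz = (int)(close - out->start);
		out->end = close + 1;
	} else if (*p == '(') {
		/* Two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		out->start = p + 1;
		out->sz = 2;
		out->end = p + 3;
	} else {
		/* One-character short form: a */
		out->start = p;
		out->sz = 1;
		out->end = p + 1;
	}
	return 0;
}

/*
 * Scan an argument in delimited form: CargumentC.
 * p points to the opening delimiter; any character is accepted here.
 */
int
scan_delimited(const char *p, struct arg_span *out)
{
	const char *close;

	if (*p == '\0' || (close = strchr(p + 1, *p)) == NULL)
		return -1;
	out->start = p + 1;
	out->sz = (int)(close - out->start);
	out->end = close + 1;
	return 0;
}
```

With this sketch, the inputs "[bu]", "(bu" and "'bu'" all yield the two-character argument "bu"; only the position of the end pointer differs.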
Upon function entry, end is expected to
point to the escape sequence identifier. The values passed in as
start and sz are ignored and
overwritten.
By design, this function cannot handle those
roff(7)
escape sequences that require in-place expansion, in particular user-defined
strings \*, number registers \n, width measurements \w,
and numerical expression control \B. These are
handled by roff_res(), a private preprocessor
function called from roff_parseln(); see the file
roff.c.
The function mandoc_escape() is used
- recursively by itself, because some escape sequence arguments can in turn
contain other escape sequences,
- for error detection internally by the
roff(7)
parser part of the
mandoc(3)
library, see the file roff.c,
- above all externally by the
mandoc(1)
formatting modules, in particular
-Tascii and -Thtml, for formatting purposes, see the files
term.c and html.c,
- and rarely externally by high-level utilities using the mandoc library,
for example
makewhatis(8),
to purge escape sequences from text.
Upon function return, the pointer end is set to the
character after the end of the escape sequence, such that the calling
higher-level parser can easily continue.
For escape sequences taking an argument, the pointer
start is set to the beginning of the argument and
sz is set to the length of the argument. For escape
sequences not taking an argument, start is set to the
character after the end of the sequence and sz is set
to 0. Both start and sz may be
NULL; in that case, the argument and the length are
not returned.
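The pointer contract described above (end points to the identifier on entry and to the character after the sequence on return) lends itself to a simple caller loop. The following sketch purges escape sequences from a string. To stay self-contained it does not call mandoc_escape() itself: scan_fn, demo_scan() and strip_escapes() are hypothetical names, and demo_scan() is a stand-in that understands only the bracketed form and argument-less one-character sequences. In a program linked against the mandoc library, a thin wrapper around mandoc_escape() could presumably take its place.

```c
#include <stddef.h>

/*
 * Hypothetical scanner type following the contract described above:
 * on entry *end points to the escape sequence identifier, on return
 * it points to the character after the sequence.  start and sz may
 * be NULL.
 */
typedef int (*scan_fn)(const char **end, const char **start, int *sz);

/* Stand-in scanner: handles \[arg] and one-character sequences only. */
int
demo_scan(const char **end, const char **start, int *sz)
{
	const char *p = *end;
	const char *arg;
	int len = 0;

	if (*p == '[') {
		arg = ++p;
		while (*p != '\0' && *p != ']')
			p++;
		if (*p == '\0') {
			/* Input ended before the closing bracket. */
			*end = p;
			return -1;
		}
		len = (int)(p - arg);
		p++;			/* skip the final ']' */
	} else {
		if (*p != '\0')
			p++;		/* skip the identifier */
		arg = p;		/* no argument: point past the sequence */
	}
	*end = p;
	if (start != NULL)
		*start = arg;
	if (sz != NULL)
		*sz = len;
	return 0;
}

/*
 * Copy in to out, dropping every escape sequence.
 * Returns the number of characters written, not counting the NUL.
 */
size_t
strip_escapes(const char *in, char *out, size_t outsz, scan_fn scan)
{
	size_t n = 0;

	if (outsz == 0)
		return 0;
	while (*in != '\0' && n + 1 < outsz) {
		if (*in != '\\') {
			out[n++] = *in++;
			continue;
		}
		in++;			/* now at the identifier */
		if (*in == '\0')
			break;		/* trailing backslash */
		/* The argument is not needed, so pass NULL for both. */
		(void)scan(&in, NULL, NULL);
	}
	out[n] = '\0';
	return n;
}
```

Because the scanner advances the caller's pointer, the loop never needs to know how long an individual escape sequence is.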
For sequences taking an argument, the function
mandoc_escape() returns one of the following
values:
ESCAPE_FONT
- The escape sequence \f taking an argument in
standard form: \f[, \f(, \fa. Two-character arguments
starting with the character ‘C’ are reduced to one-character
arguments by skipping the ‘C’. More specific values are
returned for the most commonly used arguments:
ESCAPE_SPECIAL
- The escape sequence \C taking an argument
delimited with the single quote character and, as a special exception, the
escape sequences not having an identifier, that is,
those where the argument, in standard form, directly follows the initial
backslash: \C', \[, \(, \a. Note that the
one-character argument short form can only be used for argument characters
that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under
ESCAPE_UNICODE, that value is returned
instead.
The ESCAPE_SPECIAL special character
escape sequences can be rendered using the functions
mchars_spec2cp() and
mchars_spec2str() described in the
mchars_alloc(3)
manual.
ESCAPE_UNICODE
- Escape sequences of the same format as described above under
ESCAPE_SPECIAL, but with an argument of the forms
uXXXX, uYXXXX, or u10XXXX, where
X and Y are hexadecimal digits
and Y is not zero: \C'u, \[u. As a special exception,
start is set to the character after the
u, and the sz return value
does not include the u either.
Such Unicode character escape sequences can be rendered using
the function mchars_num2uc() described in the
mchars_alloc(3)
manual.
ESCAPE_NUMBERED
- The escape sequence
\N followed by a delimited
argument. The delimiter character is arbitrary except that digits cannot
be used. If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
ESCAPE_IGNORE is returned.
Such ASCII character escape sequences can be rendered using
the function mchars_num2char() described in the
mchars_alloc(3)
manual.
ESCAPE_OVERSTRIKE
- The escape sequence
\o followed by an argument
delimited by an arbitrary character.
ESCAPE_IGNORE
- The escape sequence \s followed by an argument
in standard form or by an argument delimited by the single quote
character: \s', \s[, \s(, \sa. As a special
exception, an optional ‘+’ or ‘-’
character is allowed after the ‘s’ for all forms.
- The escape sequences \F, \g, \k, \M, \m,
\n, \V, and \Y followed by an argument in standard
form.
- The escape sequences \A, \b, \D, \R, \X, and
\Z followed by an argument delimited by an
arbitrary character.
- The escape sequences \H, \h, \L, \l, \S,
\v, and \x followed by
an argument delimited by a character that cannot occur in numerical
expressions. However, if any character that can occur in numerical
expressions is found instead of a delimiter, the sequence is
considered to end with that character, and
ESCAPE_ERROR is returned.
ESCAPE_ERROR
- Escape sequences taking an argument but not matching any of the above
patterns. In particular, that happens if the end of the logical input line
is reached before the end of the argument.
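The argument syntax described above under ESCAPE_UNICODE (uXXXX, uYXXXX with Y not zero, or u10XXXX) can be checked roughly as follows. This is a sketch, not the library's code: unicode_arg_value() is a hypothetical helper, it operates on the raw argument including the leading u (that is, before start and sz are adjusted), and accepting hexadecimal digits in either case is an assumption of the sketch.

```c
#include <ctype.h>

/*
 * Hypothetical helper: validate a special character argument of the
 * form uXXXX, uYXXXX (Y not zero) or u10XXXX.
 * arg points to the 'u'; sz is the argument length including the 'u'.
 * Returns the code point, or -1 if the argument has another form.
 */
long
unicode_arg_value(const char *arg, int sz)
{
	long cp = 0;
	int i, ndigits = sz - 1;

	if (sz < 1 || arg[0] != 'u')
		return -1;
	if (ndigits < 4 || ndigits > 6)
		return -1;
	/* Five digits: the leading digit Y must not be zero. */
	if (ndigits == 5 && arg[1] == '0')
		return -1;
	/* Six digits: only the prefix "10" is allowed. */
	if (ndigits == 6 && (arg[1] != '1' || arg[2] != '0'))
		return -1;
	for (i = 1; i < sz; i++) {
		unsigned char c = (unsigned char)arg[i];

		if (!isxdigit(c))
			return -1;
		cp = cp * 16 +
		    (isdigit(c) ? c - '0' : toupper(c) - 'A' + 10);
	}
	return cp;
}
```

The length checks make the largest accepted value 10FFFF, the upper end of the Unicode code space, and reject redundant leading zeros such as u02014.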
For sequences that do not take an argument, the function
mandoc_escape() returns one of the following
values:
ESCAPE_SKIPCHAR
- The escape sequence “\z”.
ESCAPE_NOSPACE
- The escape sequence “\c”.
ESCAPE_IGNORE
- The escape sequences “\d” and “\u”.
This function is implemented in mandoc.c.
This function has been available since mandoc 1.11.2.
The function doesn't cleanly distinguish between sequences that are valid and
supported, valid and ignored, valid and unsupported, syntactically invalid, or
undefined. For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems and/or
loss of document content. The function is already rather complicated and still
parses some sequences incorrectly.