PerlPoint::Parser(3)   User Contributed Perl Documentation
PerlPoint::Parser - a PerlPoint Parser
This manual describes version 0.451.
# load the module:
use PerlPoint::Parser;
# build the parser and run it
# to get intermediate data in @stream
my ($parser)=PerlPoint::Parser->new;
$parser->run(
stream => \@stream,
files => \@files,
);
The PerlPoint format, initially designed by Tom Christiansen, is intended to
provide a simple and portable way to generate slides without the need of a
proprietary product. Slides can be prepared in a text editor of your choice,
generated on any platform where you find perl, and presented by any browser
which can render the chosen output format.
To sum it up, PerlPoint Software takes an ASCII text and
transforms it into slides written in a certain document description
language. This is, by tradition, usually HTML, but you may decide to use
another format like XML, SGML, TeX or whatever you want.
Well, this sounds fine, but how do you build a translator which
transforms ASCII into the output format of your choice? That's what
PerlPoint::Parser is made for. It performs the first translation step
by parsing ASCII and transforming it into an intermediate stream format,
which can be processed by a subsequently called translator backend. By
separating parsing and output generation we get the flexibility to write as
many backends as necessary by using the same parser frontend for all
translators.
PerlPoint::Parser supports the complete GRAMMAR with the
exception of certain tags. Tags are supported in the most
general way: the parser recognizes any tag which is declared by
the author of a translator. This way the parser can be used for various
flavours of the PerlPoint language without having to be modified. So, if
there is a need for a certain new tag, it can quickly be added without any
change to PerlPoint::Parser.
The following chapters describe the input format (GRAMMAR)
and the generated stream format (STREAM FORMAT). Finally, the class
methods are described to show you how to build a parser.
This chapter describes how a PerlPoint ASCII slide description has to be
formatted to be accepted by PerlPoint::Parser parsers.
Please note that the input format does not
completely determine how the output will be designed. The final
format depends on the backend which has to be called after the parser
to transform its output into a certain document description language. The
final appearance depends on the browser's behaviour.
Each PerlPoint document is made of paragraphs.
All paragraphs start at the beginning of their first line. The first character
or string in this line determines which paragraph is recognized.
A paragraph is completed by an empty line (which may contain
whitespace). Exceptions are described below.
Carriage returns in paragraphs which are completed by an empty
line are transformed into a whitespace.
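The paragraph rule can be illustrated by a minimal sketch. It is not the parser's implementation: the helper name split_paragraphs is hypothetical, and the exceptions (blocks, verbatim blocks and so on) are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PerlPoint::Parser): split a source text
# into paragraphs at empty lines, which may contain whitespace, and turn
# the line breaks inside each paragraph into single spaces.
sub split_paragraphs {
    my ($text) = @_;
    my @paragraphs;
    for my $chunk (split /\n[ \t]*\n/, $text) {
        $chunk =~ s/^\s+|\s+$//g;          # drop surrounding whitespace
        next unless length $chunk;         # skip additional empty lines
        $chunk =~ s/[ \t]*\n[ \t]*/ /g;    # a line break becomes one space
        push @paragraphs, $chunk;
    }
    return @paragraphs;
}

print "$_\n" for split_paragraphs(
    "In this new text paragraph,\nwe demonstrate the multiline feature.\n   \nNext paragraph.\n"
);
```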
- Comments
- start with "//" and reach until the end of the line.
- Headlines
- start with one or more "=" characters. The number of
"=" characters represents the headline level.
=First level headline
==Second level headline
===Multi
line
headline
example
It is possible to declare a "short version" of the
headline title by appending a "~" and plain strings to the
headline like in
=Very long headlines are expressive but may exceed the
available space for example in HTML navigation bars or
something like that ~ Long headlines
The "~" often stands for similarity, or represents
the described object in encyclopedias or dictionaries. So one may think
of this as "long title is (sometimes) similar to short
title".
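As an illustration only, a headline paragraph could be taken apart as sketched here. The helper parse_headline is hypothetical, and the sketch assumes that the first "~" separates the long title from the short one.

```perl
use strict;
use warnings;

# Hypothetical helper: derive level, title and optional short title from a
# headline paragraph that was already joined into a single string.
sub parse_headline {
    my ($paragraph) = @_;
    my ($marks, $rest) = $paragraph =~ /^(=+)(.*)$/s
        or return;                                   # not a headline
    my ($title, $short) = split /\s*~\s*/, $rest, 2;
    $title = '' unless defined $title;
    for ($title, $short) { s/^\s+|\s+$//g if defined }
    return { level => length($marks), title => $title, short => $short };
}

my $headline = parse_headline('==Very long headlines are expressive ~ Long headlines');
printf "level %d, title '%s', short '%s'\n",
    $headline->{level}, $headline->{title}, $headline->{short};
```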
- Lists
- Points or unordered lists start with a "*"
character.
* This is a first point.
* And, I forgot,
there is something more to point out.
There are ordered lists as well, and they start
with a hash sign ("#"):
# First, check the number of this.
# Second, don't forget the first.
The hash signs are intended to be replaced by numbers by a
backend.
Because PerlPoint is paragraph based, any paragraph
other than an ordered list point closes an ordered list. If you
wish the list to be continued, use a double hash sign instead of the
single one in the point that reopens the list.
# Here the ordered list begins.
? $includeMore
## This is point 2 of the list that started before.
# In subsequent points, the usual single hash sign
works as expected again.
List continuation is specific to each list level (see below for
level details). A list cannot be continued in another chapter. Using
"##" in the first point of a new list has no special effect:
the list will begin as usual (with number 1).
Definition lists are a third list variant. Each item
starts with the described phrase enclosed by a pair of colons, followed
by the definition text:
:first things: are usually described first,
:others: later then.
All lists can be nested. A new level is introduced by a
special paragraph called "list indention" which starts
with a ">". A list level can be terminated by a
"list indention stop" paragraph starting with a
"<" character. (These startup characters symbolize
"level shifts".)
* First level.
* Still there.
>
* A list point of the 2nd level.
<
* Back on first level.
It is possible to shift more than one level by adding a
number. There should be no whitespace between the level shift character
and the level number.
* First level.
>
* Second level.
>
* Third level.
<2
* Back on first level.
Level shifts are accepted between list items only.
Please note that there is no need to shift levels back if a
list is completed. Any non list paragraph will reset list
indentation, as well as the end of the source.
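The level shift rules can be followed in a short sketch. The helper list_levels is hypothetical; it handles only "*" points and shift paragraphs, and clamping at level 1 is an assumption of this sketch rather than documented parser behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: follow ">" and "<" shift paragraphs and report the
# nesting level of every list point as [level, text].
sub list_levels {
    my @paragraphs = @_;
    my $level = 1;
    my @points;
    for my $paragraph (@paragraphs) {
        if ($paragraph =~ /^([<>])(\d*)\s*$/) {
            my ($direction, $number) = ($1, $2);
            my $steps = length($number) ? $number : 1;   # no number: one level
            $level += $direction eq '>' ? $steps : -$steps;
            $level = 1 if $level < 1;                    # assumption: clamp at 1
        }
        elsif ($paragraph =~ /^\*\s*(.*)$/s) {
            push @points, [$level, $1];
        }
    }
    return @points;
}

printf "%d: %s\n", @$_ for list_levels(
    '* First level.', '>', '* Second level.', '>', '* Third level.',
    '<2', '* Back on first level.',
);
```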
- Texts
- are paragraphs like points but begin immediately without a startup
character:
This is a simple text.
In this new text paragraph,
we demonstrate the multiline feature.
Optionally, a text paragraph can be started with a
special character as well, which is a dot:
.This is a simple text with dot.
.In this new text paragraph,
we demonstrate the multiline feature.
This is intended to be used by generators which translate
other formats into PerlPoint, to make sure the first character of a
paragraph has no special meaning to the PerlPoint parser.
- Blocks
- are intended to contain examples or code with tag recognition. This
means that the parser will discover embedded tags. On the other hand, it
means that one may have to escape ">" characters embedded
into tags. Blocks begin with an indentation and are completed by
the next empty line.
* Look at these examples:

  A block.

  \I<Another> block.
  Escape ">" in tags: \C<<\>>.

Examples completed.
Subsequent blocks are joined together automatically: the
intermediate empty lines which would usually complete a block are
translated into real empty lines within the block. This makes it
easier to integrate real code sequences as one block, regardless of the
empty lines included. However, one may explicitly wish to
separate subsequent blocks and can do so by delimiting them by a special
control paragraph:
* Separated subsequent blocks:

  The first block.

-

  The second block.
Note that the control paragraph starts at the left margin.
- Verbatim blocks
- are similar to blocks in indentation but deactivate pattern
recognition. That means the embedded text is not scanned for tags
and empty lines and may therefore remain as it was in its original place,
possibly a script.
These special blocks need a special syntax. They are
implemented as here documents. Start with a here document clause
flagging which string will close the "here document":
<<EOC
PerlPoint knows various
tags like \B, \C and \I. # unrecognized tags
EOC
- Tables
- are supported as well. They start with an @ sign which is followed by the
column delimiter:
@|
column 1 | column 2 | column 3
aaa | bbb | ccc
uuu | vvvv | www
The first line is automatically marked as a "table
headline". Most converters emphasize such headlines by bold
formatting, so there is no need to insert \B tags into the document.
If a table row contains fewer columns than the table headline,
the "missing" columns are automatically added. That is,
@|
A | B | C
1
1 |
1 | 2
1 | 2 |
1 | 2 | 3
is streamed exactly like
@|
A | B | C
1 | |
1 | |
1 | 2 |
1 | 2 |
1 | 2 | 3
to make backend handling easier. (Empty HTML table cells, for
example, are rendered somewhat oddly by certain browsers unless they
are filled with invisible characters, so a converter to HTML can detect
such cells because of normalization and handle them appropriately.)
Please note that normalization refers to the headline row. If
another line contains more columns than the headline, it is
left unchanged. If the maximum column number is detected in
another row, a warning is issued. (As a help for converter authors, the
title and maximum column number are made part of a table tag as internal
options "__titleColumns__" and
"__maxColumns__".)
In all tables, leading and trailing whitespace of a cell is
automatically removed, so you can use as much of it as you want to
improve the readability of your source. The following table is
absolutely equivalent to the last example:
@|
A | B | C
1 | |
1 | |
1 | 2 |
1 | 2 |
1 | 2 | 3
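The normalization and whitespace rules for table paragraphs can be sketched as follows. This is an illustration under assumptions, not the parser's code: normalize_table is a hypothetical helper, it returns plain array references instead of stream tokens, and it issues no warning for rows that are wider than the headline.

```perl
use strict;
use warnings;

# Hypothetical helper: split rows at the column delimiter, trim every cell
# and pad short rows with empty cells up to the width of the headline row.
# Rows with more cells than the headline are left unchanged.
sub normalize_table {
    my ($delimiter, @lines) = @_;
    return unless @lines;
    my @rows;
    for my $line (@lines) {
        my @cells = split /\Q$delimiter\E/, $line;
        s/^\s+|\s+$//g for @cells;            # trim leading/trailing whitespace
        push @rows, \@cells;
    }
    my $title_columns = @{ $rows[0] };        # the first row is the headline
    for my $row (@rows) {
        my $missing = $title_columns - @$row;
        push @$row, ('') x $missing if $missing > 0;
    }
    return @rows;
}

my @table = normalize_table('|', 'A | B | C', '1', '1 |', '1 | 2', '1 | 2 |', '1 | 2 | 3');
print join(' | ', @$_), "\n" for @table;
```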
There is also a more sophisticated way to describe tables, see
the tag section below.
Note: Although table paragraphs cannot be nested, tables
declared by tag possibly can (and might be embedded into table
paragraphs as well). To help converter authors handling nested tables,
the opening table tag provides an internal option
"__nestingLevel__".
- Conditions
- start with a "?" character. If active contents is enabled, the
paragraph text is evaluated as Perl code. The (boolean) evaluation result
then determines if subsequent PerlPoint is read and parsed. If the result
is false, all subsequent paragraphs until the next condition are
skipped.
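How a condition switches reading on and off can be sketched like this. The helper filter_by_conditions is hypothetical, and the sketch uses a plain string eval where the parser, as described under run(), works with a Safe object. It must therefore not be applied to untrusted sources.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the paragraphs that are not hidden by a
# false condition. Condition paragraphs themselves are never kept.
# Assumption: plain string eval instead of a Safe compartment.
sub filter_by_conditions {
    my @paragraphs = @_;
    my $reading = 1;
    my @kept;
    for my $paragraph (@paragraphs) {
        if ($paragraph =~ /^\?\s*(.*)$/s) {
            my $code = $1;
            no warnings;                       # conditions run without warnings
            $reading = eval($code) ? 1 : 0;    # a failing eval counts as false
            next;
        }
        push @kept, $paragraph if $reading;
    }
    return @kept;
}

print "$_\n" for filter_by_conditions(
    'Always shown.', '? 0', 'Special part.', '? 1', 'Shown again.',
);
```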
Note that base data is made available by a global (package)
hash reference $PerlPoint. See
run() for details about how to set up these
data.
Conditions can be used to maintain various language versions
of a presentation in one source file:
? $PerlPoint->{targetLanguage} eq 'German'
Or you could enable parts of your document by date:
? time>$dateOfTalk
or by a special setting:
? flagSet('setting')
Please note that the condition code shares its variables with
embedded and included code.
To make usage easier and to improve readability, condition
code is evaluated with warnings disabled (the language variable in the
example above may not even have been set).
Converter authors might want to provide predefined variables
such as "$language" in the example.
Note: If a document uses document streams, be careful
in intermixing docstream entry points and conditions. A condition placed
in a skipped document stream will not be evaluated. A document stream
entry point placed in a source area hidden by a false condition will not
be recognized.
- Variable assignment paragraphs
- Variables can be used in the text and will be automatically replaced by
their string values (if declared).
The next paragraph sets a variable.
$var=var
This variable is called $var.
All variables are made available to embedded and included Perl
code as well as to conditions and can be accessed there as package
variables of "main::" (or whatever package name the Safe
object is set up to use). Because a variable is already replaced by the
parser if possible, you have to use the fully qualified name or guard
the variable's "$" prefix character to do so:
\EMBED{lang=perl}join(' ', $main::var, \$var)\END_EMBED
Variable modifications by embedded or included Perl do
not affect the variables visible to the parser. (This is true for
conditions as well.) This means that
$var=10
\EMBED{lang=perl}$main::var*=2;\END_EMBED
causes $var to be different on parser and
code side - the parser will still use a value of 10, while embedded code
works on with a value of 20.
- Macro or alias definitions
- Sometimes certain text parts are used more than once. It would be a
relief to have a shortcut instead of having to insert them again and
again. The same is true for tag combinations a user may prefer to use.
That's what aliases (or "macros") are designed for. They
allow a presentation author to declare his own shortcuts and to use them
like a tag. The parser will resolve such aliases, replace them by the
defined replacement text and work on with this replacement.
An alias declaration starts with a "+" character
followed immediately by the alias name (without backslash
prefix), optionally followed immediately by an option default
list in "{}", followed immediately by a colon. (No
additional spaces here.)
All text after this colon up to the paragraph closing empty
line is stored as the replacement text. So, wherever you use
the new macro, the parser will replace it by this text and
reparse the result. This means that your macro text can contain
any valid constructions like tags or other macros.
The replacement text may contain strings embedded into doubled
underscores like "__this__". This is a
special syntax to mark that the macro takes parameters of these names
(e.g. "this"). If a macro is used and
these parameters are set, their values will replace the mentioned
placeholders. The special placeholder "__body__" is used to
mark where the macro body is to be placed.
If a macro is used and defined options are unset, but
there are defaults for them in the optional default list, these defaults
will be used for the respective options.
Here are a few examples:
+RED:\FONT{color=red}<__body__>
+F:\FONT{color=__c__}<__body__>
+COLORED{c=blue}:\FONT{color=__c__}<__body__>
+IB:\B<\I<__body__>>
This \IB<text> is \RED<colored>.
Defaults: first, text in \COLORED{c=red}<Red>,
now text in \COLORED<Blue>.
+TEXT:Macros can be used to abbreviate longer
texts as well as other tags
or tag combinations.
+HTML:\EMBED{lang=html}
Tags can be \RED<\I<nested>> into macros.
And \I<\F{c=blue}<vice versa>>.
\IB<\RED<This>> is formatted by nested macros.
\HTML This is <i>embedded HTML</i>\END_EMBED.
Please note: \TEXT
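The placeholder replacement described above can be illustrated with a small sketch. The helper expand_macro is hypothetical and only fills placeholders. The real parser additionally reparses the resulting text, which this sketch does not attempt.

```perl
use strict;
use warnings;

# Hypothetical helper: fill the "__name__" placeholders of a macro text.
# Explicitly set options win over defaults; "__body__" receives the body.
# Unknown placeholders are left untouched.
sub expand_macro {
    my ($text, $defaults, $options, $body) = @_;
    my %values = (
        %$defaults,
        %$options,
        body => (defined($body) ? $body : ''),
    );
    $text =~ s/__(\w+?)__/exists $values{$1} ? $values{$1} : "__$1__"/ge;
    return $text;
}

# corresponds to: +COLORED{c=blue}:\FONT{color=__c__}<__body__>
my $colored = '\FONT{color=__c__}<__body__>';
print expand_macro($colored, { c => 'blue' }, { c => 'red' }, 'Red'),  "\n";
print expand_macro($colored, { c => 'blue' }, {},             'Blue'), "\n";
```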
If no parameter is defined in the macro definition, options
will not be recognized. The same is true for the body part.
Unless "__body__" is used in the macro
definition, macro bodies will not be recognized. This means that
with the definition
+OPTIONLESS:\B<__body__>
the construction
\OPTIONLESS{something=this}<more>
is evaluated as a usage of
"\OPTIONLESS" without body, followed
by the string
"{something=this}". Likewise, the
definition
+BODYLESS:found __something__
causes
\BODYLESS{something=this}<more>
to be recognized as a usage of
"\BODYLESS" with option
"something", followed by the
string "<more>". So this
will be resolved as "found this".
Finally,
+JUSTTHENAME:Text phrase.
enforces these constructions
... \JUSTTHENAME, ...
... \JUSTTHENAME{name=Name}, ...
... \JUSTTHENAME<text>, ...
... \JUSTTHENAME{name=Name}<text> ...
to be translated into
... Text phrase. ...
... Text phrase.{name=Name} ...
... Text phrase.<text>, ...
... Text phrase.{name=Name}<text> ...
The principle behind all this is to make macro usage
easier and intuitive: why think of options or a body or of
special characters possibly treated as option/body part openers unless
the macro makes use of an option or body?
An empty macro text undefines the macro (if it
was already known).
// undeclare the IB alias
+IB:
An alias can be used like a tag.
Aliases named like a tag overwrite the tag (as long as
they are defined).
- Document stream entry points
- A document stream is a "document in document" and best explained
by example.
Consider a document talking about
two scripts and comparing them. A
typical review of this type is
structured this way: headline, notes
about script 1, notes about script 2,
new headline to discuss another aspect,
notes about script 1, notes about
script 2, and so on.
Everything said about item 1 is a document stream, everything
about item 2 as well, and a third stream is implicitly built by all
parts outside these two. In slide construction, each stream can have its
own area, for example
-------------------------------------
| |
| main stream |
| |
-------------------------------------
| | |
| item 1 stream | item 2 stream |
| | |
-------------------------------------
But to construct a layout like this, streams need to be
distinguished, and that is what "stream entry points" are made
for.
A stream entry point starts with a "~" character,
followed by a string which is the name of the stream. This may be an
internal name only, or converters may turn it into a document part as
well. The "__ALL__" string is reserved
for internal purposes. It is recommended to treat
"__MAIN__" as reserved as well,
although it has no special meaning yet.
Once an entry point has been passed, all subsequent document parts
belong to the declared stream, up to the next entry point or a headline
which implicitly switches back to the "main stream".
The parser can be instructed to ignore certain streams, see
run() for details. If this feature is used,
please be careful in intermixing stream entry points and conditions. A
condition placed in a skipped document stream will not be evaluated.
It is up to a converter how document streams are used.
Certain converters may ignore them altogether. As a convenient solution, the
parser can be instructed to transform stream entry points into headlines
(one level below the current real headline level). See
run() for details.
Tags are directives embedded into the text stream, commanding how certain parts
of the text should be interpreted. Tags are declared by using one or more
modules built on top of PerlPoint::Tags.
use PerlPoint::Tags::Basic;
PerlPoint::Parser parsers can recognize all tags which are
built of a backslash and a number of capitals and numbers.
\TAG
Tag options are optional and follow the tag name
immediately, enclosed by a pair of corresponding curly braces. Each option
is a simple string assignment. The value has to be quoted if /^\w+$/ does
not match it.
\TAG{par1=value1 par2="www.perl.com" par3="words and blanks"}
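For illustration, an option list of this form could be read into a hash as sketched here. The helper parse_tag_options is hypothetical, and the sketch assumes that quoted values contain no embedded double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: read the text between "{" and "}" of a tag into a
# hash. A value is either a bare word matching \w+ or a quoted string.
sub parse_tag_options {
    my ($string) = @_;
    my %options;
    while ($string =~ /\G\s*(\w+)=(?:"([^"]*)"|(\w+))/gc) {
        $options{$1} = defined($2) ? $2 : $3;
    }
    return \%options;
}

my $options = parse_tag_options('par1=value1 par2="www.perl.com" par3="words and blanks"');
print "$_ => $options->{$_}\n" for sort keys %$options;
```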
The tag body is anything you want to make the tag valid
for. It is optional as well and immediately follows the optional parameters,
enclosed by "<" and ">":
\TAG<body>
\TAG{par=value}<body>
Tags can be nested.
To provide a maximum of flexibility, tags are declared
outside the parser. This way a translator programmer is free to
implement the tags he needs. It is recommended to always support the basic
tags declared by PerlPoint::Tags::Basic. On the other hand, a few tags
of special meaning are reserved and cannot be declared by converter authors,
because they are handled by the parser itself. These are:
- \INCLUDE
- It is possible to include a file into the input stream. Have a look:
\INCLUDE{type=HTML file=filename}
This imports the file "filename". The file contents
are made part of the generated stream, but not parsed. This is useful to
include target language specific, preformatted parts.
If, however, the file type is specified as "PP", the
file contents are made part of the input stream and parsed. In this case
a special tag option "headlinebase" can be specified to define
a headline base level used as an offset to all headlines in the included
document. This makes it easier to share partial documents with others,
or to build complex documents by including separately maintained parts,
or to include one and the same part at different headline levels.
Example: If "\INCLUDE{type=PP file=file headlinebase=20}" is
specified and "file" contains a one level headline
like "=Main topic of special explanations"
this headline is detected with a level of 21.
Pass the special keyword "CURRENT_LEVEL" to this tag
option if you want to set just the current headline level as an
offset. This results in "subchapters".
Example:
===Headline 3
// let included chapters start on level 4
\INCLUDE{type=PP file=file headlinebase=CURRENT_LEVEL}
Similar to "CURRENT_LEVEL", "BASE_LEVEL"
sets the current base headline level as an offset. The "base
level" is the level above the current one. Using
"BASE_LEVEL" results in parallel chapters.
Example:
===Headline 3
// let included chapters start on level 3
\INCLUDE{type=PP file=file headlinebase=BASE_LEVEL}
A given offset is reset when the included document is parsed
completely.
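The effect of the three kinds of "headlinebase" values can be checked with a little arithmetic. The helper included_headline_level is hypothetical and only mirrors the examples given above. Clamping a negative offset to zero is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the level a headline of an included file is
# detected with, given the "headlinebase" option and the headline level
# that is current at the place of inclusion.
sub included_headline_level {
    my ($headlinebase, $current_level, $level_in_file) = @_;
    my $offset = $headlinebase eq 'CURRENT_LEVEL' ? $current_level
               : $headlinebase eq 'BASE_LEVEL'    ? $current_level - 1
               :                                     $headlinebase;
    $offset = 0 if $offset < 0;      # assumption: never use a negative offset
    return $offset + $level_in_file;
}

print included_headline_level(20,              3, 1), "\n";   # numeric offset
print included_headline_level('CURRENT_LEVEL', 3, 1), "\n";   # subchapter
print included_headline_level('BASE_LEVEL',    3, 1), "\n";   # parallel chapter
```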
A second special option smart commands the parser to
include the file only if this has not already been done before. This is
intended for inclusion of pure alias/macro definition or variable
assignment files.
\INCLUDE{type=PP file="common-macros.pp" smart=1}
Included sources may declare variables of their own, possibly
overwriting already assigned values. Option "localize" works
like Perl's "local()": such changes
are reversed after the nested source has been processed
completely, so the original values are restored. You can specify a
comma separated list of variable names or the special string
"__ALL__" which flags that all
current settings shall be restored.
\INCLUDE{type=PP file="nested.pp" localize=myVar}
\INCLUDE{type=PP file="nested.pp" localize="var1, var2, var3"}
\INCLUDE{type=PP file="nested.pp" localize=__ALL__}
PerlPoint authors can declare an input filter to
preprocess the included file. This is done via option
ifilter:
\INCLUDE{type=pp file="source.pod" ifilter="pod2pp()"}
An input filter is a snippet of user defined Perl code, taking
the included file via @main::_ifilterText and
the target type via $main::_ifilterType. The
original filename can be accessed via
$main::_ifilterFile. It should supply its result
as an array of strings which will then be processed instead of the
original file.
Input filters are Active Content. If Active Content is
disabled, \INCLUDE tags using input filters will be ignored
completely.
As a simplified option, "import" allows the use of
predefined import filters defined in "PerlPoint::Import::..." modules. To
use such a filter, do not set the "ifilter" option; set "import" instead.
"import" takes the name of the source
format, like "POD", or a true number to indicate that the file
extension should be used as the source format name. The uppercased name
is used as the final part of the filter module name: for "POD",
the module's name would be "PerlPoint::Import::POD". If this
module is installed and has a function "importFilter()", this function name is
used like "ifilter".
Here are a few examples:
\INCLUDE{file="source.pod" import=1}
\INCLUDE{file="source.pod" import=pod}
\INCLUDE{file=source import=pod}
Please note that in the last example
"import=1" will not work, as the
source file has no extension that indicates its format is POD.
If "ifilter" is used
together with "import",
"import" is ignored.
A PerlPoint file can be included wherever a tag is allowed,
but sometimes it has to be arranged slightly: if you place the inclusion
directive at the beginning of a new paragraph and your included
PerlPoint starts by a paragraph of another type than text, you should
begin the included file by an empty line to let the parser detect the
correct paragraph type. Here is an example: if the inclusion directive
is placed like
// include PerlPoint
\INCLUDE{type=pp file="file.pp"}
and file.pp immediately starts with a verbatim block like
<<VERBATIM
verbatim
VERBATIM
then the inclusion directive already opens a new paragraph
which is detected to be "text" (because there is no special
startup character). From the parser's point of
view, the included PerlPoint is then simply a continuation of this text,
because a paragraph ends with an empty line. This trouble can be avoided
by beginning the included file with an empty line, so that its first
paragraph can be detected correctly.
The second special case is a file type of "Perl". If
active contents is enabled, included Perl code is read into memory and
evaluated like embedded Perl. The results are made part of the
input stream to be parsed.
// execute a perl script and include the results
\INCLUDE{type=perl file="disk-usage.pl"}
As another option, files may be declared to be of type
"example" or "parsedexample". This places the file
into the source as a verbatim block (with "example"),
or a standard block (with "parsedexample"), respectively,
without the need to copy its contents into the source.
// include an external script as an example
\INCLUDE{type=example file="script.csh"}
All lines of the example file are included as they are but can
be indented on request. To do so, just set the special option
"indent" to a positive numerical value equal to the number of
spaces to be inserted before each line.
// external example source, indented by 3 spaces
\INCLUDE{type=example file="script.csh" indent=3}
Including external scripts this way can accelerate PerlPoint
authoring significantly, especially if the included files are still
subject to changes.
It is possible to filter the file types you wish to include
(with the exception of "pp" and "example"), see below
for details. In any case, the mentioned file has to exist.
- \EMBED and \END_EMBED
- Target format code does not necessarily need to be imported - it can be
directly embedded as well. This means that one can write target
language code within the input stream using \EMBED:
\EMBED{lang=HTML}
This is <i><b>embedded</b> HTML</i>.
The parser detects <i>no</i> PerlPoint
tag here, except of <b>END_EMBED</b>.
\END_EMBED
Because this is handled by tags, not by paragraphs, it
can be placed directly in a text like this:
These \EMBED{lang=HTML}<i>italics</i>\END_EMBED
are formatted by HTML code.
Please note that the EMBED tag does not accept a tag body (to
avoid ambiguities).
Both tag and embedded text are made part of the intermediate
stream. It is the backend's task to deal with it. The only exception to
this rule is the embedding of Perl code, which is evaluated by
the parser. The result of this code is made part of the input stream and
parsed as usual.
PerlPoint authors can declare an input filter to
preprocess the embedded text. This is done via option
ifilter:
\EMBED{lang=pp ifilter="pod2pp()"}
=head1 POD formatted part
This part was written in POD.
\END_EMBED
An input filter is a snippet of user defined Perl code, taking
the embedded text via @main::_ifilterText and
the target language via $main::_ifilterType. The
original filename can be accessed via
$main::_ifilterFile (but please note that this
is the source with the \EMBED tag). It should supply its result as an
array of strings which will then be processed as usual.
Input filters are Active Contents. If Active Contents is
disabled, embedded parts using input filters will be ignored
completely.
It is possible to filter the languages you wish to embed (with
the exception of "PP"), see below for details.
- \TABLE and \END_TABLE
- It was mentioned above that tables can be built by table paragraphs. Well,
there is a tag variant of this:
\TABLE{bg=blue separator="|" border=2}
\B<column 1> | \B<column 2> | \B<column 3>
aaaa | bbbb | cccc
uuuu | vvvv | wwww
\END_TABLE
This is slightly more powerful than the paragraph syntax: you
can set up several table features like the border width yourself, and
you can format the headlines as you like.
As in all tables, leading and trailing whitespace of a cell
is automatically removed, so you can use as much of it as you want to
improve the readability of your source.
The default row separator (as in the example above) is a
carriage return, so that each table line can be written as a separate
source line. However, PerlPoint allows you to specify another string to
separate rows by option "rowseparator". This makes it possible to write
a table inline within a paragraph.
\TABLE{bg=blue separator="|" border=2 rowseparator="+++"}
\B<column 1> | \B<column 2> | \B<column 3> +++ aaaa
| bbbb | cccc +++ uuuu | vvvv| wwww \END_TABLE
This is exactly the same table as above.
If parser option nestedTables is set to a true value
when calling run(), it is possible to nest
tables. To help converter authors handle this, the opening table tag
provides an internal option "__nestingLevel__".
Tables built by tag are normalized the same way as table
paragraphs are.
Earlier versions of pp2html supported special format hints like the HTML
expression "&gt;" for the ">" character, or
"&uuml;" for "ü". PerlPoint::Parser does
not support this directly because such hints are specific to the
output format: someone who wants to translate into TeX might find it
odd to use HTML syntax in the ASCII text. Furthermore, such hints
can be handled completely by a backend which finds them unchanged in
the produced output stream.
The same is true for special headers and trailers. It is a
backend task to add them if necessary. The parser handles the
input only.
It is suggested to use PerlPoint::Backend to evaluate the intermediate
format. Nevertheless, here is the documentation of this format.
The generated stream is an array of tokens. Most of them are very
simple, representing just their contents - words, spaces and so on.
Example:
"These three words."
could be streamed into
"These three" + " "+ "words."
(This shows the principle. Actually this complete sentence would
be returned as one token for reasons of efficiency.)
Note that the final dot is part of the last token. From a
document description view, this should make no difference; it's just a string
containing special characters or not.
Well, besides this "main stream", there are
formatting directives. They flag the beginning or
completion of a certain logical entity: a whole document,
a paragraph or a formatting like italicising. Almost every entity is
embedded into a start and a completion directive, except for simple
tokens.
In the current implementation, a directive is a reference to an
array of mostly two fields: a directive constant showing which entity is
related, and a start or completion hint which is a constant, too. The
constants used are declared in PerlPoint::Constants. Directives can pass
additional information in additional fields. Currently, the headline
directives use this feature to show the headline level, the tag
ones to provide tag type information, and the document ones to keep the name
of the original document. Furthermore, ordered list points can
request a fixed number this way.
# this example shows a tag directive
... [DIRECTIVE_TAG, DIRECTIVE_START, "I"]
+ "formatted" + " " + "strings"
+ [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, "I"] ...
To recognize whether a token is a basic token or a directive, the
ref() function can be used. However, this handling should be done by
PerlPoint::Backend transparently. The format may be subject to
changes and is documented for information purposes only.
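Purely to illustrate the ref() check, here is a sketch that walks a hand-built stream. The constants are local stand-ins with assumed values (the real ones come from PerlPoint::Constants), render is a hypothetical function, and real backends should rely on PerlPoint::Backend instead.

```perl
use strict;
use warnings;

# Stand-ins for constants of PerlPoint::Constants. The numeric values are
# assumptions made for this sketch only.
use constant {
    DIRECTIVE_TAG      => 1,
    DIRECTIVE_START    => 2,
    DIRECTIVE_COMPLETE => 3,
};

my @stream = (
    [DIRECTIVE_TAG, DIRECTIVE_START, 'I'],
    'formatted', ' ', 'strings',
    [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'I'],
);

# Hypothetical renderer: directives are array references, everything else
# is a plain string token that is copied to the output.
sub render {
    my $output = '';
    for my $token (@_) {
        if (ref($token) eq 'ARRAY') {
            my ($entity, $state, $name) = @$token;
            next unless $entity == DIRECTIVE_TAG;
            my $is_start = ($state == DIRECTIVE_START);
            $output .= ($is_start ? '<' : '</') . lc($name) . '>';
        }
        else {
            $output .= $token;
        }
    }
    return $output;
}

print render(@stream), "\n";
```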
Original line numbers are no part of the stream but can be
provided by embedded directives on request, see below for details.
This is the complete generator format. It is designed to be simple
but powerful.
The constructor builds and prepares a new parser object.
Parameters:
- The class name.
Return value: The new object in case of success.
Example:
my ($parser)=PerlPoint::Parser->new;
This function starts the parser to process a number of specified files.
Parameters: All parameters except for the object
parameter are named (pass them by hash).
- activeBaseData
- This optional parameter allows common data to be passed to all active contents
(conditions, embedded and included Perl) by a hash reference. By
convention, a translator at least passes the target language and user
settings by
activeBaseData => {
targetLanguage => "lang",
userSettings => \%userSettings,
},
User settings are intended to allow the specification of per
call settings by a user, e.g. to include special parts. By using this
convention, users can easily specify such a part the following way
? flagSet('setting')
Special part.
? 1
It is up to a translator author to declare translator specific
settings (and to document them). The passed values can be as complex as
necessary as long as they can be duplicated by
"Storable::dclone()".
Whenever active contents is invoked, the passed hash reference
is copied (duplicated by "Storable::dclone()") into the Safe
object's namespace (see safe) as a global variable
$PerlPoint. This way, modifications by invoked
code do not affect subsequently called code snippets; base data are
always fresh.
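The effect of this duplication can be illustrated with a small sketch. It only uses Storable::dclone(); the helper invoke_snippet() and the sample values are assumptions made for this example and do not show how the parser itself invokes active contents.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Base data as a translator might pass it (values are illustrative only).
my %userSettings = (special => 1);
my $baseData = {
    targetLanguage => 'HTML',
    userSettings   => \%userSettings,
};

# Simulate one invocation of active contents: the code receives a deep
# copy of the base data and may modify that copy freely.
sub invoke_snippet {
    my ($code) = @_;
    my $PerlPoint = dclone($baseData);
    return $code->($PerlPoint);
}

# The first snippet tampers with its copy ...
invoke_snippet(sub {
    $_[0]{userSettings}{special} = 0;
    delete $_[0]{targetLanguage};
});

# ... but the second one still sees the original values.
my $seen = invoke_snippet(sub { "$_[0]{targetLanguage}/$_[0]{userSettings}{special}" });
print "$seen\n";
```

Because dclone() copies nested structures as well, even the referenced user settings hash stays untouched.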
- activeDataInit
- Reserved to pass hook functions to be called when preparing every active
contents invocation. The hook is still unimplemented.
- cache
- This optional parameter controls source file paragraph caching.
By default, a source file is parsed completely every time you
pass it to the parser. This is no problem with tiny sources but can
delay your work if you are dealing with large sources which have to be
translated periodically into presentations while they are written.
Typically most of the paragraphs remain unchanged from version to
version, but nevertheless everything is usually reparsed which means a
waste of time. Well, to improve this a paragraph cache can be activated
by setting this option to CACHE_ON.
The parser caches each initial source file
individually. That means if three files are passed to the parser with
activated caching, three cache files will be written. They are placed in
the source file directory, named .<source file>.ppcache. Please
note that the paragraphs of included sources are cached in the
cache file of the main document because they may have to be
evaluated differently depending on inclusion context.
What acceleration can be expected? Well, this strongly
depends on your source structure. Efficiency will grow with longer
paragraphs, reused paragraphs and the number of paragraphs. It will be reduced
by heavy usage of active contents and embedding because every paragraph
that refers to parts defined externally is not strongly determined by
itself and therefore it cannot be cached. Here is a list of all reasons
which cause a paragraph to be excluded from caching:
- Embedded parts
- Obviously dynamic parts may change from one version to another, but even
static parts could have to be interpreted differently because a user can
set up new filters.
- Included files
- An \INCLUDE tag immediately disables caching for the paragraph it resides
in because the loaded file may change its contents. This is not really a
restriction because the included paragraphs themselves are cached
if possible.
- Filtered paragraphs
- A paragraph filter can transform a source paragraph into whatever the author
of a Perl function might think is useful, potentially depending on highly
dynamic data. So it cannot be determined by the parser what the final
translation of a certain source paragraph will be.
- Document stream entry points
- Depending on the parser's configuration, these points can be transformed
into headlines or remain unchanged, so there is no fixed mapping
between a source paragraph and its streamed expression.
Even with these restrictions about 70% of a real life document of
more than 150 paragraphs could be cached. This saved more than 60% of
parsing time in subsequent translator calls.
New cache entries are always added which means that old
entries are never replaced and a cache file tends to grow. If you ever wish
to clean up a cache file completely pass CACHE_CLEANUP to this
option.
To deactivate caching explicitly pass CACHE_OFF. An
existing cache will not be destroyed.
Settings can be combined by addition.
# clean up the cache, then refill it
cache => CACHE_CLEANUP+CACHE_ON,
# clean up the cache and deactivate it
cache => CACHE_CLEANUP+CACHE_OFF,
The CACHE_OFF value is overridden by any other
setting.
It is suggested to make this setting available to translator users
to let them decide if a cache should be used.
Please note that there is a problem with line numbers if
paragraphs are restored from cache because of the behaviour of Perl's
paragraph mode. In this mode, the <> operator reads in any number of
newlines between paragraphs but supplies only one of them. That is why I do
not get the real number of lines in a paragraph and therefore cannot store
it. To work around this, two strategies can be used. First, use
exactly one empty line between paragraphs. (This strategy is not for
real life users, of course, but in this case restored numbers would be
correct.) Second, remember that source line numbers are only interesting in
error messages. If the parser detects an error after a cache hit has already
occurred, it therefore reports the error position as "there or later". If the
real number is needed, the parser can then be reinvoked with the cache
deactivated and will report it.
Another known paragraph mode problem occurs if you parse on
a UNIX system but your document (or parts of it) was written in DOS format.
The paragraph mode reads such a document completely, as one single paragraph.
Please convert the line ending character sequences to those appropriate for
your system. (If you are using
"dos2unix" under Solaris please invoke it
with option "-ascii" to do this.)
Moreover, Perl's paragraph mode and PerlPoint treat whitespace lines
differently. Because of the way it works, paragraph mode does not recognize
them as "empty" while PerlPoint does for reasons of
usability (invisible characters should not make a difference). This means
that lines containing only whitespace separate PerlPoint paragraphs but not
"Perl" paragraphs, making the cache work incorrectly, especially in
examples. If paragraphs unintentionally disappear in the resulting
presentation, please check the "empty lines" before them.
Consistent cache data depend on the versions of the parser, of
constant declarations and of the module Storable which is used
internally. If the parser detects a significant change in one of these
versions, existing caches are automatically rebuilt.
Final cache note: cache files are not locked while they are
used. If you need this feature please let me know.
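The two paragraph mode effects described in these cache notes can be reproduced with plain Perl. The following sketch reads from in-memory strings; read_paragraphs() is a helper invented for this example and is not part of the parser.

```perl
use strict;
use warnings;

# Read a string paragraph by paragraph, the way Perl's paragraph mode does.
sub read_paragraphs {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = '';                       # switch to paragraph mode
    my @paragraphs = <$fh>;
    close($fh);
    return @paragraphs;
}

# Three empty lines separate the paragraphs, but the record read for the
# first paragraph keeps only one of them.
my @collapsed = read_paragraphs("first\n\n\n\nsecond\n");

# A line holding only a blank is not "empty" for paragraph mode,
# so everything ends up in one single paragraph.
my @joined = read_paragraphs("first\n \nsecond\n");

printf "%d paragraphs, the first one ends with %d newline(s)\n",
    scalar(@collapsed), ($collapsed[0] =~ tr/\n//);
printf "%d paragraph(s) with a whitespace-only separator line\n",
    scalar(@joined);
```

The first case shows why the original number of source lines cannot be recovered, the second why whitespace-only lines should be checked if paragraphs seem to disappear.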
- criticalSemanticErrors
- If set to a true value, semantic errors will cause the parser to terminate
immediately. This defaults to false: errors are accumulated and finally
reported.
- display
- This parameter is optional. It controls the display of runtime messages
like information or warnings. By default, all messages are displayed. You
can suppress these messages partially or completely by passing one or
more of the "DISPLAY_..." constants declared in
PerlPoint::Constants. Constants should be combined by
addition.
- docstreams2skip
- By default, all document streams are made part of the result, but by this
parameter one can exclude certain streams (all remaining ones will
be streamed as usual).
The list should be supplied by an array reference.
It is suggested to take the values of this parameter from a
user option, which by convention should be named
"-skipstream".
- docstreaming
- specifies the way the parser handles stream entry points. The value passed
might be either "DSTREAM_DEFAULT",
"DSTREAM_IGNORE" or
"DSTREAM_HEADLINES".
"DSTREAM_HEADLINES"
instructs the parser to transform the entry points into
headlines, one level below the current real headline level. This
is an easy-to-implement and convenient way of docstream handling that seems
to make sense in most target formats.
"DSTREAM_IGNORE" hides all
streams except the main stream. The effect is similar to a call with
docstreams2skip set for all document streams in a source.
"DSTREAM_DEFAULT" treats the
entry points as entry points and streams them as such. This is the
default if the parameter is omitted.
Please note that filters applied by docstreams2skip work
regardless of the docstreaming configuration which only affects
the way the parser passes docstream data to a backend.
It is recommended to take the value of this parameter from a
user option, which by convention should be named
"-docstreaming". (A converter can
define various more modes than provided by the parser and implement them
itself, of course. See "pp2sdf" for a
reference implementation.)
- files
- a reference to an array of files to be scanned.
Files are treated as PerlPoint sources except when their name
has the prefix "IMPORT:", as in
"IMPORT:podsource.pod". With this
prefix, the parser tries to automatically transform the source into
PerlPoint, using a standard import filter for the format indicated by
the file extension ("pod" in our
example). The filter must be installed as
"PerlPoint::Import::<uppercased format
name>", e.g.
"PerlPoint::Import::POD".
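The naming rule for import filters can be sketched as follows. The function import_filter_for() is a hypothetical helper written for this illustration; it only derives the module name from a file name and does not reflect how the parser actually loads or runs a filter.

```perl
use strict;
use warnings;

# Derive source name and import filter module from a file name,
# following the naming rule described above.
sub import_filter_for {
    my ($file) = @_;

    # Only names with the IMPORT: prefix are subject to import.
    return unless $file =~ /^IMPORT:(.+)$/;
    my $source = $1;

    # The format is indicated by the file extension.
    my ($format) = $source =~ /\.([^.\/\\]+)$/ or return;

    return ($source, 'PerlPoint::Import::' . uc($format));
}

for my $file ('IMPORT:podsource.pod', 'slides.pp') {
    my ($source, $module) = import_filter_for($file);
    print defined $module
        ? "$source -> $module\n"
        : "$file is treated as a PerlPoint source\n";
}
```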
- filter
- a regular expression describing the target language. This setting, if
used, prevents all embedded or included source code of other languages
than the set one from inclusion into the generated stream. This
accelerates both parsing and backend handling. The pattern is evaluated
case insensitively.
Example: pass "html|perl" to allow HTML and Perl.
To illustrate this, imagine a translator to PostScript. If it
reads a PerlPoint file which includes native HTML, this translator
cannot handle such code. The backend would have to skip the HTML
statements. With a "PostScript" filter, the HTML code will not
appear in the stream.
This enables PerlPoint texts prepared for various target
languages. If an author really needs plain target language code to be
embedded into PerlPoint, he could provide versions for various
languages. Translators using a filter will then receive exactly the code
of their target language, if provided.
Please note that you cannot filter out PerlPoint code or
example files.
By default, no filter is set.
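How such a filter pattern selects languages can be sketched like this. The sketch assumes that the language name has to match the pattern as a whole, which is not stated above, and it does not model the exception for PerlPoint code and example files. passes_filter() is a name invented for this example.

```perl
use strict;
use warnings;

# Decide whether embedded code of a certain language passes the filter.
sub passes_filter {
    my ($filter, $language) = @_;

    # Without a filter, everything is accepted.
    return 1 unless defined $filter;

    # Assumption: the whole language name must match, case insensitively.
    return $language =~ /^(?:$filter)$/i ? 1 : 0;
}

for my $language (qw(HTML perl SDF)) {
    printf "%-4s %s\n", $language,
        passes_filter('html|perl', $language) ? 'streamed' : 'skipped';
}
```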
- headlineLinks
- this optional flag causes the parser to register all headline titles as
anchors automatically. (Headlines are stored without possibly included
tags which are stripped off.)
Registering anchors does not mean there are anchors
included in the stream, it just means that they are known to exist at
parsing time because they are added to an internal
"PerlPoint::Anchors" object which is
passed to all tag hooks and can be evaluated there. See
"PerlPoint::Tags" and
"PerlPoint::Anchors" for details.
It is recommended to make use of this feature if your
converter automatically makes headlines an anchor named like the
headline (this feature was introduced by Lorenz Domke's
"pp2html" initially). (Nevertheless,
usefulness may depend on dealing with the parser's anchor collection in
tag hooks. See the documentations of used tag modules for details.)
If your converter does not support automatic headline anchors
the mentioned way, it is recommended to omit this option because it
could confuse tag hooks that evaluate the parser's anchor collection.
- libpath
- An optional reference to an array of library paths to be searched for
files specified by \INCLUDE tags. This array is intended to be filled with
directories specified via a converter option. By convention, this option
is named "includelib" and can be
specified multiple times ("converter -includelib path1
-includelib path2 document.pp").
Please note that library paths can be set via the environment
variable "PERLPOINTLIB" as well, but
directories specified via "libpath"
are searched first.
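The search order can be sketched as follows. Two points are assumptions of this example and not taken from the parser: that "PERLPOINTLIB" may hold several directories joined by the platform's path separator, and the helper names include_search_path() and find_include().

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Build the list of directories in the order they are searched:
# "libpath" entries first, then those from the environment setting.
sub include_search_path {
    my ($libpath, $env) = @_;
    my @dirs = @{ $libpath || [] };

    # Assumption: several directories are joined by the path separator.
    push @dirs, split /\Q$Config{path_sep}\E/, $env
        if defined $env and length $env;

    return @dirs;
}

# Return the first existing candidate for an included file, if any.
sub find_include {
    my ($file, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -e $candidate;
    }
    return;
}

my @search = include_search_path(['inc/project', 'inc/common'], $ENV{PERLPOINTLIB});
print "search order: @search\n";
```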
- linehints
- If set to a true value, the parser will embed line hints into the stream
whenever a new source line begins.
A line hint directive is provided as
[
DIRECTIVE_NEW_LINE, DIRECTIVE_START,
{file=>filename, line=>number}
]
and is suggested to be handled by a backend callback.
Please note that currently source line numbers are not
guaranteed to be correct if stream parts are restored from cache
(see there for details).
The default value is 0.
- nestedTables
- This is an optional flag, set to 0 by default, indicating whether the
parser shall accept nested tables. Table nesting can produce very
nice results if it is supported by the target language. HTML, for example,
allows tables to be nested, but other languages do not. So, using this
feature can really improve the results if a user is focussed on supporting
certain target formats only. If I want to produce nothing but HTML, why
should I take care of target formats not able to handle table nesting? On
the other hand, if a document shall be translated into several
formats, it might cause trouble to nest tables therein.
Because of this, it is suggested to let converter users decide
whether they want to enable table nesting. If the target format does
not support nesting, I recommend disabling nesting completely.
- object
- the parser object made by new().
- safe
- an object of the Safe class which comes with perl. It is used to
evaluate embedded Perl code in a safe environment. By letting the caller
of run() provide this object, a translator
author can make the level of safety fully configurable by users. Usually,
the following should work
use Safe;
...
$parser->run(safe=>new Safe, ...);
Safe is a really good module but unfortunately limited in
loading modules transparently. So if a user wants to use modules in his
embedded code, he might fail to get it working in a Safe compartment. If
safety does not matter, he can decide to execute it without Safe, with
full Perl access. To switch on this mode, pass a true scalar value (but
no reference) instead of a Safe object.
To make all PerlPoint converters behave similarly, it is
recommended to provide two related options
"-activeContents" and
"-safeOpcode".
"-activeContents" should flag that
active contents shall be evaluated, while
"-safeOpcode" controls the level of
security. A special level "ALL" should
mean that all code can be executed without any restriction, while any
other settings should be treated as an opcode to configure the Safe
object. So, the recommended rules are: pass 0 unless
"-activeContents" is set. Pass 1 if
the converter was called with
"-activeContents" and
"-safeOpcode ALL". Pass a Safe object
and configure it according to the user's
"-safeOpcode" settings if
"-activeContents" is used but without
"-safeOpcode ALL". See
"pp2sdf" for an implementation
example.
Active Perl contents is suppressed if this setting is
omitted or if anything other than a Safe object is passed. (There
are currently three types of active contents: embedded or included Perl
and condition paragraphs.)
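The recommended rules for the two converter options can be sketched as a small decision function. This is an illustration only: safe_setting() and the way the options are passed to it are assumptions of this example, and "pp2sdf" may implement the rules differently.

```perl
use strict;
use warnings;
use Safe;

# Map converter options to the value passed as "safe" to run().
sub safe_setting {
    my (%options) = @_;

    # No active contents requested: pass 0.
    return 0 unless $options{activeContents};

    my @opcodes = @{ $options{safeOpcode} || [] };

    # The special level "ALL" requests unrestricted execution: pass 1.
    return 1 if grep { uc($_) eq 'ALL' } @opcodes;

    # Otherwise build a Safe object and permit the requested opcodes.
    my $safe = Safe->new;
    $safe->permit(@opcodes) if @opcodes;
    return $safe;
}

for my $case (
    [],
    [activeContents => 1, safeOpcode => ['ALL']],
    [activeContents => 1, safeOpcode => [':base_math']],
) {
    my $setting = safe_setting(@$case);
    print ref($setting) ? "Safe object\n" : "$setting\n";
}
```

The returned value would then be handed to the parser as in the run(safe=>...) call shown above.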
- predeclaredVars
- Variables are usually set by assignment paragraphs. However, it may be
useful for a converter to predeclare a set of them to provide certain
settings to the users. Predeclared variables, as any other PerlPoint
variables, can be used both in pure PerlPoint and in active contents. To
help users distinguish them from user defined vars, their names will be
capitalized.
Just pass a hash of variable name / value pairs:
$parser->run(
...
predeclaredVars => {
CONVERTER_NAME => 'pp2xy',
CONVERTER_VERSION => $VERSION,
...
},
);
Non-capitalized variable names will be capitalized without
further notice.
Please note that variables currently can only be scalars.
Different data types will not be accepted by the parser.
Predeclared variables should be mentioned in the converter's
documentation.
The parser itself makes use of this feature by declaring
"_PARSER_VERSION" (the version of this
module used to parse the source) and _STARTDIR (the full path of the
startup directory, as reported by
"Cwd::cwd()").
"predeclaredVars" needs
"var2stream" to take effect.
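A converter may want to check its predeclared variables before passing them. The following sketch assumes that "capitalized" means converted to upper case, as the names in the example above suggest. normalize_predeclared() is a hypothetical helper and not part of the parser.

```perl
use strict;
use warnings;

# Prepare a set of predeclared variables: upper case names, scalars only.
sub normalize_predeclared {
    my (%vars) = @_;
    my %result;
    for my $name (keys %vars) {
        # Variables can currently only be scalars.
        die "Variable $name is not a scalar\n" if ref $vars{$name};
        $result{uc($name)} = $vars{$name};
    }
    return %result;
}

my %predeclared = normalize_predeclared(
    converter_name    => 'pp2xy',
    CONVERTER_VERSION => '0.01',
);
print "$_ = $predeclared{$_}\n" for sort keys %predeclared;
```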
- skipcomments
- By default comments are streamed and can be converted into comments of the
target language. But often they are of limited use in generated files:
especially if they are intended to help the author of a document, not the
reader of the source of generated results. So with this option one can
suppress comments from being streamed.
It is suggested to get this setting via user option, which by
convention should be named
"-skipcomments".
- stream
- A reference to an array in which the generated output stream should be
stored.
Application programmers may want to tie this array if the
target ASCII texts are expected to be large (long ASCII texts can result
in large stream data which may occupy a lot of memory). Because of the
fact that the parser stores stream data by paragraph, memory
consumption can be reduced significantly by tying the stream array.
It is recommended to pass an empty array. Stored data will not
be overwritten, the parser appends its data instead (by
"push()").
- trace
- This parameter is optional. It is intended to activate trace code while
the method runs. You may pass any of the "TRACE_..." constants
declared in PerlPoint::Constants, combined by addition as in the
following example:
# show the traces of both
# lexical and syntactical analysis
trace => TRACE_LEXER+TRACE_PARSER,
If you omit this parameter or pass TRACE_NOTHING, no traces
will be displayed.
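Combining settings by addition works as long as every constant occupies its own bit. The sketch below uses hypothetical stand-in constants with invented values (real code would import them from PerlPoint::Constants), and trace_enabled() is a helper made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; the real values are defined in
# PerlPoint::Constants and may differ.
use constant {
    TRACE_NOTHING    => 0,
    TRACE_PARAGRAPHS => 1,
    TRACE_LEXER      => 2,
    TRACE_PARSER     => 4,
};

# Combine two settings by addition, as described above.
my $trace = TRACE_LEXER + TRACE_PARSER;

# Check a single flag in a combined setting.
sub trace_enabled {
    my ($setting, $flag) = @_;
    return ($setting & $flag) ? 1 : 0;
}

printf "lexer: %d, parser: %d, paragraphs: %d\n",
    trace_enabled($trace, TRACE_LEXER),
    trace_enabled($trace, TRACE_PARSER),
    trace_enabled($trace, TRACE_PARAGRAPHS);
```

Under this assumption, adding the same constant twice would yield a different setting, so each constant should appear only once in a sum.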
- var2stream
- If set to a true value, the parser will propagate variable settings into
the stream by adding additional
"DIRECTIVE_VARSET" directives.
A variable propagation has the form
[
DIRECTIVE_VARSET, DIRECTIVE_START,
{var=>varname, value=>value}
]
and is suggested to be handled by a backend callback.
The default value is 0.
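A backend side view of these directives might look like the following sketch. The constants are hypothetical stand-ins with invented values, the stream is hand-made sample data, and collect_variables() is a helper written for this example. A real backend would register a callback via PerlPoint::Backend instead of scanning the array itself.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the constants from PerlPoint::Constants;
# the numeric values are invented for this sketch.
use constant {
    DIRECTIVE_START  => 1,
    DIRECTIVE_VARSET => 2,
};

# Hand-made sample stream with two variable propagations.
my @stream = (
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'CONVERTER_NAME', value => 'pp2xy'}],
    'Some', ' ', 'text',
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'AUTHOR', value => 'somebody'}],
);

# Collect all variable settings found in a stream.
sub collect_variables {
    my (@tokens) = @_;
    my %vars;
    for my $token (@tokens) {
        next unless ref($token) eq 'ARRAY';        # skip basic tokens
        my ($type, $state, $data) = @$token;
        next unless $type == DIRECTIVE_VARSET and $state == DIRECTIVE_START;
        $vars{$data->{var}} = $data->{value};
    }
    return %vars;
}

my %vars = collect_variables(@stream);
print "$_ = $vars{$_}\n" for sort keys %vars;
```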
- vispro
- activates "process visualization" which simply means that a user
will see progress messages while the parser processes documents. The
numerical value of this setting determines how often the progress
message shall be updated, by a chapter interval:
# inform every five chapters
vispro => 5,
Process visualization is automatically suppressed if
STDERR is not connected to a terminal, if this option is omitted,
if display was set to
"DISPLAY_NOINFO" or if parser
traces are activated.
Return value: A "true" value in case of success,
"false" otherwise. A call is performed successfully if there was
neither a syntactical nor a semantic error in the parsed files.
Example:
$parser->run(
stream => \@streamData,
files => \@ARGV,
filter => 'HTML',
cache => CACHE_ON,
trace => TRACE_PARAGRAPHS,
);
A class method that supplies all anchors collected by the parser.
Example:
my $anchors=PerlPoint::Parser::anchors;
The following code shows a minimal but complete parser.
# pragmata
use strict;
# load modules
use PerlPoint::Parser;
# declare variables
my (@streamData);
# build parser
my ($parser)=new PerlPoint::Parser;
# and call it
$parser->run(
stream => \@streamData,
files => \@ARGV,
);
It is suggested to avoid operating in namespace main::. In order
to emulate the behaviour of the Safe module by
"eval()" in case a user wishes to get full
Perl access for active contents, active contents needs to be executed in this
namespace. Safe does not allow this to be changed, so the documented default for
"saved" and "not saved" active contents needs to be
"main::". This means that both the parser
and active contents will pollute "main::".
To avoid being affected, choose a different converter namespace. The
PerlPoint::Converter::
hierarchy is reserved for this purpose. The recommended namespace is
"PerlPoint::Converter::<converter name>", e.g.
"PerlPoint::Converter::pp2sdf".
The PerlPoint format was initially designed by Tom Christiansen, who
wrote an HTML slide generator for it, too.
Lorenz Domke added a number of additional, useful and
interesting features to the original implementation. At a certain point, we
decided to redesign the tool to make it a base for slide generation not only
into HTML but into various document description languages.
The PerlPoint format implemented by this parser version is
slightly different from the original design. Presentations written for
PerlPoint 1.0 will not pass the parser but can simply be converted into
the new format. We designed the new format as a team of Lorenz Domke,
Stephen Riehm and me.
From version 0.24 on the Storable module is a prerequisite of the parser package
because Storable is used to store and retrieve cache data in files. If you
update your Storable installation it might happen that its internal
format changes and therefore stored cache data becomes unreadable. To avoid
this, the parser automatically rebuilds existing caches in case of Storable
updates.
If caches are used, the parser writes cache files where the initial
sources are stored. They are named .<source file>.ppcache.
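The naming rule for cache files can be sketched with core modules. cache_file_for() is a hypothetical helper for this illustration; the parser's own implementation may build the name differently.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Build the cache file name for a source file: same directory,
# name wrapped as .<source file>.ppcache.
sub cache_file_for {
    my ($source) = @_;
    my ($name, $dir) = fileparse($source);
    return File::Spec->catfile($dir, ".$name.ppcache");
}

print cache_file_for($_), "\n" for 'intro.pp', 'talks/perl/intro.pp';
```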
- PerlPoint::Backend
- A frame class to write backends based on the STREAM FORMAT.
- PerlPoint::Constants
- Constants used by parser functions and in the STREAM FORMAT.
- PerlPoint::Tags
- Tag declaration base class.
- pp2sdf
- A reference implementation of a PerlPoint converter, distributed with the
parser package.
- pp2html
- The initial PerlPoint tool designed and provided by Tom Christiansen. A new
translator by Lorenz Domke using PerlPoint::Package.
A PerlPoint mailing list is set up to discuss usage, ideas, bugs, suggestions
and translator development. To subscribe, please send an empty message to
perlpoint-subscribe@perl.org.
If you prefer, you can contact me via perl@jochen-stenzel.de as
well.
Copyright (c) Jochen Stenzel (perl@jochen-stenzel.de), 1999-2001. All rights
reserved.
This module is free software, you can redistribute it and/or
modify it under the terms of the Artistic License distributed with Perl
version 5.003 or (at your option) any later version. Please refer to the
Artistic License that came with your Perl distribution for more details.
The Artistic License should have been included in your
distribution of Perl. It resides in the file named "Artistic" at
the top-level of the Perl source tree (where Perl was downloaded/unpacked -
ask your system administrator if you don't know where this is).
Alternatively, the current version of the Artistic License distributed with
Perl can be viewed on-line on the World-Wide Web (WWW) from the following
URL: http://www.perl.com/perl/misc/Artistic.html.
PerlPoint::Parser is built using Parse::Yapp in such a way
that users do not have to explicitly install Parse::Yapp
themselves. According to the copyright note of Parse::Yapp I have to
mention the following:
"The Parse::Yapp module and its related modules and shell
scripts are copyright (c) 1998-1999 Francois Desarmenien, France. All rights
reserved.
You may use and distribute them under the terms of either the GNU
General Public License or the Artistic License, as specified in the Perl
README file."
This software is distributed in the hope that it will be useful, but is provided
"AS IS" WITHOUT WARRANTY OF ANY KIND, either expressed or implied,
INCLUDING, without limitation, the implied warranties of MERCHANTABILITY and
FITNESS FOR A PARTICULAR PURPOSE.
The ENTIRE RISK as to the quality and performance of the software
IS WITH YOU (the holder of the software). Should the software prove
defective, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
CORRECTION.
IN NO EVENT WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY
CREATE, MODIFY, OR DISTRIBUTE THE SOFTWARE BE LIABLE OR RESPONSIBLE TO YOU
OR TO ANY OTHER ENTITY FOR ANY KIND OF DAMAGES (no matter how awful - not
even if they arise from known or unknown flaws in the software).
Please refer to the Artistic License that came with your Perl
distribution for more details.