NAME

PerlPoint::Parser - a PerlPoint Parser

VERSION

This manual describes version 0.451.

SYNOPSIS

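  # load the module:
  use PerlPoint::Parser;

  # the sources to parse, and the array
  # that will receive the intermediate data
  my @files=('document.pp');
  my @stream;

  # build the parser and run it
  # to get intermediate data in @stream
  my ($parser)=new PerlPoint::Parser;
  $parser->run(
               stream => \@stream,
               files  => \@files,
              );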

DESCRIPTION

The PerlPoint format, initially designed by Tom Christiansen, is intended to provide a simple and portable way to generate slides without the need of a proprietary product. Slides can be prepared in a text editor of your choice, generated on any platform where you find perl, and presented by any browser which can render the chosen output format.

To sum it up, PerlPoint Software takes an ASCII text and transforms it into slides written in a certain document description language. This is, by tradition, usually HTML, but you may decide to use another format like XML, SGML, TeX or whatever you want.

Well, this sounds fine, but how do you build a translator which transforms ASCII into the output format of your choice? That's what PerlPoint::Parser is made for. It performs the first translation step by parsing ASCII and transforming it into an intermediate stream format, which can be processed by a subsequently called translator backend. By separating parsing and output generation we get the flexibility to write as many backends as necessary by using the same parser frontend for all translators.

PerlPoint::Parser supports the complete GRAMMAR with the exception of certain tags. Tags are supported in the most general way: the parser recognizes any tag which is declared by the author of a translator. This way the parser can be used for various flavours of the PerlPoint language without having to be modified. So, if there is a need for a certain new tag, it can quickly be added without any change to PerlPoint::Parser.

The following chapters describe the input format (GRAMMAR) and the generated stream format (STREAM FORMAT). Finally, the class methods are described to show you how to build a parser.

GRAMMAR

This chapter describes how a PerlPoint ASCII slide description has to be formatted to be accepted by PerlPoint::Parser.

Please note that the input format does not completely determine how the output will be designed. The final format depends on the backend which is called after the parser to transform its output into a certain document description language. The final appearance also depends on the behaviour of the browser.

Each PerlPoint document is made of paragraphs.

The paragraphs

All paragraphs start at the beginning of their first line. The first character or string in this line determines which paragraph is recognized.

A paragraph is completed by an empty line (which may contain whitespace). Exceptions are described below.

Carriage returns in paragraphs which are completed by an empty line are transformed into spaces.

Comments

start with "//" and reach until the end of the line.

Headlines

start with one or more "=" characters. The number of "=" characters represents the headline level.

  =First level headline

  ==Second level headline

  ===Multi
    line
   headline
  example

It is possible to declare a "short version" of the headline title by appending a "~" and plain strings to the headline like in

   =Very long headlines are expressive but may exceed the
    available space for example in HTML navigation bars or
    something like that ~ Long headlines

The "~" often stands for similarity, or represents the described object in encyclopedias or dictionaries. So one may think of this as "long title is (sometimes) similar to short title".
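The headline rules above can be illustrated with a small Perl sketch. The helper parse_headline is hypothetical and not part of PerlPoint::Parser; it assumes the paragraph has already been read as one string and, as an assumption, treats the text after the last "~" as the short title.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PerlPoint::Parser): split a headline
# paragraph into level, full title and optional short title.
sub parse_headline {
    my ($paragraph) = @_;
    my ($marks, $text) = $paragraph =~ /\A(=+)(.*)\z/s
        or return;                       # not a headline paragraph
    $text =~ s/\s*\n\s*/ /g;             # line breaks become single spaces
    my $short;
    # assumption: the text after the last "~" is the short title
    if ($text =~ /\A(.*)~\s*([^~]*?)\s*\z/s) {
        ($text, $short) = ($1, $2);
    }
    $text =~ s/\A\s+|\s+\z//g;           # trim the full title
    return (length($marks), $text, $short);
}

my ($level, $title, $short) = parse_headline(
    "=Very long headlines are expressive but may exceed the\n"
  . "available space ~ Long headlines");
print "$level: $title [$short]\n";
# prints "1: Very long headlines are expressive but may exceed the available space [Long headlines]"
```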

Lists

Points or unordered lists start with a "*" character.

  * This is a first point.

  * And, I forgot,
    there is something more to point out.

There are ordered lists as well, and they start with a hash sign ("#"):

  # First, check the number of this.

  # Second, don't forget the first.

The hash signs are intended to be replaced by numbers by a backend.

Because PerlPoint works on the basis of paragraphs, any paragraph other than an ordered list point closes an ordered list. If you wish the list to be continued, use a double hash sign instead of the single one in the point that reopens the list.

  # Here the ordered list begins.

  ? $includeMore

  ## This is point 2 of the list that started before.

  # In subsequent points, the usual single hash sign
    works as expected again.

List continuation works per list level (see below for level details). A list cannot be continued in another chapter. Using "##" in the first point of a new list has no special effect: the list will begin as usual (with number 1).
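The numbering rule for ordered lists can be sketched as follows. number_points is a hypothetical helper that works on a single list level only (level-specific counters and level shifts are left out) and receives paragraphs as plain strings.

```perl
use strict;
use warnings;

# Hypothetical sketch: number ordered list points, honouring "##"
# continuation on a single list level.
sub number_points {
    my @numbers;
    my ($current, $last, $in_list) = (0, 0, 0);
    for my $p (@_) {
        if ($p =~ /\A(##?)/) {
            if ($in_list) {
                $current++;                  # ordinary next point
            }
            elsif ($1 eq '##') {
                $current = $last + 1;        # reopen the closed list
            }
            else {
                $current = 1;                # a new list begins
            }
            $in_list = 1;
            push @numbers, $current;
        }
        else {
            # any other paragraph closes the list; a headline also
            # forbids continuation across chapters
            $last = $p =~ /\A=/ ? 0 : ($in_list ? $current : $last);
            $in_list = 0;
        }
    }
    return @numbers;
}

# "#", "?", "##", "#" as in the example above: prints "1 2 3"
print join(' ', number_points('# Here', '? $includeMore', '## point 2', '# again')), "\n";
```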

Definition lists are a third list variant. Each item starts with the described phrase enclosed by a pair of colons, followed by the definition text:

  :first things: are usually described first,

  :others:       later then.

All lists can be nested. A new level is introduced by a special paragraph called "list indentation" which starts with a ">". A list level can be terminated by a "list indentation stop" paragraph starting with a "<" character. (These startup characters symbolize "level shifts".)

  * First level.

  * Still there.

  >

  * A list point of the 2nd level.

  <

  * Back on first level.

It is possible to shift more than one level by adding a number. There should be no whitespace between the level shift character and the level number.

  * First level.

  >

  * Second level.

  >

  * Third level.

  <2

  * Back on first level.

Level shifts are accepted between list items only.

Please note that there is no need to shift levels back when a list is completed. Any non-list paragraph resets the list indentation, as does the end of the source.
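The level bookkeeping can be sketched like this. list_levels is a hypothetical helper that returns the nesting level of each list point; it recognizes list points only by a leading "*", "#" or ":" and does not check that level shifts appear between list items.

```perl
use strict;
use warnings;

# Hypothetical sketch: compute the nesting level of each list point.
sub list_levels {
    my $level = 1;
    my @levels;
    for my $p (@_) {
        if ($p =~ /\A([<>])(\d*)\z/) {
            my $steps = length($2) ? $2 : 1;   # ">" alone shifts one level
            $level += $1 eq '>' ? $steps : -$steps;
        }
        elsif ($p =~ /\A[*#:]/) {
            push @levels, $level;              # a list point
        }
        else {
            $level = 1;                        # any other paragraph resets
        }
    }
    return @levels;
}

# the multi-level example above: prints "1 2 3 1"
print join(' ', list_levels('* First level.', '>', '* Second level.', '>',
                            '* Third level.', '<2', '* Back on first level.')), "\n";
```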

Texts

are paragraphs like points but begin immediately without a startup character:

  This is a simple text.

  In this new text paragraph,
  we demonstrate the multiline feature.

Optionally, a text paragraph can be started with a special character as well, which is a dot:

  .This is a simple text with dot.

  .In this new text paragraph,
  we demonstrate the multiline feature.

This is intended to be used by generators which translate other formats into PerlPoint, to make sure the first character of a paragraph has no special meaning to the PerlPoint parser.

Blocks

are intended to contain examples or code with tag recognition. This means that the parser will discover embedded tags. On the other hand, it means that one may have to escape ">" characters embedded into tags. Blocks begin with an indentation and are completed by the next empty line.

  * Look at these examples:

      A block.

      \I<Another> block.
      Escape ">" in tags: \C<<\>>.

  Examples completed.

Subsequent blocks are joined together automatically: the intermediate empty lines which would usually complete a block are translated into real empty lines within the block. This makes it easier to integrate real code sequences as one block, regardless of the empty lines included. However, one may explicitly wish to separate subsequent blocks and can do so by delimiting them by a special control paragraph:

  * Separated subsequent blocks:

      The first block.

  -

      The second block.

Note that the control paragraph starts at the left margin.

Verbatim blocks

are similar to blocks in indentation but deactivate pattern recognition. That means the embedded text is not scanned for tags or empty lines and can therefore be kept exactly as it was in its original place, for example a script.

These special blocks need a special syntax: they are implemented as here documents. Start with a here-document clause naming the string that will close the "here document":

  <<EOC

    PerlPoint knows various
    tags like \B, \C and \I. # unrecognized tags

  EOC
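As an illustration only, the here-document rule could be implemented roughly like this. read_verbatim is a hypothetical helper; it assumes the closing string must fill a whole line (trailing whitespace allowed) and returns the verbatim text together with the remaining lines.

```perl
use strict;
use warnings;

# Hypothetical sketch: extract a verbatim block introduced by "<<TERM".
sub read_verbatim {
    my @lines = @_;
    my ($term) = shift(@lines) =~ /\A<<(\w+)\s*\z/
        or die "not a verbatim block opener\n";
    my @body;
    while (@lines) {
        my $line = shift @lines;
        # the closing string ends the block; the rest is returned as well
        return (join('', @body), @lines) if $line =~ /\A\Q$term\E\s*\z/;
        push @body, $line;               # kept exactly as written, no tag scan
    }
    die "verbatim block not closed by '$term'\n";
}

my ($body) = read_verbatim("<<EOC\n", "\n", "  tags like \\B, \\C and \\I.\n", "\n", "EOC\n");
print $body;
```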

Tables

are supported as well. They start with an "@" sign followed by the column delimiter:

  @|
   column 1   |   column 2   |  column 3
    aaa       |    bbb       |   ccc
    uuu       |    vvvv      |   www

The first line is automatically marked as a "table headline". Most converters emphasize such headlines by bold formatting, so there is no need to insert \B tags into the document.

If a table row contains fewer columns than the table headline, the "missing" columns are automatically added. That is,

  @|
  A | B | C
  1
  1 |
  1 | 2
  1 | 2 |
  1 | 2 | 3

is streamed exactly like

  @|
  A | B | C
  1 |   |
  1 |   |
  1 | 2 |
  1 | 2 |
  1 | 2 | 3

to make backend handling easier. (Empty HTML table cells, for example, are rendered oddly by certain browsers unless they are filled with invisible characters, so a converter to HTML can detect such cells thanks to normalization and handle them appropriately.)

Please note that normalization is based on the headline row. If another row contains more columns than the headline, normalization leaves it unchanged. If the maximum column number is found in a row other than the headline, a warning is issued. (As a help for converter authors, the headline column number and the maximum column number are made part of a table tag as internal options "__titleColumns__" and "__maxColumns__".)

In all tables, leading and trailing whitespace of a cell is automatically removed, so you can use as much of it as you want to improve the readability of your source. The following table is absolutely equivalent to the last example:

  @|
  A                |       B         |      C
  1                |                 |
   1               |                 |
    1              | 2               |
     1             |  2              |
      1            | 2               |      3
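The normalization and trimming rules can be summarized in a short sketch. normalize_table is hypothetical, not the parser's internal code: it splits each row at the delimiter, strips surrounding whitespace, pads short rows to the headline's column count, leaves longer rows alone, and uses a plain warn for the wider-row warning.

```perl
use strict;
use warnings;

# Hypothetical sketch of table normalization for a table paragraph.
sub normalize_table {
    my ($opener, @lines) = @_;
    my ($delim) = $opener =~ /\A\@(.+?)\s*\z/ or die "no table opener\n";
    my @rows = map {
        my @cells = split /\Q$delim\E/, $_, -1;   # -1 keeps empty trailing cells
        s/\A\s+|\s+\z//g for @cells;              # strip surrounding whitespace
        \@cells;
    } @lines;
    my $title_columns = @{ $rows[0] };            # the headline row rules
    my $max_columns   = $title_columns;
    for my $row (@rows) {
        $max_columns = @$row if @$row > $max_columns;
        push @$row, ('') x ($title_columns - @$row) if @$row < $title_columns;
    }
    warn "a row is wider than the headline\n" if $max_columns > $title_columns;
    return (\@rows, $title_columns, $max_columns);
}

my ($rows) = normalize_table('@|', 'A | B | C', '1', '  1 | 2  ');
print join(' | ', @$_), "\n" for @$rows;
```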

There is also a more sophisticated way to describe tables, see the tag section below.

Note: Although table paragraphs cannot be nested, tables declared by tag possibly can (and might be embedded into table paragraphs as well). To help converter authors handling nested tables, the opening table tag provides an internal option "__nestingLevel__".

Conditions

start with a "?" character. If active contents is enabled, the paragraph text is evaluated as Perl code. The (boolean) evaluation result then determines whether the subsequent PerlPoint source is read and parsed. If the result is false, all subsequent paragraphs until the next condition are skipped.

Note that base data is made available by a global (package) hash reference $PerlPoint. See run() for details about how to set up these data.

Conditions can be used to maintain various language versions of a presentation in one source file:

  ? $PerlPoint->{targetLanguage} eq 'German'

Or you could enable parts of your document by date:

  ? time>$dateOfTalk

or by a special setting:

  ? flagSet('setting')

Please note that the condition code shares its variables with embedded and included code.

To make usage easier and to improve readability, condition code is evaluated with warnings disabled (the language variable in the example above may not even be set).

Converter authors might want to provide predefined variables such as "$language", so that a condition like the example above could be written as "? $language eq 'German'".
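The skipping rule for conditions can be sketched as below. apply_conditions is a hypothetical helper; the evaluator passed in is a toy stand-in for the real evaluation of Perl code, and whether condition paragraphs leave any trace in the stream is not modelled.

```perl
use strict;
use warnings;

# Hypothetical sketch: drop paragraphs that follow a false condition,
# up to the next condition paragraph.
sub apply_conditions {
    my ($evaluate, @paragraphs) = @_;    # $evaluate: code ref returning a boolean
    my $active = 1;
    my @kept;
    for my $p (@paragraphs) {
        if ($p =~ /\A\?\s*(.*)\z/s) {
            $active = $evaluate->($1) ? 1 : 0;   # each condition opens a new area
            next;
        }
        push @kept, $p if $active;
    }
    return @kept;
}

# a toy evaluator standing in for real Perl evaluation
my %flags = (setting => 0);
my @out = apply_conditions(
    sub { my ($code) = @_; $code eq '1' || ($code =~ /flagSet\('(\w+)'\)/ && $flags{$1}) },
    'Intro.', "? flagSet('setting')", 'Special part.', '? 1', 'Common part.');
print join(' | ', @out), "\n";   # prints "Intro. | Common part."
```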

Note: If a document uses document streams, be careful when intermixing docstream entry points and conditions. A condition placed in a skipped document stream will not be evaluated. A document stream entry point placed in a source area hidden by a false condition will not be recognized.

Variable assignment paragraphs

Variables can be used in the text and will be automatically replaced by their string values (if declared).

  The next paragraph sets a variable.

  $var=var

  This variable is called $var.

All variables are made available to embedded and included Perl code as well as to conditions, and can be accessed there as package variables of "main::" (or whatever package name the Safe object is set up to). Because a variable is already replaced by the parser if possible, you have to use the fully qualified name or escape the variable's "$" prefix character to do so:

  \EMBED{lang=perl}join(' ', $main::var, \$var)\END_EMBED

Variable modifications by embedded or included Perl do not affect the variables visible to the parser. (This is true for conditions as well.) This means that

  $var=10
  \EMBED{lang=perl}$main::var*=2;\END_EMBED

causes $var to be different on the parser and the code side: the parser will still use a value of 10, while embedded code continues with a value of 20.
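A simplified sketch of variable handling on the parser side: assignment paragraphs of the form "$name=value" declare a variable, and declared variables are replaced in later paragraphs. expand_paragraphs is hypothetical; it leaves fully qualified names, escaped "\$" and undeclared variables untouched and does not remove escape backslashes.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect "$name=value" paragraphs and replace
# declared variables in all other paragraphs.
sub expand_paragraphs {
    my %vars;
    my @out;
    for my $p (@_) {
        if ($p =~ /\A\$(\w+)=(.*)\z/s) {
            $vars{$1} = $2;              # variable assignment paragraph
            next;
        }
        # skip "\$name" and "$pkg::name"; keep undeclared names as they are
        (my $text = $p) =~ s{(?<!\\)\$(\w+)\b(?!::)}{
            exists $vars{$1} ? $vars{$1} : "\$$1"
        }ge;
        push @out, $text;
    }
    return @out;
}

print "$_\n" for expand_paragraphs('$var=var', 'This variable is called $var.');
# prints "This variable is called var."
```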

Macro or alias definitions

Sometimes certain text parts are used more than once. It would be a relief to have a shortcut instead of having to insert them again and again. The same is true for tag combinations a user may prefer to use. That's what aliases (or "macros") are designed for. They allow a presentation author to declare their own shortcuts and to use them like a tag. The parser will resolve such aliases, replace them by the defined replacement text and continue with this replacement.

An alias declaration starts with a "+" character followed immediately by the alias name (without backslash prefix), optionally followed immediately by an option default list in "{}", followed immediately by a colon. (No additional spaces here.)

All text after this colon up to the paragraph-closing empty line is stored as the replacement text. So, wherever you use the new macro, the parser will replace it by this text and reparse the result. This means that your macro text can contain any valid constructions like tags or other macros.

The replacement text may contain strings embedded into doubled underscores like "__this__". This is a special syntax to mark that the macro takes parameters of these names (e.g. "this"). If a macro is used and these parameters are set, their values will replace the mentioned placeholders. The special placeholder "__body__" marks where the macro body is to be placed.

If a macro is used without setting some of its defined options, but there are defaults for them in the optional default list, these defaults are used for the respective options.
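The placeholder rules can be sketched with a small substitution. expand_macro is a hypothetical helper, not the parser's implementation; as an assumption, placeholders that are neither set nor defaulted are left as they are.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill a macro's replacement text.
# Explicit options win over defaults; "__body__" receives the body.
sub expand_macro {
    my ($template, $defaults, $options, $body) = @_;
    my %value = (%$defaults, %$options);
    (my $text = $template) =~ s{__(\w+?)__}{
          $1 eq 'body'      ? (defined $body ? $body : '__body__')
        : exists $value{$1} ? $value{$1}
        :                     "__$1__"
    }ge;
    return $text;
}

my $colored = '\FONT{color=__c__}<__body__>';
print expand_macro($colored, { c => 'blue' }, {}, 'Blue'), "\n";
# prints \FONT{color=blue}<Blue>
```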

Here are a few examples:

  +RED:\FONT{color=red}<__body__>

  +F:\FONT{color=__c__}<__body__>

  +COLORED{c=blue}:\FONT{color=__c__}<__body__>

  +IB:\B<\I<__body__>>

  This \IB<text> is \RED<colored>.

  Defaults: first, text in \COLORED{c=red}<Red>,
  now text in \COLORED<Blue>.

  +TEXT:Macros can be used to abbreviate longer
     texts as well as other tags
  or tag combinations.

  +HTML:\EMBED{lang=html}

  Tags can be \RED<\I<nested>> into macros.
  And \I<\F{c=blue}<vice versa>>.
  \IB<\RED<This>> is formatted by nested macros.
  \HTML This is <i>embedded HTML</i>\END_EMBED.

  Please note: \TEXT

If no parameter is defined in the macro definition, options will not be recognized. The same is true for the body part. Unless "__body__" is used in the macro definition, macro bodies will not be recognized. This means that with the definition

  +OPTIONLESS:\B<__body__>

the construction

  \OPTIONLESS{something=this}<more>

is evaluated as a usage of "\OPTIONLESS" without body, followed by the string "{something=this}<more>". Likewise, the definition

  +BODYLESS:found __something__

causes

  \BODYLESS{something=this}<more>

to be recognized as a usage of "\BODYLESS" with option "something", followed by the string "<more>". So the macro call will be resolved as "found this". Finally,

  +JUSTTHENAME:Text phrase.

causes these constructions

  ... \JUSTTHENAME, ...
  ... \JUSTTHENAME{name=Name}, ...
  ... \JUSTTHENAME<text>, ...
  ... \JUSTTHENAME{name=Name}<text> ...

to be translated into

  ... Text phrase., ...
  ... Text phrase.{name=Name}, ...
  ... Text phrase.<text>, ...
  ... Text phrase.{name=Name}<text> ...

The principle behind all this is to make macro usage easier and intuitive: why think of options or a body, or of special characters possibly treated as option or body openers, unless the macro makes use of an option or body?

An empty macro text undefines the macro (if it was already known).

  // undeclare the IB alias
  +IB:

An alias can be used like a tag.

Aliases named like a tag overwrite the tag (as long as they are defined).

Document stream entry points

A document stream is a "document in document" and best explained by example.

  Consider a document talking about
  two scripts and comparing them. A
  typical review of this type is
  structured this way: headline, notes
  about script 1, notes about script 2,
  new headline to discuss another aspect,
  notes about script 1, notes about
  script 2, and so on.

Everything said about item 1 (the first script) is a document stream, and so is everything about item 2. A third stream is implicitly built by all parts outside these two. In slide construction, each stream can have its own area, for example

  -------------------------------------
  |                                   |
  |            main stream            |
  |                                   |
  -------------------------------------
  |                 |                 |
  |  item 1 stream  |  item 2 stream  |
  |                 |                 |
  -------------------------------------

But to construct a layout like this, streams need to be distinguished, and that is what "stream entry points" are made for.

A stream entry point starts with a "~" character, followed by a string which is the name of the stream. This may be an internal name only, or converters may turn it into a document part as well. The "__ALL__" string is reserved for internal purposes. It is recommended to treat "__MAIN__" as reserved as well, although it has no special meaning yet.

Once an entry point has been passed, all subsequent document parts belong to the declared stream, up to the next entry point or a headline, which implicitly switches back to the "main stream".

The parser can be instructed to ignore certain streams, see run() for details. If this feature is used, please be careful when intermixing stream entry points and conditions. A condition placed in a skipped document stream will not be evaluated.

It is up to a converter how document streams are used. Certain converters may ignore them entirely. As a convenient solution, the parser can be instructed to transform stream entry points into headlines (one level below the current real headline level). See run() for details.
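The assignment of paragraphs to streams can be sketched as follows. split_streams is hypothetical; it uses "__MAIN__" merely as a label for the main stream and models the docstreams2skip filter with a hash of stream names.

```perl
use strict;
use warnings;

# Hypothetical sketch: assign paragraphs to document streams.
# "~name" enters a stream, a headline ("=...") returns to the main stream.
sub split_streams {
    my ($skip, @paragraphs) = @_;        # $skip: hash ref of stream names to drop
    my $stream = '__MAIN__';             # illustrative label for the main stream
    my @result;
    for my $p (@paragraphs) {
        if ($p =~ /\A~(.+)\z/) {
            $stream = $1;                # entry point: switch streams
            next;
        }
        $stream = '__MAIN__' if $p =~ /\A=/;
        push @result, [$stream, $p] unless $skip->{$stream};
    }
    return @result;
}

for my $entry (split_streams({}, '=Review', 'intro', '~script1', 'note', '=Next')) {
    print "$entry->[0]: $entry->[1]\n";
}
```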

What about special formatting?

Earlier versions of pp2html supported special format hints like the HTML expression "&gt;" for the ">" character, or "&uuml;" for "ü". PerlPoint::Parser does not support this directly because such hints are specific to the output format: if someone wants to translate into TeX, it might seem odd to use HTML syntax in the ASCII source. Furthermore, such hints can be handled completely by a backend, which finds them unchanged in the produced output stream.

The same is true for special headers and trailers. It is a backend task to add them if necessary. The parser handles the input only.

STREAM FORMAT

It is suggested to use PerlPoint::Backend to evaluate the intermediate format. Nevertheless, here is the documentation of this format.

The generated stream is an array of tokens. Most of them are very simple, representing just their contents - words, spaces and so on. Example:

  "These three words."

could be streamed into

  "These three" + " " + "words."

(This shows the principle. Actually, this complete sentence would be returned as one token for reasons of efficiency.)

Note that the final dot is part of the last token. From a document description view, this should make no difference: it's just a string which may or may not contain special characters.

Well, besides this "main stream", there are formatting directives. They flag the beginning or completion of a certain logical entity: a whole document, a paragraph, or a formatting like italicising. Almost every entity is embedded into a start and a completion directive, except simple tokens.

In the current implementation, a directive is a reference to an array of mostly two fields: a directive constant showing which entity is related, and a start or completion hint which is a constant, too. The used constants are declared in PerlPoint::Constants. Directives can pass additional information in additional fields. Currently, the headline directives use this feature to show the headline level, the tag directives to provide tag type information, and the document directives to keep the name of the original document. Furthermore, ordered list points can request a fixed number this way.

  # this example shows a tag directive
  ... [DIRECTIVE_TAG, DIRECTIVE_START, "I"]
  + "formatted" + " " + "strings"
  + [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, "I"] ...

To recognize whether a token is a basic token or a directive, the ref() function can be used. However, this handling should be done transparently by PerlPoint::Backend. The format may be subject to change and is documented for information purposes only.
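A minimal backend-style walk over such a stream, using ref() to tell directives from basic tokens. The constant values here are local placeholders (the real ones come from PerlPoint::Constants), and render is a hypothetical helper that turns tag directives into HTML-like markup and ignores all other directives.

```perl
use strict;
use warnings;

# Placeholder values for illustration only; the real constants are
# exported by PerlPoint::Constants and their values may differ.
use constant {
    DIRECTIVE_TAG      => 'TAG',
    DIRECTIVE_START    => 'START',
    DIRECTIVE_COMPLETE => 'COMPLETE',
};

# Walk a stream: plain strings are basic tokens, array references
# are directives. Only tag directives are rendered here.
sub render {
    my $out = '';
    for my $token (@_) {
        if (ref($token) eq 'ARRAY') {
            my ($directive, $hint, @info) = @$token;
            next unless $directive eq DIRECTIVE_TAG;   # skip other directives
            my $tag = lc $info[0];
            $out .= ($hint eq DIRECTIVE_START) ? "<$tag>" : "</$tag>";
        }
        else {
            $out .= $token;
        }
    }
    return $out;
}

print render('These ', [DIRECTIVE_TAG, DIRECTIVE_START, 'I'],
             'formatted', ' ', 'strings',
             [DIRECTIVE_TAG, DIRECTIVE_COMPLETE, 'I']), "\n";
# prints "These <i>formatted strings</i>"
```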

Original line numbers are not part of the stream but can be provided by embedded directives on request; see linehints below for details.

This is the complete generator format. It is designed to be simple but powerful.

METHODS

new()

The constructor builds and prepares a new parser object.

Parameters:

The class name.

Return value: The new object in case of success.

Example:

  my ($parser)=new PerlPoint::Parser;

run()

This function starts the parser to process a number of specified files.

Parameters: All parameters except the object parameter are named (pass them as a hash).

activeBaseData

This optional parameter allows passing common data to all active contents (conditions, embedded and included Perl) by a hash reference. By convention, a translator passes at least the target language and user settings:

  activeBaseData => {
                     targetLanguage => "lang",
                     userSettings   => \%userSettings,
                    },

User settings are intended to let a user specify per-call settings, e.g. to include special parts. Using this convention, users can easily mark such a part in the following way:

  ? flagSet('setting')

  Special part.

  ? 1

It is up to a translator author to declare translator specific settings (and to document them). The passed values can be as complex as necessary as long as they can be duplicated by "Storable::dclone()".

Whenever active contents is invoked, the passed hash reference is copied (duplicated by "Storable::dclone()") into the Safe object's namespace (see Safe) as a global variable $PerlPoint. This way, modifications by invoked code do not affect subsequently called code snippets: base data are always fresh.
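The copy-per-invocation behaviour can be sketched with the core modules Safe and Storable. run_snippet is a hypothetical helper, not the parser's code: before each reval(), it installs a fresh dclone() of the base data as $PerlPoint in the compartment's root package.

```perl
use strict;
use warnings;
use Safe;
use Storable qw(dclone);

# Base data as a translator might pass it via activeBaseData.
my %userSettings = (setting => 1);
my $base = { targetLanguage => 'German', userSettings => \%userSettings };

# Hypothetical sketch: run a snippet with a fresh copy of the base data
# installed as $PerlPoint in the compartment's root namespace.
sub run_snippet {
    my ($compartment, $code) = @_;
    ${ $compartment->varglob('PerlPoint') } = dclone($base);
    my $result = $compartment->reval($code);
    die $@ if $@;
    return $result;
}

my $cpt = Safe->new;
run_snippet($cpt, '$PerlPoint->{targetLanguage} = "English"; 1');   # changes its copy only
print run_snippet($cpt, '$PerlPoint->{targetLanguage}'), "\n";      # prints "German"
```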

activeDataInit

Reserved for passing hook functions to be called before every invocation of active contents. This hook is not implemented yet.

cache

This optional parameter controls source file paragraph caching.

By default, a source file is parsed completely every time you pass it to the parser. This is no problem with tiny sources but can delay your work if you are dealing with large sources which have to be translated periodically into presentations while they are written. Typically most of the paragraphs remain unchanged from version to version, but nevertheless everything is usually reparsed, which wastes time. To improve this, a paragraph cache can be activated by setting this option to CACHE_ON.

The parser caches each initial source file individually. That means if three files are passed to the parser with activated caching, three cache files will be written. They are placed in the source file directory, named .<source file>.ppcache. Please note that the paragraphs of included sources are cached in the cache file of the main document because they may have to be evaluated differently depending on inclusion context.

What acceleration can be expected? Well, this strongly depends on your source structure. Efficiency grows with longer paragraphs, reused paragraphs and the number of paragraphs. It is reduced by heavy usage of active contents and embedding, because a paragraph that refers to externally defined parts is not fully determined by its own text and therefore cannot be cached. Here is a list of all reasons which cause a paragraph to be excluded from caching:

Embedded parts: Obviously dynamic parts may change from one version to another, but even static parts might have to be interpreted differently because a user can set up new filters.
Included files: An \INCLUDE tag immediately disables caching for the paragraph it resides in because the loaded file may change its contents. This is not really a restriction because the included paragraphs themselves are cached if possible.
Filtered paragraphs: A paragraph filter can transform a source paragraph into whatever the author of a Perl function might think is useful, potentially depending on highly dynamic data. So the parser cannot determine what the final translation of a certain source paragraph will be.
Document stream entry points: Depending on the parser's configuration, these points can be transformed into headlines or remain unchanged, so there is no fixed mapping between a source paragraph and its streamed expression.

Even with these restrictions about 70% of a real life document of more than 150 paragraphs could be cached. This saved more than 60% of parsing time in subsequent translator calls.

New cache entries are always added which means that old entries are never replaced and a cache file tends to grow. If you ever wish to clean up a cache file completely pass CACHE_CLEANUP to this option.

To deactivate caching explicitly pass CACHE_OFF. An existing cache will not be destroyed.

Settings can be combined by addition.

  # clean up the cache, then refill it
  cache => CACHE_CLEANUP+CACHE_ON,

  # clean up the cache and deactivate it
  cache => CACHE_CLEANUP+CACHE_OFF,

The CACHE_OFF value is overwritten by any other setting.

It is suggested to make this setting available to translator users to let them decide if a cache should be used.

Please note that there is a problem with line numbers if paragraphs are restored from cache, because of the behaviour of Perl's paragraph mode. In this mode, the <> operator reads in any number of newlines between paragraphs but supplies only one of them. That is why I do not get the real number of lines in a paragraph and therefore cannot store it. To work around this, two strategies can be used. First, use exactly one empty line between paragraphs. (This strategy is not for real-life users, of course, but in this case restored numbers would be correct.) Second, remember that source line numbers are only interesting in error messages. If the parser detects an error after a cache hit has already occurred, it therefore reports the position as "there or later". If the real number is needed, the parser can be invoked again with the cache deactivated and will report it.

Another known paragraph mode problem occurs if you parse on a UNIX system but your document (or parts of it) was written in DOS format. Paragraph mode then reads such a document completely as one paragraph. Please convert the line endings to your system's convention. (If you are using "dos2unix" under Solaris, please invoke it with option "-ascii" to do this.)

Furthermore, Perl's paragraph mode and PerlPoint treat whitespace-only lines differently. Paragraph mode does not recognize them as "empty", while PerlPoint does for reasons of usability (invisible characters should not make a difference). This means that lines containing only whitespace separate PerlPoint paragraphs but not "Perl" paragraphs, which can make the cache work incorrectly, especially in examples. If paragraphs unintentionally disappear in the resulting presentation, please check the "empty lines" before them.
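The paragraph mode effects described above can be reproduced with plain Perl. The helper paragraphs (a name chosen for this sketch) reads an in-memory source with $/ set to the empty string, just as the <> operator does in paragraph mode.

```perl
use strict;
use warnings;

# Read a source in Perl's paragraph mode and show what is lost:
# extra empty lines, and separation by whitespace-only lines.
sub paragraphs {
    my ($source) = @_;
    open my $fh, '<', \$source or die "cannot open in-memory source: $!";
    local $/ = '';                      # paragraph mode
    my @paragraphs;
    while (my $p = <$fh>) {
        chomp $p;                       # removes all trailing newlines in this mode
        push @paragraphs, $p;
    }
    close $fh;
    return @paragraphs;
}

my @p = paragraphs("one\n\n\n\ntwo\n \nthree\n");
print scalar(@p), "\n";   # prints 2: "two" and "three" stay together
```

With DOS line endings ("\r\n\r\n" between paragraphs) no "\n\n" sequence occurs, so the same loop returns the whole source as a single record.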

Consistent cache data depend on the versions of the parser, of constant declarations and of the module Storable which is used internally. If the parser detects a significant change in one of these versions, existing caches are automatically rebuilt.

Final cache note: cache files are not locked while they are used. If you need this feature please let me know.

criticalSemanticErrors

If set to a true value, semantic errors will cause the parser to terminate immediately. This defaults to false: errors are accumulated and finally reported.

display

This parameter is optional. It controls the display of runtime messages like information messages or warnings. By default, all messages are displayed. You can suppress these messages partially or completely by passing one or more of the "DISPLAY_..." constants declared in PerlPoint::Constants. Constants should be combined by addition.

docstreams2skip

by default, all document streams are made part of the result, but by this parameter one can exclude certain streams (all remaining ones will be streamed as usual).

The list should be supplied by an array reference.

It is suggested to take the values of this parameter from a user option, which by convention should be named "-skipstream".

docstreaming

specifies the way the parser handles stream entry points. The value passed might be either "DSTREAM_DEFAULT", "DSTREAM_IGNORE" or "DSTREAM_HEADLINES".

"DSTREAM_HEADLINES" instructs the parser to transform the entry points into headlines, one level below the current real headline level. This is an easy-to-implement and convenient way of handling docstreams that seems to make sense for most target formats.

"DSTREAM_IGNORE" hides all streams except the main stream. The effect is similar to a call with docstreams2skip set for all document streams in a source.

"DSTREAM_DEFAULT" treats the entry points as entry points and streams them as such. This is the default if the parameter is omitted.

Please note that filters applied by docstreams2skip work regardless of the docstreaming configuration, which only affects the way the parser passes docstream data to a backend.

It is recommended to take the value of this parameter from a user option, which by convention should be named "-docstreaming". (A converter can define additional modes beyond those provided by the parser and implement them itself, of course. See "pp2sdf" for a reference implementation.)

files

a reference to an array of files to be scanned.

Files are treated as PerlPoint sources except when their name has the prefix "IMPORT:", as in "IMPORT:podsource.pod". With this prefix, the parser tries to automatically transform the source into PerlPoint, using a standard import filter for the format indicated by the file extension ("pod" in our example). The filter must be installed as "PerlPoint::Import::<uppercased format name>", e.g. "PerlPoint::Import::POD".

filter

a regular expression describing the target language. This setting, if used, prevents embedded or included source code in languages other than the matching ones from being included in the generated stream. This accelerates both parsing and backend handling. The pattern is evaluated case-insensitively.

 Example: pass "html|perl" to allow HTML and Perl.

To illustrate this, imagine a translator to PostScript. If it reads a PerlPoint file which includes native HTML, this translator cannot handle such code. The backend would have to skip the HTML statements. With a "PostScript" filter, the HTML code will not appear in the stream.

This enables PerlPoint texts prepared for various target languages. If an author really needs plain target language code to be embedded into PerlPoint, he could provide versions for various languages. Translators using a filter will then receive exactly the code of their target language, if provided.

Please note that you cannot filter out PerlPoint code or example files.

By default, no filter is set.

headlineLinks

this optional flag causes the parser to register all headline titles as anchors automatically. (Headlines are stored without possibly included tags which are stripped off.)

Registering anchors does \not mean there are anchors included to the stream, it just means that they are known to exist at parsing time because they are added to an internal "PerlPoint::Anchor" object which is passed to all tag hooks and can be evaluated there. See \"PerlPoint::Tags" and "PerlPoint::Anchors" for details.

It is recommended to use this feature if your converter automatically makes each headline an anchor named like the headline (a feature initially introduced by Lorenz Domke's "pp2html"). (Nevertheless, its usefulness may depend on how tag hooks deal with the parser's anchor collection. See the documentation of the tag modules you use for details.)

If your converter does not support automatic headline anchors in this way, it is recommended to omit this option, because it could confuse tag hooks that evaluate the parser's anchor collection.

libpath

An optional reference to an array of library paths to be searched for files specified by \INCLUDE tags. This array is intended to be filled with directories specified via a converter option. By convention, this option is named "includelib" and should be usable multiple times ("converter -includelib path1 -includelib path2 document.pp").

Please note that library paths can be set via the environment variable "PERLPOINTLIB" as well, but directories specified via "libpath" are searched first.
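A converter might collect the repeated "-includelib" options and derive the resulting search order roughly as follows. Getopt::Long's GetOptionsFromArray() is a real core function; the helper names are hypothetical, and splitting "PERLPOINTLIB" on the platform path separator is an assumption, since the separator is not documented here.

```perl
use strict;
use warnings;
use Config;
use Getopt::Long qw(GetOptionsFromArray);

# Collect repeated -includelib options; the result is what a converter
# would pass to run() as "libpath".
sub libpath_from_args {
    my @args = @_;                   # copy, so the caller's list is untouched
    my @includelib;
    GetOptionsFromArray(\@args, 'includelib=s' => \@includelib)
        or die "invalid options\n";
    return @includelib;
}

# Search order as documented: libpath first, then PERLPOINTLIB.
# Splitting PERLPOINTLIB on $Config{path_sep} is an assumption.
sub include_search_order {
    my ($libpath, $env) = @_;
    my @from_env = grep { length } split /\Q$Config{path_sep}\E/, ($env // '');
    return (@$libpath, @from_env);
}

my @libpath = libpath_from_args(qw(-includelib path1 -includelib path2 document.pp));
print "$_\n" for include_search_order(\@libpath, $ENV{PERLPOINTLIB});
```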

linehints

If set to a true value, the parser will embed line hints into the stream whenever a new source line begins.

A line hint directive is provided as

  [
   DIRECTIVE_NEW_LINE, DIRECTIVE_START,
   {file=>filename, line=>number}
  ]

and should be handled by a backend callback.
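A backend could use line hints to prefix its own diagnostics with the source position, as in the following sketch. It assumes a callback receiving the directive, its mode and the hint hash. The constant values are made-up stand-ins (a real backend imports them from PerlPoint::Constants), and "on_new_line()" and "backend_warning()" are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in values: a real backend imports DIRECTIVE_NEW_LINE and
# DIRECTIVE_START from PerlPoint::Constants. These numbers are made up.
use constant { DIRECTIVE_NEW_LINE => 101, DIRECTIVE_START => 1 };

# Hypothetical backend state: the source position of the current line.
my %position = (file => '?', line => 0);

# Hypothetical callback for a line hint; the ($directive, $mode, $hint)
# calling convention is an assumption of this sketch.
sub on_new_line {
    my ($directive, $mode, $hint) = @_;
    return unless $directive == DIRECTIVE_NEW_LINE and $mode == DIRECTIVE_START;
    @position{qw(file line)} = @{$hint}{qw(file line)};
}

# Prefix backend diagnostics with the last seen source position.
sub backend_warning {
    my ($msg) = @_;
    return "$position{file}, line $position{line}: $msg";
}

on_new_line(DIRECTIVE_NEW_LINE, DIRECTIVE_START, {file => 'talk.pp', line => 12});
print backend_warning('unsupported tag'), "\n";
```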

Please note that source line numbers are currently not guaranteed to be correct if stream parts are restored from cache (see "cache" for details).

The default value is 0.

nestedTables

This optional flag, set to 0 by default, specifies whether the parser accepts nested tables. Table nesting can produce very nice results if it is supported by the target language. HTML, for example, allows nested tables, but other languages do not. So using this feature can really improve the results if a user focuses on certain target formats only. If I want to produce nothing but HTML, why should I care about target formats that cannot handle table nesting? On the other hand, if a document is to be translated into several formats, nested tables might cause trouble.

Because of this, it is suggested to let converter users decide whether they want to enable table nesting. If the target format does not support nesting, I recommend disabling it completely.

object

the parser object made by new().

safe

an object of the Safe class which comes with perl. It is used to evaluate embedded Perl code in a safe environment. By letting the caller of run() provide this object, a translator author can make the level of safety fully configurable by users. Usually, the following should work:

  use Safe;
  ...
  $parser->run(safe=>new Safe, ...);

Safe is a really good module but unfortunately limited in loading modules transparently. So if a user wants to use modules in his embedded code, he might fail to get it working in a Safe compartment. If safety does not matter, he can decide to execute it without Safe, with full Perl access. To switch on this mode, pass a true scalar value (but no reference) instead of a Safe object.

To make all PerlPoint converters behave similarly, it is recommended to provide two related options, "-activeContents" and "-safeOpcode". "-activeContents" should flag that active contents shall be evaluated, while "-safeOpcode" controls the level of security. A special level "ALL" should mean that all code can be executed without any restriction, while any other setting should be treated as an opcode to configure the Safe object. So the recommended rules are: pass 0 unless "-activeContents" is set. Pass 1 if the converter was called with "-activeContents" and "-safeOpcode ALL". Pass a Safe object and configure it according to the user's "-safeOpcode" settings if "-activeContents" is used without "-safeOpcode ALL". See "pp2sdf" for an implementation example.
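The three rules above can be condensed into a small helper whose result is passed as "safe". The name "safe_setting()" is hypothetical, and treating each "-safeOpcode" value as an argument to Safe's permit() method is one plausible reading of "configure the Safe object"; "pp2sdf" remains the reference implementation.

```perl
use strict;
use warnings;
use Safe;

# Map the recommended converter options to the value for run(safe => ...).
#   $active_contents: true if -activeContents was given
#   @safe_opcodes:    all values given via -safeOpcode
# Passing the opcodes to Safe::permit() is an assumption of this sketch.
sub safe_setting {
    my ($active_contents, @safe_opcodes) = @_;
    return 0 unless $active_contents;                  # no active contents at all
    return 1 if grep { $_ eq 'ALL' } @safe_opcodes;    # unrestricted Perl, no Safe
    my $compartment = Safe->new;
    $compartment->permit(@safe_opcodes) if @safe_opcodes;
    return $compartment;
}
```

A converter would then call something like "$parser->run(safe => safe_setting($activeContents, @safeOpcodes), ...)", with both variables filled from its command line options.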

Active Perl contents is suppressed if this setting is omitted or false. (There are currently three types of active contents: embedded Perl, included Perl, and condition paragraphs.)

predeclaredVars

Variables are usually set by assignment paragraphs. However, it may be useful for a converter to predeclare a set of them to provide certain settings to the users. Predeclared variables, as any other PerlPoint variables, can be used both in pure PerlPoint and in active contents. To help users distinguish them from user defined vars, their names will be capitalized.

Just pass a hash of variable name / value pairs:

  $parser->run(
               ...
               predeclaredVars => {
                                   CONVERTER_NAME    => 'pp2xy',
                                   CONVERTER_VERSION => $VERSION,
                                   ...
                                  },
              );

Variable names that are not capitalized will be capitalized without further notice.

Please note that variables currently can only be scalars. Different data types will not be accepted by the parser.
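Because the parser rejects non-scalar values, a converter may want to check its predeclared variables before passing them. The following sketch uses a hypothetical helper that mirrors the two rules above. Reading "capitalized" as all uppercase is an assumption based on the names shown in this section.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the rules above: names are uppercased
# (assumed meaning of "capitalized"), values must be plain scalars.
sub normalize_predeclared {
    my (%vars) = @_;
    my %normalized;
    while (my ($name, $value) = each %vars) {
        die "predeclared variable $name: only scalars are supported\n" if ref $value;
        $normalized{uc $name} = $value;
    }
    return \%normalized;
}

my $vars = normalize_predeclared(converter_name => 'pp2xy', CONVERTER_VERSION => '0.01');
print "$_ = $vars->{$_}\n" for sort keys %$vars;
```

The resulting hash reference can be passed as "predeclaredVars" unchanged.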

Predeclared variables should be mentioned in the converter's documentation.

The parser itself makes use of this feature by declaring "_PARSER_VERSION" (the version of this module used to parse the source) and "_STARTDIR" (the full path of the startup directory, as reported by "Cwd::cwd()").

"predeclaredVars" needs "var2stream" to take effect.

skipcomments

By default, comments are streamed and can be converted into comments of the target language. But they are often of limited use in generated files, especially if they are intended to help the author of a document rather than the reader of the generated results. This option suppresses comments from being streamed.

It is suggested to get this setting via user option, which by convention should be named "-skipcomments".

stream

A reference to an array where the generated output stream should be stored.

Application programmers may want to tie this array if the ASCII source texts are expected to be large (long ASCII texts can result in large stream data which may occupy a lot of memory). Because the parser stores stream data by paragraph, memory consumption can be reduced significantly by tying the stream array.

It is recommended to pass an empty array. Stored data will not be overwritten, the parser appends its data instead (by "push()").
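A tied stream array could look like the following sketch. "StreamOnDisk" is a hypothetical class that stores each entry in its own temporary file via Storable and keeps only the array size in memory. Tie::Array supplies "push()" on top of the methods defined here. Note that a fetched entry is a fresh copy, so changes made to it in place are not written back; whether that suits the parser's access pattern is an assumption, since this section only states that the parser appends by "push()".

```perl
use strict;
use warnings;

{
    # Hypothetical tie class: one Storable file per stream entry,
    # only the array size is held in memory.
    package StreamOnDisk;
    use parent 'Tie::Array';     # provides PUSH, POP, SPLICE etc. on top of the methods below
    use File::Spec;
    use File::Temp qw(tempdir);
    use Storable qw(nstore retrieve);

    sub TIEARRAY {
        my ($class) = @_;
        return bless { dir => tempdir(CLEANUP => 1), size => 0 }, $class;
    }

    sub _file     { my ($self, $i) = @_; File::Spec->catfile($self->{dir}, "entry$i") }
    sub FETCHSIZE { $_[0]{size} }

    sub STORESIZE {
        my ($self, $size) = @_;
        unlink $self->_file($_) for $size .. $self->{size} - 1;   # drop truncated entries
        $self->{size} = $size;
    }

    sub STORE {
        my ($self, $i, $value) = @_;
        nstore([$value], $self->_file($i));      # wrap: Storable needs a reference
        $self->{size} = $i + 1 if $i >= $self->{size};
    }

    sub FETCH {
        my ($self, $i) = @_;
        my $file = $self->_file($i);
        return -e $file ? retrieve($file)->[0] : undef;
    }
}

tie my @stream, 'StreamOnDisk';
push @stream, ['paragraph 1'], ['paragraph 2'];   # appending, as the parser does
print scalar(@stream), " entries, second: $stream[1][0]\n";
```

Such a tie trades memory for disk access on every read and write, so it is only worth it for really large documents.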

trace

This parameter is optional. It is intended to activate trace code while the method runs. You may pass any of the "TRACE_..." constants declared in PerlPoint::Constants, combined by addition as in the following example:

  # show the traces of both
  # lexical and syntactical analysis
  trace => TRACE_LEXER+TRACE_PARSER,

If you omit this parameter or pass TRACE_NOTHING, no traces will be displayed.

var2stream

If set to a true value, the parser will propagate variable settings into the stream by adding additional "DIRECTIVE_VARSET" directives.

A variable propagation has the form

  [
   DIRECTIVE_VARSET, DIRECTIVE_START,
   {var=>varname, value=>value}
  ]

and should be handled by a backend callback.
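A backend callback collecting propagated variables might look like this sketch. The constant values are made-up stand-ins (a real backend imports them from PerlPoint::Constants), the handler name "on_varset()" is hypothetical, and the ($directive, $mode, $data) calling convention is an assumption.

```perl
use strict;
use warnings;

# Stand-in values: a real backend imports DIRECTIVE_VARSET and
# DIRECTIVE_START from PerlPoint::Constants. These numbers are made up.
use constant { DIRECTIVE_VARSET => 102, DIRECTIVE_START => 1 };

# Hypothetical backend state: all variables seen so far.
my %variables;

# Hypothetical handler; the calling convention is assumed.
sub on_varset {
    my ($directive, $mode, $data) = @_;
    return unless $directive == DIRECTIVE_VARSET and $mode == DIRECTIVE_START;
    $variables{$data->{var}} = $data->{value};
}

# Feed two stream entries of the documented shape to the handler.
on_varset(@$_) for (
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'TITLE',     value => 'My talk'}],
    [DIRECTIVE_VARSET, DIRECTIVE_START, {var => 'CONVERTER', value => 'pp2xy'}],
);
print "$_=$variables{$_}\n" for sort keys %variables;
```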

The default value is 0.

vispro

activates "process visualization", which simply means that a user will see progress messages while the parser processes documents. The numerical value of this setting determines how often the progress message is updated, as an interval in chapters:

  # inform every five chapters
  vispro => 5,

Process visualization is automatically suppressed if STDERR is not connected to a terminal, if this option is omitted, if display was set to "DISPLAY_NOINFO", or if parser traces are activated.
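The suppression rules and the chapter interval can be restated as a small predicate. "show_progress_at()" is hypothetical, and updating on every chapter number divisible by the interval is an assumed interpretation of "a chapter interval". The terminal check is a parameter so that the logic can be tested; a real caller would pass "-t STDERR".

```perl
use strict;
use warnings;

# Hypothetical predicate restating the rules above: progress messages
# need a vispro interval, a terminal on STDERR, no DISPLAY_NOINFO and
# no parser traces.
sub show_progress_at {
    my (%args) = @_;
    my $interval = $args{vispro} or return 0;          # option omitted or 0
    return 0 unless $args{stderr_is_tty};
    return 0 if $args{noinfo} or $args{tracing};
    return $args{chapter} % $interval == 0 ? 1 : 0;    # every $interval chapters
}

for my $chapter (1 .. 10) {
    print STDERR "processed chapter $chapter\n"
        if show_progress_at(vispro => 5, chapter => $chapter, stderr_is_tty => -t STDERR);
}
```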

Return value: A "true" value in case of success, "false" otherwise. A call is successful if neither a syntactical nor a semantic error occurred in the parsed files.

Example:

  $parser->run(
               stream => \@streamData,
               files  => \@ARGV,
               filter => 'HTML',
               cache  => CACHE_ON,
               trace  => TRACE_PARAGRAPHS,
              );

anchors()

A class method that supplies all anchors collected by the parser.

Example:

   my $anchors=PerlPoint::Parser::anchors;

EXAMPLE

The following code shows a minimal but complete parser.

  # pragmata
  use strict;

  # load modules
  use PerlPoint::Parser;

  # declare variables
  my (@streamData);

  # build parser
  my ($parser)=new PerlPoint::Parser;
  # and call it, stopping on syntactical or semantic errors
  $parser->run(
               stream  => \@streamData,
               files   => \@ARGV,
              ) or die "Parsing failed.\n";

NOTES

Converter namespace

It is suggested to avoid operating in namespace main::. In order to emulate the behaviour of the Safe module by "eval()" in case a user wishes to get full Perl access for active contents, active contents needs to be executed in this namespace. Safe does not allow changing this, so the documented default for "saved" and "not saved" active contents needs to be "main::". This means that both the parser and active contents will pollute "main::". Avoid being affected by choosing a different converter namespace. The PerlPoint::Converter:: hierarchy is reserved for this purpose. The recommended namespace is "PerlPoint::Converter::<converter name>", e.g. "PerlPoint::Converter::pp2sdf".
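The following simulation sketches why a separate namespace helps. The converter name "pp2xy" is hypothetical, and the string "eval" only imitates active contents being executed in main::; it is not the parser's own evaluation code.

```perl
use strict;
use warnings;

# Hypothetical converter running in its own namespace, as recommended.
package PerlPoint::Converter::pp2xy;

our $title = 'set by the converter';

# Simulated active contents, evaluated in main:: as described above.
# Its assignment lands in $main::title, not in the converter's variable.
eval q{
    package main;
    our $title = 'set by active contents';
    1;
} or die $@;

print "converter: $PerlPoint::Converter::pp2xy::title\n";
print "main:      $main::title\n";
```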

Format

The PerlPoint format was initially designed by Tom Christiansen, who wrote an HTML slide generator for it, too.

Lorenz Domke added a number of additional, useful and interesting features to the original implementation. At a certain point, we decided to redesign the tool to make it a base for slide generation not only into HTML but into various document description languages.

The PerlPoint format implemented by this parser version is slightly different from the original design. Presentations written for PerlPoint 1.0 will not pass the parser but can easily be converted into the new format. We designed the new format as a team of Lorenz Domke, Stephen Riehm and me.

Storable updates

From version 0.24 on, the Storable module is a prerequisite of the parser package, because Storable is used to store and retrieve cache data in files. If you update your Storable installation, its internal format might change, making stored cache data unreadable. To avoid this, the parser automatically rebuilds existing caches in case of Storable updates.

FILES

If caches are used, the parser writes cache files where the initial sources are stored. They are named .<source file>.ppcache.
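To locate the cache belonging to a source, for example to remove stale caches by hand, the naming rule above can be applied directly. "cache_file_for()" is a hypothetical helper built on the core modules File::Basename and File::Spec.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: the cache file belonging to a source file,
# following the naming rule above (same directory, ".<source file>.ppcache").
sub cache_file_for {
    my ($source) = @_;
    my ($name, $dir) = fileparse($source);
    return File::Spec->catfile($dir, ".$name.ppcache");
}

# list existing caches for the sources given on the command line
for my $source (@ARGV) {
    my $cache = cache_file_for($source);
    print "$cache\n" if -e $cache;
}
```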

SUPPORT

A PerlPoint mailing list is set up to discuss usage, ideas, bugs, suggestions and translator development. To subscribe, please send an empty message to perlpoint-subscribe@perl.org.

If you prefer, you can contact me via perl@jochen-stenzel.de as well.

AUTHOR

This module is free software, you can redistribute it and/or modify it under the terms of the Artistic License distributed with Perl version 5.003 or (at your option) any later version. Please refer to the Artistic License that came with your Perl distribution for more details.

The Artistic License should have been included in your distribution of Perl. It resides in the file named "Artistic" at the top-level of the Perl source tree (where Perl was downloaded/unpacked - ask your system administrator if you don't know where this is). Alternatively, the current version of the Artistic License distributed with Perl can be viewed on-line on the World-Wide Web (WWW) from the following URL: http://www.perl.com/perl/misc/Artistic.html.

PerlPoint::Parser is built using Parse::Yapp in a way that users do not have to install Parse::Yapp themselves. According to the copyright note of Parse::Yapp, I have to mention the following:

"You may use and distribute them under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file."

DISCLAIMER

This software is distributed in the hope that it will be useful, but is provided "AS IS" WITHOUT WARRANTY OF ANY KIND, either expressed or implied, INCLUDING, without limitation, the implied warranties of MERCHANTABILITY and FITNESS FOR A PARTICULAR PURPOSE.

The ENTIRE RISK as to the quality and performance of the software IS WITH YOU (the holder of the software). Should the software prove defective, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT WILL ANY COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY CREATE, MODIFY, OR DISTRIBUTE THE SOFTWARE BE LIABLE OR RESPONSIBLE TO YOU OR TO ANY OTHER ENTITY FOR ANY KIND OF DAMAGES (no matter how awful - not even if they arise from known or unknown flaws in the software).

Please refer to the Artistic License that came with your Perl distribution for more details.
