"DocSet::Doc" - A Base Document Class
# e.g. a subclass would do
use DocSet::Doc::HTML2HTML ();
my $doc = DocSet::Doc::HTML2HTML->new(%args);
$doc->scan();
my $meta = $doc->meta();
my $toc = $doc->toc();
$doc->render();
# internal methods
$doc->src_read();
$doc->src_filter();
This superclass implements the core methods for scanning a single document of a
given format and rendering it into another format. It provides subclasses
with hooks that can change the default behavior. Note that this class cannot
be used as is: you have to subclass it and implement the required methods
listed later.
- new
- init
- scan
Scans the document into a parsed tree and retrieves its meta and
toc data if possible.
- render
Renders the output document and writes it to its final
destination.
- src_read
Fetches the source of the document. The source can be read
from different media, e.g. a file://, http://, relational DB or OCR :)
(but these are left for subclasses to implement :)
A subclass may implement a "source" filter. For
example, if the source document is written in an extended POD, the source
filter may convert it into standard POD. If the source includes some
template directives, these can be pre-processed as well.
The document's content comes out of this class ready for
parsing and converting into other formats.
- meta
A simple get/set accessor for the meta
attribute.
- toc
A simple get/set accessor for the toc attribute.
- transform_src_doc
my $doc_src_path = $self->transform_src_doc($path);
Searches for the source document $path in the search paths defined by
the configuration file's search_paths attribute (similar to the
@INC search in Perl). If the document is found, its path is resolved
relative to "abs_doc_root" and returned. If it is not found, the
"undef" value is returned.
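The lookup described for transform_src_doc can be sketched as follows. This is only an illustration under stated assumptions, not the DocSet implementation: find_src_doc is a hypothetical stand-alone helper, and the search paths and document root are passed in as plain arguments instead of being read from a configuration object.

```perl
use strict;
use warnings;
use File::Spec ();

# Hypothetical helper (not part of DocSet): try $path under each directory
# in @$search_paths, first match wins, much like the @INC search.
sub find_src_doc {
    my ($path, $search_paths, $abs_doc_root) = @_;
    for my $dir (@$search_paths) {
        my $candidate = File::Spec->catfile($dir, $path);
        next unless -f $candidate;
        # Resolve the hit to a path relative to the document root.
        return File::Spec->abs2rel(
            File::Spec->rel2abs($candidate), $abs_doc_root
        );
    }
    return;    # undef in scalar context: the document was not found
}
```

A caller would check the result with defined() before using it, since a missing document yields undef rather than an exception.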
These methods must be implemented by the subclasses:
- retrieve_meta_data
Retrieves the meta data that describes the input document and stores it in the
meta object attribute. Different documents may provide different meta
information. The only required meta field is title.
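A minimal sketch of how a subclass might satisfy this requirement is shown here. Everything in it is an assumption made for illustration: My::Doc::Base is a stand-in for the real base class, the meta attribute is modeled as a plain hash reference, and the source is taken from a hypothetical content argument.

```perl
use strict;
use warnings;

# Stand-in for the base class (hypothetical; the real DocSet::Doc does more).
package My::Doc::Base;

sub new {
    my ($class, %args) = @_;
    return bless { %args, meta => undef }, $class;
}

# Simple get/set accessor: stores the argument when one is given.
sub meta {
    my $self = shift;
    $self->{meta} = shift if @_;
    return $self->{meta};
}

# Required hook: the base version only reports the missing override.
sub retrieve_meta_data {
    my $self = shift;
    die ref($self) . " must implement retrieve_meta_data()\n";
}

package My::Doc::HTML;
our @ISA = ('My::Doc::Base');

# Pull the title out of the raw source; fall back to a placeholder so the
# one required field, title, is always set.
sub retrieve_meta_data {
    my $self = shift;
    my ($title) = $self->{content} =~ m{<title>\s*(.*?)\s*</title>}is;
    $self->meta({ title => defined $title ? $title : 'Untitled' });
    return;
}

package main;

my $doc = My::Doc::HTML->new(content => '<html><title>Intro</title></html>');
$doc->retrieve_meta_data();
print $doc->meta->{title}, "\n";
```

Having the base method die keeps a forgotten override from passing silently, which matches the note that the base class cannot be used on its own.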
These methods can be implemented by the subclasses:
- src_filter
A subclass may want to preprocess the source document before it is
processed. This method is called after the source has been read. By
default nothing happens.
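The optional hook can be pictured with the sketch below. It rests on several assumptions: My::Doc::Base and its load() driver are hypothetical stand-ins (how the real class sequences src_read and src_filter is not shown here), the source is an in-memory string, and "=note" is a made-up extended POD directive.

```perl
use strict;
use warnings;

# Stand-in base class (hypothetical, for illustration only).
package My::Doc::Base;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# In this sketch the source is an in-memory string; a real subclass
# might read it from a file or another medium.
sub src_read {
    my $self = shift;
    $self->{content} = $self->{src};
    return;
}

# Default hook: nothing happens.
sub src_filter { return }

# Hypothetical driver: run the filter once the source has been read.
sub load {
    my $self = shift;
    $self->src_read();
    $self->src_filter();
    return;
}

package My::Doc::ExtendedPOD;
our @ISA = ('My::Doc::Base');

# Rewrite the made-up "=note TEXT" directive into standard POD markup.
sub src_filter {
    my $self = shift;
    $self->{content} =~ s/^=note[ \t]+(.*)$/B<Note:> $1/mg;
    return;
}

package main;

my $doc = My::Doc::ExtendedPOD->new(src => "=head1 NAME\n\n=note Draft only\n");
$doc->load();
print $doc->{content};
```

Because the default src_filter does nothing, a subclass that needs no preprocessing can simply leave it out.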
Stas Bekman <stas (at) stason.org>