Text::BibTeX(3)
User Contributed Perl Documentation
Text::BibTeX - interface to read and parse BibTeX files
use Text::BibTeX;
my $bibfile = Text::BibTeX::File->new("foo.bib");
my $newfile = Text::BibTeX::File->new(">newfoo.bib");
while ($entry = Text::BibTeX::Entry->new($bibfile))
{
next unless $entry->parse_ok;
. # hack on $entry contents, using various
. # Text::BibTeX::Entry methods
.
$entry->write ($newfile);
}
The "Text::BibTeX" module serves mainly as a
high-level introduction to the
"Text::BibTeX" library, for both code and
documentation purposes. The code loads the two fundamental modules for
processing BibTeX files
("Text::BibTeX::File" and
"Text::BibTeX::Entry"), and this
documentation gives a broad overview of the whole library that isn't available
in the documentation for the individual modules that comprise it.
In addition, the "Text::BibTeX"
module provides a number of miscellaneous functions that are useful in
processing BibTeX data (especially the kind that comes from bibliographies
as defined by BibTeX 0.99, rather than generic database files). These
functions don't generally fit in the object-oriented class hierarchy centred
around the "Text::BibTeX::Entry" class,
mainly because they are specific to bibliographic data and operate on
generic strings (rather than being tied to a particular BibTeX entry). These
are also documented here, in "MISCELLANEOUS FUNCTIONS".
Note that every module described here begins with the
"Text::BibTeX" prefix. For brevity, I have
dropped this prefix from most class and module names in the rest of this
manual page (and in most of the other manual pages in the library).
The "Text::BibTeX" library includes a number
of modules, many of which provide classes. Usually, the relationship is simple
and obvious: a module provides a class of the same name---for instance, the
"Text::BibTeX::Entry" module provides the
"Text::BibTeX::Entry" class. There are a few
exceptions, though: most obviously, the
"Text::BibTeX" module doesn't provide any
classes itself; it merely loads two modules
("Text::BibTeX::Entry" and
"Text::BibTeX::File") that do. The other
exceptions are mentioned in the descriptions below, and discussed in detail in
the documentation for the respective modules.
The modules are presented roughly in order of increasing
specialization: the first three are essential for any program that processes
BibTeX data files, regardless of what kind of data they hold. The later
modules are specialized for use with bibliographic databases, and serve both
to emulate BibTeX 0.99's standard styles and to provide an example of how to
define a database structure through such specialized modules. Each module is
fully documented in its respective manual page.
- "Text::BibTeX"
- Loads the two fundamental modules
("Entry" and
"File"), and provides a number of
miscellaneous functions that don't fit anywhere in the class
hierarchy.
- "Text::BibTeX::File"
- Provides an object-oriented interface to BibTeX database files. In
addition to the obvious attributes of filename and filehandle, the
"file" abstraction manages properties such as the database
structure and options for it.
- "Text::BibTeX::Entry"
- Provides an object-oriented interface to BibTeX entries, which can be
parsed from "File" objects, arbitrary
filehandles, or strings. Manages all the properties of a single entry:
type, key, fields, and values. Also serves as the base class for the
structured entry classes (described in detail in
Text::BibTeX::Structure).
- "Text::BibTeX::Value"
- Provides an object-oriented interface to values and simple
values, high-level constructs that can be used to represent the
strings associated with each field in an entry. Normally, field values are
returned simply as Perl strings, with macros expanded and multiple strings
"pasted" together. If desired, you can instruct
"Text::BibTeX" to return
"Text::BibTeX::Value" objects, which
give you access to the original form of the data.
- "Text::BibTeX::Structure"
- Provides the "Structure" and
"StructuredEntry" classes, which serve
primarily as base classes for the two kinds of classes that define
database structures. Read this man page for a comprehensive description of
the mechanism for implementing Perl classes analogous to BibTeX
"style files".
- "Text::BibTeX::Bib"
- Provides the "BibStructure" and
"BibEntry" classes, which serve two
purposes: they fulfill the same role as the standard style files of BibTeX
0.99, and they give an example of how to write new database structures.
These ultimately derive from, respectively, the
"Structure" and
"StructuredEntry" classes provided by
the "Structure" module.
- "Text::BibTeX::BibSort"
- One of the "BibEntry" class's base
classes: handles the generation of sort keys for sorting prior to output
formatting.
- "Text::BibTeX::BibFormat"
- One of the "BibEntry" class's base
classes: handles the formatting of bibliographic data for output in a
markup language such as LaTeX.
- "Text::BibTeX::Name"
- A class used by the "Bib" structure and
specific to bibliographic data as defined by BibTeX itself: parses
individual author names into "first", "von",
"last", and "jr" parts.
- "Text::BibTeX::NameFormat"
- Also specific to bibliographic data: puts split-up names (as parsed by the
"Name" class) back together in a custom
way.
On a first pass through the library, you'll probably want to
confine your reading to Text::BibTeX::File and Text::BibTeX::Entry. The
other modules will come in handy eventually, especially if you need to
emulate BibTeX in a fairly fine-grained way (e.g. parsing names, generating
sort keys). But for the simple database hacks that are the bread and butter
of the "Text::BibTeX" library, the
"File" and
"Entry" classes are the bulk of what
you'll need. You may also find some of the material in this manual page
useful, namely "CONSTANT VALUES" and "UTILITY
FUNCTIONS".
The "Text::BibTeX" module has a number of
optional exports, most of them constant values described in "CONSTANT
VALUES" below. The default exports are a subset of these constant values
that are used particularly often, the "entry metatypes" (also
accessible via the export tag "metatypes").
Thus, the following two lines are equivalent:
use Text::BibTeX;
use Text::BibTeX qw(:metatypes);
Some of the various subroutines provided by the module are also
exportable. "bibloop",
"split_list",
"purify_string", and
"change_case" are all useful in everyday
processing of BibTeX data, but don't really fit anywhere in the class
hierarchy. They may be imported from
"Text::BibTeX" using the
"subs" export tag.
"check_class" and
"display_list" are also exportable, but
only by name; they are not included in any export tag. (These two mainly
exist for use by other modules in the library.) For instance, to use
"Text::BibTeX" and import the entry
metatype constants and the common subroutines:
use Text::BibTeX qw(:metatypes :subs);
Another group of subroutines exists for direct manipulation of the
macro table maintained by the underlying C library. These functions (see
"Macro table functions", below) allow you to define, delete, and
query the value of BibTeX macros (or "abbreviations"). They may be
imported en masse using the
"macrosubs" export tag:
use Text::BibTeX qw(:macrosubs);
The "Text::BibTeX" module makes a number of
constant values available. These correspond to the values of various
enumerated types in the underlying C library, btparse, and their
meanings are more fully explained in the btparse documentation.
Each group of constants is optionally exportable using an export
tag given in the descriptions below.
- Entry metatypes
- "BTE_UNKNOWN",
"BTE_REGULAR",
"BTE_COMMENT",
"BTE_PREAMBLE",
"BTE_MACRODEF". The
"metatype" method in the
"Entry" class always returns one of
these values. The latter three describe, respectively,
"comment",
"preamble", and
"string" entries;
"BTE_REGULAR" describes all other entry
types. "BTE_UNKNOWN" should never be
seen (it's mainly useful for C code that might have to detect half-baked
data structures). See also btparse. Export tag:
"metatypes".
- AST node types
- "BTAST_STRING",
"BTAST_MACRO",
"BTAST_NUMBER". Used to distinguish the
three kinds of simple values: strings, macros, and numbers. The
"SimpleValue" class's
"type" method always returns one of
these three values. See also Text::BibTeX::Value, btparse. Export tag:
"nodetypes".
- Name parts
- "BTN_FIRST",
"BTN_VON",
"BTN_LAST",
"BTN_JR",
"BTN_NONE". Used to specify the various
parts of a name after it has been split up. These are mainly useful when
using the "NameFormat" class. See also
bt_split_names and bt_format_names. Export tag:
"nameparts".
- Join methods
- "BTJ_MAYTIE",
"BTJ_SPACE",
"BTJ_FORCETIE",
"BTJ_NOTHING". Used to tell the
"NameFormat" class how to join adjacent
tokens together; see Text::BibTeX::NameFormat and bt_format_names. Export
tag: "joinmethods".
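As a brief illustration of how these constants tend to be used, the following sketch maps the entry metatypes to readable labels. It assumes Text::BibTeX is installed; the helper name describe_metatype and the label strings are hypothetical, and only the BTE_* constants (exported by default, as described above) come from the module.

```perl
use strict;
use warnings;
use Text::BibTeX;    # default exports: the BTE_* entry metatype constants

# Hypothetical lookup table from metatype constant to a readable label.
my %label = (
    BTE_REGULAR()  => 'regular entry',
    BTE_COMMENT()  => 'comment entry',
    BTE_PREAMBLE() => 'preamble entry',
    BTE_MACRODEF() => 'macro definition (string entry)',
);

# Hypothetical helper: takes a metatype value (for example, the result of
# $entry->metatype) and returns a label, falling back to 'unknown'.
sub describe_metatype {
    my ($metatype) = @_;
    return exists $label{$metatype} ? $label{$metatype} : 'unknown';
}

print describe_metatype(BTE_PREAMBLE()), "\n";
```

In a real program the argument would normally be the value returned by the "metatype" method of an "Entry" object.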
"Text::BibTeX" provides several functions that
operate outside of the normal class hierarchy. Of these, only
"bibloop" is likely to be of much use to you
in writing everyday BibTeX-hacking programs; the other two
("check_class" and
"display_list") are mainly provided for the
use of other modules in the library. They are documented here mainly for
completeness, but also because they might conceivably be useful in other
circumstances.
- bibloop (ACTION, FILES [, DEST])
- Loops over all entries in a set of BibTeX files, performing some
caller-supplied action on each entry. FILES should be a reference to the
list of filenames to process, and ACTION a reference to a subroutine that
will be called on each entry. DEST, if given, should be a
"Text::BibTeX::File" object (opened for
output) to which entries might be printed.
The subroutine referenced by ACTION is called with exactly one
argument: the "Text::BibTeX::Entry"
object representing the entry currently being processed. Information
about both the entry itself and the file where it originated is
available through this object; see Text::BibTeX::Entry. The ACTION
subroutine is only called if the entry was successfully parsed; any
syntax errors will result in a warning message being printed, and that
entry being skipped. Note that all successfully parsed entries
are passed to the ACTION subroutine, even
"preamble",
"string", and
"comment" entries. To skip these
pseudo-entries and process only "regular" entries, your
action subroutine should look something like this:
sub action {
my $entry = shift;
return unless $entry->metatype == BTE_REGULAR;
# process $entry ...
}
If your action subroutine needs any more arguments, you can
just create a closure (anonymous subroutine) as a wrapper, and pass it
to "bibloop":
sub action {
my ($entry, $extra_stuff) = @_;
# ...
}
my $extra = ...;
Text::BibTeX::bibloop (sub { &action ($_[0], $extra) }, \@files);
If the ACTION subroutine returns a true value and DEST was
given, then the processed entry will be written to DEST.
- check_class (PACKAGE, DESCRIPTION, SUPERCLASS, METHODS)
- Ensures that a PACKAGE implements a class meeting certain requirements.
First, it inspects Perl's symbol tables to ensure that a package named
PACKAGE actually exists. Then, it ensures that the class named by PACKAGE
derives from SUPERCLASS (using the universal method
"isa"). This derivation might be through
multiple inheritance, or through several generations of a class hierarchy;
the only requirement is that SUPERCLASS is somewhere in PACKAGE's tree of
base classes. Finally, it checks that PACKAGE provides each method listed
in METHODS (a reference to a list of method names). This is done with the
universal method "can", so the methods
might actually come from one of PACKAGE's base classes.
DESCRIPTION should be a brief string describing the class that
was expected to be provided by PACKAGE. It is used for generating
warning messages if any of the class requirements are not met.
This is mainly used by the supervisory code in
"Text::BibTeX::Structure", to ensure
that user-supplied structure modules meet the rules required of
them.
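The three checks described above can be sketched in plain Perl. This is an illustrative stand-in, not the library's own code: the function name class_problems and the My::* classes are hypothetical, and the sketch returns a list of messages where the description above speaks of warning messages.

```perl
use strict;
use warnings;

# Hypothetical classes, defined inline so the sketch is self-contained.
{ package My::Structure;    sub new { return bless {}, shift } }
{ package My::BibStructure; our @ISA = ('My::Structure'); sub describe { return 'bib' } }

# Illustrative stand-in for the three checks; NOT the library's own code.
# Returns a list of problem descriptions (empty if every check passes).
sub class_problems {
    my ($package, $description, $superclass, $methods) = @_;

    # 1. The package must exist in Perl's symbol table.
    my $exists = do { no strict 'refs'; scalar %{"${package}::"} };
    return ("$description: package $package does not exist") unless $exists;

    my @problems;
    # 2. SUPERCLASS must be somewhere among PACKAGE's base classes.
    push @problems, "$description: $package does not derive from $superclass"
        unless $package->isa($superclass);
    # 3. Every required method must be available, possibly via inheritance.
    push @problems, map { "$description: $package lacks method $_" }
        grep { !$package->can($_) } @$methods;
    return @problems;
}

# 'new' is inherited and 'describe' is defined directly; 'render' is missing.
print "$_\n" for class_problems('My::BibStructure', 'structure class',
                                'My::Structure', ['new', 'describe', 'render']);
```

Because the method check goes through "can", the inherited "new" counts as provided, which mirrors the behaviour described for METHODS.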
- display_list (LIST, QUOTE)
- Converts a list of strings to the grammatical conventions of a human
language (currently, only English rules are supported). LIST must be a
reference to a list of strings. If this list is empty, the empty string is
returned. If it has one element, then just that element is returned. If it
has two elements, then they are joined with the string
" and " and the resulting string is
returned. Otherwise, the list has N elements for N >= 3;
elements 1..N-1 are joined with commas, and the final element is
tacked on with an intervening ", and ".
If QUOTE is true, then each string is encased in single quotes
before anything else is done.
This is used elsewhere in the library for two very distinct
purposes: for generating warning messages describing lists of fields
that should be present or are conflicting in an entry, and for
generating lists of author names in formatted bibliographies.
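The joining rule described for display_list can be sketched in a few lines of plain Perl. The function name english_list is hypothetical and this is a re-implementation for illustration only, not the library's code; it follows the empty, one-element, two-element, and N >= 3 cases given above.

```perl
use strict;
use warnings;

# Illustrative re-implementation of the joining rule; NOT the library's code.
sub english_list {
    my ($list, $quote) = @_;

    # Quote each string first, before any joining is done.
    my @items = $quote ? map("'$_'", @$list) : @$list;

    return ''                    if @items == 0;
    return $items[0]             if @items == 1;
    return join(' and ', @items) if @items == 2;

    # Three or more: commas between all but the last, then ", and ".
    return join(', ', @items[0 .. $#items - 1]) . ', and ' . $items[-1];
}

print english_list([qw(author editor)], 0), "\n";       # author and editor
print english_list([qw(author title year)], 1), "\n";   # 'author', 'title', and 'year'
```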
In addition to loading the "File" and
"Entry" modules,
"Text::BibTeX" loads the XSUB code which
bridges the Perl modules to the underlying C library, btparse. This
XSUB code provides a number of miscellaneous utility functions, most of which
are put into other packages in the
"Text::BibTeX" family for use by the
corresponding classes. (For instance, the XSUB code loaded by
"Text::BibTeX" provides a function
"Text::BibTeX::Entry::parse", which is
actually documented as the "parse" method of
the "Text::BibTeX::Entry" class; see
Text::BibTeX::Entry.) However, for completeness this function, and all the
other functions that become available when you
"use Text::BibTeX", are at least mentioned here. The
only functions from this group that you're ever likely to use are described in
"Generic string-processing functions".
The startup/shutdown functions, "initialize" and "cleanup", just initialize and
shut down the underlying C library. Don't call either
one of them; the "Text::BibTeX"
startup/shutdown code takes care of it as appropriate. They're just mentioned
here for completeness.
- initialize ()
- cleanup ()
- split_list (STRING, DELIM [, FILENAME [, LINE [, DESCRIPTION [,
OPTS]]]])
- Splits a string on a fixed delimiter according to the BibTeX rules for
splitting up lists of names. With BibTeX, the delimiter is hard-coded as
"and"; here, you can supply any string.
Instances of DELIM in STRING are considered delimiters if they are at
brace-depth zero, surrounded by whitespace, and not at the beginning or
end of STRING; the comparison is case-insensitive. See bt_split_names for
full details of how splitting is done (it's not the same as Perl's
"split" function). OPTS is a hash ref of
the same binmode and normalization arguments as with, e.g.
Text::BibTeX::File->open(). split_list calls
isplit_list() internally but handles UTF-8 conversion and
normalization, if requested.
Returns the list of strings resulting from splitting STRING on
DELIM.
- isplit_list (STRING, DELIM [, FILENAME [, LINE [, DESCRIPTION]]])
- Splits a string on a fixed delimiter according to the BibTeX rules for
splitting up lists of names. With BibTeX, the delimiter is hard-coded as
"and"; here, you can supply any string.
Instances of DELIM in STRING are considered delimiters if they are at
brace-depth zero, surrounded by whitespace, and not at the beginning or
end of STRING; the comparison is case-insensitive. See bt_split_names for
full details of how splitting is done (it's not the same as Perl's
"split" function). This function returns
bytes. Use Text::BibTeX::split_list instead if you need to specify binmode and
normalization arguments, as with, e.g., Text::BibTeX::File->open().
Returns the list of strings resulting from splitting STRING on
DELIM.
- purify_string (STRING [, OPTIONS])
- "Purifies" STRING in the BibTeX way (usually for generation of
sort keys). See bt_misc for details; note that, unlike the C interface,
"purify_string" does not modify
STRING in-place. A purified copy of the input string is returned.
OPTIONS is currently unused.
- change_case (TRANSFORM, STRING [, OPTIONS])
- Transforms the case of STRING according to TRANSFORM (a single character,
one of 'u', 'l', or
't'). See bt_misc for details; again,
"change_case" differs from the C
interface in that STRING is not modified in-place---the input string is
copied, and the transformed copy is returned.
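A short sketch of the three generic string-processing functions in use, assuming Text::BibTeX is installed. The author and title strings are made up for illustration. The results are printed rather than stated here, since the exact purification and case-change rules are defined in bt_misc.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:subs);    # split_list, purify_string, change_case

# Made-up author field; the "and" inside braces is at brace depth 1,
# so by the rule above it is not treated as a delimiter.
my $authors = 'Donald E. Knuth and Leslie Lamport and {Barnes and Noble}';

my @names = split_list($authors, 'and');
print "name: $_\n" for @names;

# Made-up title; neither function modifies its argument in place.
my $title = 'The {TeX}book: A Gentle Introduction';
print 'purified: ', purify_string($title), "\n";
print 'upper:    ', change_case('u', $title), "\n";
print 'lower:    ', change_case('l', $title), "\n";
print 'title:    ', change_case('t', $title), "\n";
print 'original: ', $title, "\n";
```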
Although the entry-parsing functions ("parse" and "parse_s") are provided by the
"Text::BibTeX" module, they are actually in
the "Text::BibTeX::Entry" package. That's
because they are implemented in C, and thus loaded with the XSUB code that
"Text::BibTeX" loads; however, they are
actually methods in the
"Text::BibTeX::Entry" class. Thus, they are
documented as methods in Text::BibTeX::Entry.
- parse (ENTRY_STRUCT, FILENAME, FILEHANDLE)
- parse_s (ENTRY_STRUCT, TEXT)
The macro table functions allow direct access to the macro table maintained by
btparse, the C library underlying
"Text::BibTeX". In the normal course of
events, macro definitions always accumulate, and are only defined as a result
of parsing a macro definition (@string) entry.
btparse never deletes old macro definitions for you, and doesn't have
any built-in default macros. If, for example, you wish to start fresh with new
macros for every file, use
"delete_all_macros". If you wish to
pre-define certain macros, use
"add_macro_text". (But note that the
"Bib" structure, as part of its mission to
emulate BibTeX 0.99, defines the standard "month name" macros for
you.)
See also bt_macros in the btparse documentation for a
description of the C interface to these functions.
- add_macro_text (MACRO, TEXT [, FILENAME [, LINE]])
- Defines a new macro, or redefines an old one. MACRO is the name of the
macro, and TEXT is the text it should expand to. FILENAME and LINE are
just used to generate any warnings about the macro definition. The only
such warning occurs when you redefine an old macro: its value is
overridden, and "add_macro_text()"
issues a warning saying so.
- delete_macro (MACRO)
- Deletes a macro from the macro table. If MACRO isn't defined, takes no
action.
- delete_all_macros ()
- Deletes all macros from the macro table, even the predefined month
names.
- macro_length (MACRO)
- Returns the length of a macro's expansion text. If the macro is undefined,
returns 0; no warning is issued.
- macro_text (MACRO [, FILENAME [, LINE]])
- Returns the expansion text of a macro. If the macro is not defined, issues
a warning and returns "undef". FILENAME
and LINE, if supplied, are used for generating this warning; they should
be supplied if you're looking up the macro as a result of finding it in a
file.
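The following sketch exercises the macro table functions described above, assuming Text::BibTeX is installed. The macro name "jacm" and its expansion text are made up for illustration. The undefined-macro case of macro_text is deliberately not shown, since it issues a warning.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:macrosubs);

# Pre-define a macro, as an @string entry in a file would.
add_macro_text('jacm', 'Journal of the ACM');

my $text   = macro_text('jacm');      # expansion text of the macro
my $length = macro_length('jacm');    # length of that expansion text
print "jacm expands to '$text' ($length characters)\n";

# Remove the macro again; macro_length reports 0 for an undefined
# macro and, unlike macro_text, issues no warning.
delete_macro('jacm');
my $after = macro_length('jacm');
print "after delete_macro, length is $after\n";
```

Checking macro_length before calling macro_text is one way to probe for a macro without triggering the warning.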
The name-parsing functions are both private functions for the use of the
"Name" class, and therefore are put in the
"Text::BibTeX::Name" package. You should use
the interface provided by that class for parsing names in the BibTeX style.
- _split (NAME_STRUCT, NAME, FILENAME, LINE, NAME_NUM, KEEP_CSTRUCT)
- free (NAME_STRUCT)
The name-formatting functions are private functions for the use of the
"NameFormat" class, and therefore are put in
the "Text::BibTeX::NameFormat" package. You
should use the interface provided by that class for formatting names in the
BibTeX style.
- create ([PARTS [, ABBREV_FIRST]])
- free (FORMAT_STRUCT)
- _set_text (FORMAT_STRUCT, PART, PRE_PART, POST_PART, PRE_TOKEN,
POST_TOKEN)
- _set_options (FORMAT_STRUCT, PART, ABBREV, JOIN_TOKENS, JOIN_PART)
- format_name (NAME_STRUCT, FORMAT_STRUCT)
"Text::BibTeX" inherits several limitations
from its base C library, btparse; see "BUGS AND LIMITATIONS"
in btparse for details. In addition,
"Text::BibTeX" will not work with a Perl
binary built using the "sfio" library. This
is because Perl's I/O abstraction layer does not extend to third-party C
libraries that use stdio, and btparse most certainly does use stdio.
btool_faq, Text::BibTeX::File, Text::BibTeX::Entry, Text::BibTeX::Value
Greg Ward <gward@python.net>
Copyright (c) 1997-2000 by Gregory P. Ward. All rights reserved. This file is
part of the Text::BibTeX library. This library is free software; you may
redistribute it and/or modify it under the same terms as Perl itself.