|
|
| |
bt_format_names(3) |
btparse |
bt_format_names(3) |
bt_format_names - formatting BibTeX names for consistent output
bt_name_format * bt_create_name_format (char * parts,
boolean abbrev_first);
void bt_free_name_format (bt_name_format * format);
void bt_set_format_text (bt_name_format * format,
bt_namepart part,
char * pre_part,
char * post_part,
char * pre_token,
char * post_token);
void bt_set_format_options (bt_name_format * format,
bt_namepart part,
boolean abbrev,
bt_joinmethod join_tokens,
bt_joinmethod join_part);
char * bt_format_name (bt_name * name, bt_name_format * format);
After splitting a name into its components parts (represented as a
"bt_name" structure), you often want to put
it back together again as a single string in a consistent way. btparse
provides a very flexible way to do this, generally in two stages: first, you
create a "name format" which describes how to put the tokens and
parts of any name back together, and then you apply the format to a particular
name.
The "name format" is encapsulated in a
"bt_name_format" structure, which is
created with "bt_create_name_format()".
This function includes some clever trickery that means you can usually get
away with calling it alone, and not need to do any customization of the
format. If you do need to customize the format, though,
"bt_set_format_text()" and
"bt_set_format_options()" provide that
capability.
The format controls the following:
- which name parts are printed, and in what order (e.g. "first von last
jr", or "von last jr first")
- the text that precedes and follows each part (e.g. if the first name
follows the last name, you probably want a comma before the `first' part:
"Smith, John" rather than "Smith John")
- the text that precedes and follows each token (e.g. if the first name is
abbreviated, you may want a period after each token: "J. R.
Smith" rather than "J R Smith")
- the method used to join the tokens of each part together
- the method used to join each part to the following part
All of these except the list of parts to format are kept in arrays
indexed by name part: for example, the structure has a field
char * post_token[BT_MAX_NAMEPARTS]
and "post_token[BTN_FIRST]"
("BTN_FIRST" is from the
"bt_namepart"
"enum") is the string to be added after
each token in the first name---for example,
"." if the first name is to be abbreviated
in the conventional way.
Yet another "enum",
"bt_joinmethod", describes the available
methods for joining tokens together. Note that there are two sets of
join methods in a name format: between tokens within a single part, and
between the tokens of two different parts. The first allows you, for
example, to change "J R Smith" (first name
abbreviated with no post-token text but tokens joined by a space) to
"JR Smith" (the same, but first-name
tokens jammed together). The second is mainly used to ensure that
"von" and "last" name-parts may be joined with a tie:
"de~Roche" rather than
"de Roche".
The token join methods are:
- BTJ_MAYTIE
- Insert a "discretionary tie" between tokens. That is, either a
space or a "tie" is inserted, depending on context. (A
"tie," otherwise known as unbreakable space, is currently
hard-coded as "~"---from TeX.)
The format is then applied to a particular name by
"bt_format_name()", which returns a
new string.
- BTJ_SPACE
- Always insert a space between tokens.
- BTJ_FORCETIE
- Always insert a "tie" ("~")
between tokens.
- BTJ_NOTHING
- Insert nothing between tokens---just jam them together.
Tokens are joined together, and thus the choice of whether to
insert a "discretionary tie" is made, at two places: within a part
and between two parts. Naturally, this only applies when
"BTJ_MAYTIE" was supplied as the
token-join method; "BTJ_SPACE" and
"BTJ_FORCETIE" always insert either a
space or tie, and "BTJ_NOTHING" always
adds nothing between tokens. Within a part, ties are added after a the first
token if it is less than three characters long, and before the last token.
Between parts, a tie is added only if the preceding part consisted of single
token that was less than three characters long. In all other cases, spaces
are inserted. (This implementation slavishly follows BibTeX.)
- bt_create_name_format()
-
bt_name_format * bt_create_name_format (char * parts,
boolean abbrev_first)
Creates a name format for a given set of parts, with
variations for the most common forms of customization---the order of
parts and whether to abbreviate the first name.
The "parts" parameter
specifies which parts to include in a formatted name, as well as the
order in which to format them. "parts"
must be a string of four or fewer characters, each of which denotes one
of the four name parts: for instance,
"vljf" means to format all four parts
in "von last jr first" order. No characters outside of the set
"fvlj" are allowed, and no characters
may be repeated. "abbrev_first"
controls whether the `first' part will be abbreviated (i.e., only the
first letter from each token will be printed).
In addition to simply setting the list of parts to format and
the "abbreviate" flag for the first name,
"bt_create_name_format()" initializes
the entire format structure so as to minimize the need for further
customizations:
- The "token join method"---what to insert between tokens of the
same part---is set to "BTJ_MAYTIE"
(discretionary tie) for all parts
- The "part join method"---what to insert after the final token of
a particular part, assuming there are more parts to come---is set to
"BTJ_SPACE" for the `first', `last', and
`jr' parts. If the `von' part is present and immediately precedes the
`last' part (which will almost always be the case),
"BTJ_MAYTIE" is used to join `von' to
`last'; otherwise, `von' also gets
"BTJ_SPACE" for the inter-part join
method.
- The abbreviation flag is set to "FALSE"
for the `von', `last', and `jr' parts; for `first', the abbreviation flag
is set to whatever you pass in as
"abbrev_first".
- Initially, all "surrounding text" (pre-part, post-part,
pre-token, and post-token) for all parts is set to the empty string. Then
a few tweaks are done, depending on the
"abbrev_first" flag and the order of
tokens. First, if "abbrev_first" is
"TRUE", the post-token text for first
name is set to "."---this changes
"J R Smith" to "J.
R. Smith", which is usually the desired form. (If you
don't want the periods, you'll have to set the post-token text
yourself with "bt_set_format_text()".)
Then, if `jr' is present and immediately after `last' (almost
always the case), the pre-part text for `jr' is set to
", ", and the inter-part join method
for `last' is set to "BTJ_NOTHING".
This changes "John Smith Jr" (where
the space following "Smith" comes from
formatting the last name with a
"BTJ_SPACE" inter-part join method) to
"John Smith, Jr" (where the
", " is now associated with
"Jr"---that way, if there is no `jr'
part, the ", " will not be
printed.)
Finally, if `first' is present and immediately follows either
`jr' or `last' (which will usually be the case in "last-name
first" formats), the same sort of trickery is applied: the pre-part
text for `first' is set to ", ", and
the part join method for the preceding part (either `jr' or `last') is
set to "BTJ_NOTHING".
While all these rules are rather complicated, they mean that you
are usually freed from having to do any customization of the name format.
Certainly this is the case if you only need
"fvlj" and
"vljf" part orders, only want to
abbreviate the first name, want periods after abbreviated tokens,
non-breaking spaces in the "right" places, and commas in the
conventional places.
If you want something out of the ordinary---for instance,
abbreviated tokens jammed together with no puncuation, or abbreviated last
names---you'll need to customize the name format a bit with
"bt_set_format_text()" and
"bt_set_format_options()".
- bt_free_name_format()
-
void bt_free_name_format (bt_name_format * format)
Frees a name format created by
"bt_create_name_format()".
- bt_set_format_text()
-
void bt_set_format_text (bt_name_format * format,
bt_namepart part,
char * pre_part,
char * post_part,
char * pre_token,
char * post_token)
Allows you to customize some or all of the surrounding text
for a single name part. Supply "NULL"
for any chunk of text that you don't want to change.
For instance, say you want a name format that will abbreviate
first names, but without any punctuation after the abbreviated tokens.
You could create and customize the format as follows:
format = bt_create_name_format ("fvlj", TRUE);
bt_set_format_text (format,
BTN_FIRST, /* name-part to customize */
NULL, NULL, /* pre- and post- part text */
NULL, ""); /* empty string for post-token */
Without the
"bt_set_format_text()" call,
"format" would result in names
formatted like "J. R. Smith". After
setting the post-token text for first names to
"", this name would become
"J R Smith".
- bt_set_format_options()
-
void bt_set_format_options (bt_name_format * format,
bt_namepart part,
boolean abbrev,
bt_joinmethod join_tokens,
bt_joinmethod join_part)
Allows further customization of a name format: you can set the
abbreviation flag and the two token-join methods. Alas, there is no
mechanism for leaving a value unchanged; you must set everything with
"bt_set_format_options()".
For example, let's say that just dropping periods from
abbreviated tokens in the first name isn't enough; you really
want to save space by jamming the abbreviated tokens together:
"JR Smith" rather than
"J R Smith" Assuming the two calls in
the above example have been done, the following will finish the job:
bt_set_format_options (format, BTN_FIRST,
TRUE, /* keep same value for abbrev flag */
BTJ_NOTHING, /* jam tokens together */
BTJ_SPACE); /* space after final token of part */
Note that we unfortunately had to know (and supply) the
current values for the abbreviation flag and post-part join method, even
though we were only setting the intra-part join method.
- bt_format_name()
-
char * bt_format_name (bt_name * name, bt_name_format * format)
Once a name format has been created and customized to your
heart's content, you can use it to format any number of names that have
been split with "bt_split_name" (see
bt_split_names). Simply pass the name structure and name format
structure, and a newly-allocated string containing the formatted name
will be returned to you. It is your responsibility to
"free()" this string.
Greg Ward <gward@python.net>
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc. |