NAME
bibclean - prettyprint and syntax check BibTeX and Scribe bibliography data base files

SYNOPSIS
bibclean [ -author ] [ -copyleft ] [ -copyright ] [ -error-log filename ] [ -help ] [ '-?' ] [ -init-file filename ] [ -ISBN-file filename ] [ -keyword-file filename ] [ -max-width nnn ] [ -[no-]align-equals ] [ -[no-]brace-protect ] [ -[no-]check-values ] [ -[no-]debug-match-failures ] [ -[no-]delete-empty-values ] [ -[no-]file-position ] [ -[no-]fix-accents ] [ -[no-]fix-braces ] [ -[no-]fix-degrees ] [ -[no-]fix-font-changes ] [ -[no-]fix-initials ] [ -[no-]fix-math ] [ -[no-]fix-names ] [ -[no-]German-style ] [ -[no-]keep-linebreaks ] [ -[no-]keep-parbreaks ] [ -[no-]keep-preamble-spaces ] [ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ] [ -[no-]parbreaks ] [ -[no-]prettyprint ] [ -[no-]print-ISBN-table ] [ -[no-]print-keyword-table ] [ -[no-]print-patterns ] [ -[no-]quiet ] [ -[no-]read-init-files ] [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ] [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -output-file filename ] [ -version ] <infile or bibfile1 bibfile2 bibfile3 ... >outfile

All options can be abbreviated to a unique leading prefix. An explicit file name of ``-'' represents standard input; it is assumed if no input files are specified. On VAX VMS and IBM PC DOS, the leading ``-'' on option names may be replaced by a slash, ``/''; however, the ``-'' option prefix is always recognized.

DESCRIPTION
bibclean prettyprints input BibTeX files to stdout, or to a user-specified file, and checks the brace balance and bibliography entry syntax as well. It can be used to detect problems in BibTeX files that sometimes confuse even BibTeX itself, and importantly, can be used to normalize the appearance of collections of BibTeX files.

Here is a summary of the formatting actions:
The standardized format of the output of bibclean facilitates the later application of simple filters, such as bibcheck(1), bibdup(1), bibextract(1), bibindex(1), bibjoin(1), biblabel(1), biblook(1), biborder(1), bibsort(1), citefind(1), and citetags(1), to process the text, and also is the one expected by the GNU Emacs BibTeX support functions.

OPTIONS
Command-line switches may be abbreviated to a unique leading prefix, and letter case is not significant. All options are parsed before any input bibliography files are read, no matter what their order on the command line. Options that correspond to a yes/no setting of a flag have a form with a prefix "no-" to set the flag to no. For such options, the last setting determines the flag value used. That is significant when options are also specified in initialization files (see the INITIALIZATION FILES manual section).

The leading hyphen that distinguishes an option from a filename may be doubled, for compatibility with GNU and POSIX conventions. Thus, -author and --author are equivalent. To avoid confusion with options, if a filename begins with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g., /tmp/-foo.bib or ./-foo.bib.
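
The switch-matching rules described above (unique leading prefix, case-insensitive comparison, optional doubled hyphen, "no-" forms, last setting wins) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not bibclean's actual implementation: the option lists are a small subset of the SYNOPSIS, the function names resolve and apply_switches are hypothetical, and letting an exact match take precedence over longer names is an assumption.

```python
# Illustrative subset of the switches listed in the SYNOPSIS (not the full set).
VALUE_OPTIONS = ["author", "copyleft", "copyright", "help", "init-file",
                 "max-width", "version"]
FLAG_OPTIONS = ["check-values", "fix-math", "fix-names", "prettyprint",
                "warnings"]


def candidates():
    """Build a table: lowercase switch name -> (canonical name, flag setting)."""
    table = {name.lower(): (name, None) for name in VALUE_OPTIONS}
    for flag in FLAG_OPTIONS:
        table[flag.lower()] = (flag, True)
        table["no-" + flag.lower()] = (flag, False)
    return table


def resolve(switch):
    """Map one command-line switch to (canonical name, flag setting).

    The flag setting is True/False for yes/no flags and None otherwise.
    """
    if not switch.startswith("-"):
        raise ValueError(f"{switch}: not a switch")
    # One or two leading hyphens are accepted; letter case is ignored.
    stem = (switch[2:] if switch.startswith("--") else switch[1:]).lower()
    table = candidates()
    if stem in table:                      # assumption: exact match wins
        return table[stem]
    hits = sorted(key for key in table if key.startswith(stem))
    if len(hits) != 1:
        raise ValueError(f"{switch}: unknown or ambiguous switch {hits}")
    return table[hits[0]]


def apply_switches(switches):
    """Process switches in order, so the last setting of a flag wins."""
    settings = {}
    for switch in switches:
        name, value = resolve(switch)
        settings[name] = value
    return settings


if __name__ == "__main__":
    print(resolve("-copyl"))
    print(resolve("--AUTHOR"))
    print(apply_switches(["-no-warnings", "-warnings"]))
```

In this sketch, -c is rejected because it is a prefix of copyleft, copyright, and check-values, while -copyl and -no-fix-n each identify a single switch.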
ERROR RECOVERY AND WARNINGS
When bibclean detects an error, it issues an error message to both stderr and stdout. That way, the user is clearly notified, and the output bibliography also contains the message at the point of error.

Error messages begin with a distinctive pair of queries, ??, beginning in column 1, followed by the input file name and line number. If the -file-position option was specified, they also contain the input and output positions of the current file, entry, and value. Each position includes the file byte number, the line number, and the column number. In the event of a runaway string argument, the entry and value positions should precisely pinpoint the erroneous bibliography entry, and the file positions indicate where it was detected, which may be rather later in the files.

Warning messages identify possible problems, and are therefore sent only to stderr, and not to stdout, so they never appear in the output file. They are identified by a distinctive pair of percents, %%, beginning in column 1, and as with error messages, may be followed by file position messages if the -file-position option was specified.

For convenience, the first line of each error and warning message sent to stderr is formatted according to the expectations of the GNU Emacs next-error command. You can invoke bibclean with the Emacs M-x compile<RET>bibclean filename.bib >filename.new command, then use the next-error command, normally bound to C-x ` (that's a grave, or back, accent), to move to the location of the error in the input file.

If error messages are ignored, and left in the output bibliography file, they precipitate an error when the bibliography is next processed with BibTeX.

After issuing an error message, bibclean then resynchronizes its input by copying it verbatim to stdout until a new bibliography entry is recognized on a line in which the first non-blank character is an at-sign (@).
That ensures that nothing is lost from the input file(s), allowing corrections to be made in either the input or the output files. However, if bibclean detects an internal error in its data structures, it terminates abruptly without further input or output processing; that kind of error should never happen, and if it does, it should be reported immediately to the author of the program. Errors in initialization files, and running out of dynamic memory, also immediately terminate bibclean.

SEARCH PATHS
Versions of bibclean before 3.00 found some of their initialization files in the same directory as the executable program. That design choice means that those files can be copied anywhere in the file system, and still be found at run time.

Some software distributions, however, prefer to follow the model where initialization and other related files are instead stored in a directory whose name is related to that of the executable by a conventional difference in filepath. For example, a program might be installed in /opt/bin and its associated files in /opt/share/lib/PROGRAMNAME/ or /opt/share/lib/PROGRAMNAME/PROGRAMVERSION/. The second form is preferable, because it permits multiple versions of the same program to be installed, as long as the executable program names carry a version suffix. Thus, a site might have installed programs named bibclean-1.00, bibclean-2.00, bibclean-2.15, and bibclean-3.00, with the versionless name bibclean being a symbolic link to whichever version is the desired local default.

With most software packages, the absolute path to the directory containing associated files is compiled into the program, making it impossible to change the installation locations after the program has been built from source code. Some packages, however, instead use the location of the executable program to find files by relative path at runtime.
In the above example, the program would determine its filesystem location at runtime, say /opt/bin, then find its associated files relative to that location in ../share/lib/PROGRAMNAME/PROGRAMVERSION/. From version 3.00, bibclean uses that second approach, with an associated directory like ../share/lib/bibclean/3.00. That allows an installation directory tree to be distributed to other systems and unbundled anywhere in the file system, as long as the relative paths are not changed.

bibclean tests whether its compiled-in library path is a directory on the local system, and if so, uses it. Otherwise, it replaces that path by a reconstructed one based on the location of the executable program. If the reconstructed path for the library directory does not exist, it issues a warning. In either case, it continues normally.

With the old approach, initialization files on Unix systems were named with a leading period, making them `hidden' files for the ls command. With the new practice, initialization files are no longer named as hidden files.

INITIALIZATION FILES
bibclean can be compiled with one of three different types of pattern matching; the choice is made by the installer at compile time:
The second and third versions are the ones of most interest here, because they allow the user to control what values are considered acceptable. However, command-line options can also be specified in initialization files, no matter which pattern matching choice was selected. When bibclean starts, it searches for initialization files, finding the first one in the system executable program search path (on UNIX and IBM PC DOS, PATH) and the first one in the BIBINPUTS search path, and processes them in turn. Then, when command-line arguments are processed, any additional files specified by -init-file filename options are also processed. Finally, immediately before each named bibliography file is processed, an attempt is made to process an initialization file with the same name, but with the extension changed to .ini. The default extension can be changed by a setting of the environment variable BIBCLEANEXT. That scheme permits system-wide, user-wide, session-wide, and file-specific initialization files to be supported. When input is taken from stdin, there is no file-specific initialization. For precise control, the -no-read-init-files option suppresses all initialization files except those explicitly named by -init-file filename options, either on the command line, or in requested initialization files. Recursive execution of initialization files with nested -init-file options is permitted; if the recursion is circular, bibclean finally gets a non-fatal initialization file open failure after opening too many files. That terminates further initialization file processing. As the recursion unwinds, the files are all closed, then execution proceeds normally. An initialization file may contain empty lines, comments from percent to end of line (just like TeX), option switches, and field/pattern or field/pattern/message assignments. Leading and trailing spaces are ignored. 
That is best illustrated by a short example:

    % This is a small bibclean initialization file
    -init-file /u/math/bib/.bibcleanrc %% departmental patterns
    chapter = "\"D\"" %% 23
    pages = "\"D--D\"" %% 23--27
    volume = "\"D \\an\\d D\"" %% 11 and 12
    year = \
        "\"dddd, dddd, dddd\"" \
        "Multiple years specified." %% 1989, 1990, 1991
    -no-fix-names %% do not modify author/editor lists

Long logical lines can be split into multiple physical lines by breaking at a backslash-newline pair; the backslash-newline pair is discarded. That processing happens while characters are being read, before any further interpretation of the input stream.

Each logical line must contain a complete option (and its value, if any), or a complete field/pattern pair, or a field/pattern/message triple.

Comments are stripped during the parsing of the field, pattern, and message values. The comment start symbol is not recognized inside quoted strings, so it can be freely used in such strings.

Comments on logical lines that were input as multiple physical lines via the backslash-newline convention must appear on the last physical line; otherwise, the remaining physical lines become part of the comment.

Pattern strings must be enclosed in quotation marks; within such strings, a backslash starts an escape mechanism that is commonly used in UNIX software. The recognized escape sequences are:
Backslash followed by any other character produces just that character. Thus, \% gets a literal percent into a string (preventing its interpretation as a comment), \" produces a quotation mark, and \\ produces a single backslash.

An ASCII NUL (\0) in a string terminates it; that is a feature of the C programming language in which bibclean is implemented.

Field/pattern pairs can be separated by arbitrary space, and optionally, either an equals sign or colon functioning as an assignment operator. Thus, the following are equivalent:

    pages="\"D--D\""
    pages:"\"D--D\""
    pages "\"D--D\""
    pages = "\"D--D\""
    pages : "\"D--D\""
    pages     "\"D--D\""

Each field name can have an arbitrary number of patterns associated with it; however, they must be specified in separate field/pattern assignments.

An empty pattern string causes previously-loaded patterns for that field name to be forgotten. That feature permits an initialization file to completely discard patterns from earlier initialization files.

Patterns for value strings are represented in a tiny special-purpose language that is both convenient and suitable for bibliography value-string syntax checking. While not as powerful as the language of regular-expression patterns, its parsing can be portably implemented in less than 3% of the code in a widely-used regular-expression parser (the GNU regexp package).

The patterns are represented by the following special characters:
The X pattern character is very powerful, but generally inadvisable, because it matches almost anything likely to be found in a BibTeX value string. The reason for providing pattern matching on the value strings is to uncover possible errors, not mask them.

There is no provision for specifying ranges or repetitions of characters, but that can usually be done with separate patterns. It is a good idea to accompany the pattern with a comment showing the kind of thing it is expected to match. Here is a portion of an initialization file giving a few of the patterns used to match number value strings:

    number = "\"D\"" %% 23
    number = "\"A AD\"" %% PN LPS5001
    number = "\"A D(D)\"" %% RJ 34(49)
    number = "\"A D\"" %% XNSS 288811
    number = "\"A D\\.D\"" %% Version 3.20
    number = "\"A-A-D-D\"" %% UMIAC-TR-89-11
    number = "\"A-A-D\"" %% CS-TR-2189
    number = "\"A-A-D\\.D\"" %% CS-TR-21.7

For a bibliography that contains only article entries, that list should probably be reduced to just the first pattern, so that anything other than a digit string fails the pattern-match test. That is easily done by keeping bibliography-specific patterns in a corresponding file with extension .ini, because that file is read automatically. You should be sure to use empty pattern strings in the pattern file to discard patterns from earlier initialization files.

The value strings passed to the pattern matcher contain surrounding quotes, so the patterns should also. However, you could use a pattern specification like "\"D" to match an initial digit string followed by anything else; the omission of the final quotation mark \" in the pattern allows the match to succeed without checking that the next character in the value string is a quotation mark.

Because the value strings are intended to be processed by TeX, the pattern matching ignores braces, and TeX control sequences, together with any space following those control sequences. Spaces around braces are preserved.
That convention allows the pattern fragment A-AD-D to match the value string TN-K\slash 27-70, because the value is implicitly collapsed to TN-K27-70 during the matching operation.

bibclean's normal action when a string value fails to match any of the corresponding patterns is to issue a warning message something like this:

    Unexpected value in ``year = "192"''

In most cases, that is sufficient to alert the user to a problem. In some cases, however, it may be desirable to associate a different message with a particular pattern. That can be done by supplying a message string following the pattern string. Format items %% (single percent), %e (entry name), %f (field name), %k (citation key), and %v (string value) are available to get current values expanded in the messages. Here is an example:

    chapter = "\"D:D\"" "Colon found in ``%f = %v''" %% 23:2

To be consistent with other messages output by bibclean, the message string should not end with punctuation.

If you wish to make the message an error, rather than just a warning, begin it with a query (?), like this:

    chapter = "\"D:D\"" "?Colon found in ``%f = %v''" %% 23:2

The query is not included in the output message.

Escape sequences are supported in message strings, just as they are in pattern strings. You can use that to advantage for fancy things, such as terminal display mode control. If you rewrite the previous example as

    chapter = "\"D:D\"" \
        "?\033[7mColon found in ``%f = %v''\033[0m" %% 23:2

the error message appears in inverse video on display screens that support ANSI terminal control sequences. Such practice is not normally recommended, because it may have undesirable effects on some output devices. Nevertheless, you may find it useful for restricted applications.

For some types of bibliography fields, bibclean contains special-purpose code to supplement or replace the pattern matching:
Values for other fields are checked only against patterns. You can provide patterns for any field you like, even ones bibclean does not already know about. New ones are simply added to an internal table that is searched for each string to be validated.

The special field, key, represents the bibliographic citation key. It can be given patterns, like any other field. Here is an initialization file pattern assignment that matches an author name, a colon, a four-digit year, a colon, and an alphabetic string, in the BibNet Project style:

    key = "A:dddd:A" %% Knuth:1986:TB

Notice that no quotation marks are included in the pattern, because the citation keys are not quoted. You can use such patterns to help enforce uniform naming conventions for citation keys, which is increasingly important as your bibliography data base grows.

ISBN INITIALIZATION FILES
bibclean contains a compiled-in table of ISBN ranges and country/language settings that is suitable for most applications.

However, ISBN data change yearly, as new countries adopt ISBNs, and as publishers are granted new, or additional, ISBN prefixes. Thus, from version 2.12, bibclean supports reading of run-time ISBN initialization files found on the PATH (for VAX VMS, SYS$SYSTEM) and BIBINPUTS search paths, and then any specified by -ISBN-file filename options. That feature makes it possible to incorporate new ISBN data without having to produce a new bibclean release and reinstall the software at end-user sites.

The format of an ISBN initialization file is similar to that of the bibclean initialization files described in the preceding section: comments begin with percent and continue to end of line, blank and empty lines are ignored, backslash-newline joins adjacent lines, and otherwise, lines are expected to contain a required pair of ISBN country/language-publisher prefixes forming a non-decreasing range, optionally followed by one or more words of text that are treated as the country/language group value.
The latter value plays no part in ISBN validation, but its presence is strongly recommended, in order to make the ISBN table more understandable for humans. Here is a short example:

    %% The Faeroes got ISBN assignments between 1993 and 1998
    99918-0 99918-3 Faeroes
    99918-40 99918-61
    99918-900 99918-938

Data from ISBN files normally augment the compiled-in data. However, if the first prefix begins with a hyphen, then bibclean deletes the first entry in the table matching that first prefix (ignoring the leading hyphen):

    %% Latvia got ISBN ranges between 1993 and 1998
    %% so we remove the old placeholder, then add the
    %% new ranges.
    -9984-0 9984-9 This one is no longer valid
    9984-00 9984-20 Latvia
    9984-500 9984-770
    9984-9000 9984-9984

KEYWORD INITIALIZATION FILES
bibclean contains a compiled-in table of keyword mappings that is suitable for most applications. The default settings merely adjust lettercase in certain keyword names, so that, for example, isbn is output as ISBN.

From version 2.12, bibclean supports reading of run-time keyword initialization files found on the PATH (for VAX VMS, SYS$SYSTEM) and BIBINPUTS search paths, and then any specified by -keyword-file filename options. That feature makes it possible to incorporate special spellings of new keywords without having to produce a new bibclean release and reinstall the software at end-user sites.

The format of a keyword initialization file is similar to that of the other bibclean initialization files described in the preceding sections: comments begin with percent and continue to end of line, blank and empty lines are ignored, backslash-newline joins adjacent lines, and otherwise, lines are expected to contain a required pair of old and new keyword names. Here is a short example:

    %% We want special handling of MathReviews keywords
    mrclass MRclass
    mrnumber MRnumber
    mrreviewer MRreviewer

Data from keywords files normally augment the compiled-in data.
However, if the first keyword begins with a hyphen, then bibclean deletes the first entry in the table matching that keyword (ignoring the leading hyphen):

    %% Remove special handling of ISBN, ISSN, and LCCN values.
    -issn ISSN
    -isbn ISBN
    -lccn LCCN

Notice that this feature can be used to regularize keyword names, but use it with care, in order to avoid producing duplicate key names in output BibTeX entries:

    %% Map variations of keywords into a common name:
    keys keywords
    keywds keywords
    keyword keywords
    keywrd keywords
    keywrds keywords
    searchkey keywords

LEXICAL ANALYSIS
When -no-prettyprint is specified, bibclean acts as a lexical analyzer instead of a prettyprinter, producing output in lines of the form

    <token-number><tab><token-name><tab>"<token-value>"

Each output line contains a single complete token, identified by a small integer number for use by a computer program, a token type name for human readers, and a string value in quotes. Special characters in the token value string are represented with ANSI/ISO Standard C escape sequences, so all characters other than NUL are representable, and multi-line values can be represented in a single line.

Here are the token numbers and token type names that can appear in the output when -no-prettyprint is specified:

     0  UNKNOWN
     1  ABBREV
     2  AT
     3  COMMA
     4  COMMENT
     5  ENTRY
     6  EQUALS
     7  FIELD
     8  INCLUDE
     9  INLINE
    10  KEY
    11  LBRACE
    12  LITERAL
    13  NEWLINE
    14  PREAMBLE
    15  RBRACE
    16  SHARP
    17  SPACE
    18  STRING
    19  VALUE

Programs that parse such output should also be prepared for lines beginning with the warning prefix, %%, or the error prefix, ??, and for ANSI/ISO Standard C line-number directives of the form

    # line 273 "texbook1.bib"

that record the line number and file name of the current input file.
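
A program that consumes the token stream has to deal with the four kinds of lines just described. The following Python sketch shows one way to do that; it is not part of bibclean, the name read_tokens is hypothetical, and SAMPLE is hand-written to follow the documented line format rather than captured from a real run. It also assumes that a wrapped line ends in a single backslash immediately before the newline, and it leaves the C escape sequences in token values undecoded.

```python
import re

# Matches line-number directives such as:  # line 273 "texbook1.bib"
LINE_DIRECTIVE = re.compile(r'^#\s*line\s+(\d+)\s+"(.*)"$')


def read_tokens(lines):
    """Yield (token_number, token_name, token_value) from a token stream.

    Warning (%%) lines, error (??) lines, and line-number directives are
    skipped; lines wrapped at a backslash-newline pair are rejoined first.
    """
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):        # continuation: drop the backslash
            pending += line[:-1]
            continue
        line = pending + line
        pending = ""
        if not line or line.startswith(("%%", "??")):
            continue
        if LINE_DIRECTIVE.match(line):
            continue
        number, name, value = line.split("\t", 2)
        yield int(number), name, value[1:-1]   # strip surrounding quotes


# Hand-written sample in the documented format (not real bibclean output).
SAMPLE = (
    '# line 1 "mylib.bib"\n'
    '2\tAT\t"@"\n'
    '5\tENTRY\t"Book"\n'
    '11\tLBRACE\t"{"\n'
    '10\tKEY\t"Knuth:1986:TB"\n'
    '%% a warning line would start like this\n'
    '7\tFIELD\t"title"\n'
    '19\tVALUE\t"\\"The \\\n'
    'TeXbook\\""\n'
)

if __name__ == "__main__":
    tokens = list(read_tokens(SAMPLE.splitlines(keepends=True)))
    print(sorted(value for _, name, value in tokens if name == "KEY"))
```

Selecting tokens by name, as the last line does for KEY, keeps the consumer independent of the numeric codes; a program that prefers the integers can compare against the table above instead.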
If a -max-width nnn command-line option was specified, long output lines are wrapped at a backslash-newline pair, and consequently, software that processes the lexical token stream should be prepared to collapse such wrapped lines back into single lines.

As an example of the use of -no-prettyprint, the UNIX command pipeline

    bibclean -no-prettyprint mylib.bib | \
        awk '$2 == "KEY" {print $3}' | \
        sed -e 's/"//g' | \
        sort

extracts a sorted list of the citation keys in the file mylib.bib.

A certain amount of processing has been done on the tokens. In particular, delimiters equivalent to braces have been replaced by braces, and braced strings have become quoted strings.

The LITERAL token type is used for arbitrary text that bibclean does not examine further, such as the contents of a @Preamble{...} or a @Comment{...}.

The UNKNOWN token type should never appear in the output stream. It is used internally to initialize token type variables.

SCRIBE BIBLIOGRAPHY FORMAT
bibclean's support for the Scribe bibliography format is based on the syntax description in the Scribe Introductory User's Manual, 3rd Edition, May 1980. Scribe was originally developed by Brian Reid at Carnegie-Mellon University, and was marketed by Unilogic, Ltd., later renamed to Scribe Systems, and apparently now long defunct.

The BibTeX bibliography format was strongly influenced by Scribe, and indeed, with care, it is possible to share bibliography files between the two systems. Nevertheless, there are some differences, so here is a summary of features of the Scribe bibliography file format:
Because of that loose syntax, bibclean's normal error detection heuristics are less effective, and consequently, Scribe mode input is not the default; it must be explicitly requested.

ENVIRONMENT VARIABLES
FILES
SEE ALSO
bibcheck(1), bibdup(1), bibextract(1), bibindex(1), bibjoin(1), biblabel(1), biblex(1), biblook(1), biborder(1), bibparse(1), bibsearch(1), bibsort(1), bibtex(1), bibunlex(1), citefind(1), citesub(1), citetags(1), latex(1), scribe(1), tex(1).

AUTHOR
Nelson H. F. Beebe
University of Utah
Department of Mathematics, 110 LCB
155 S 1400 E RM 233
Salt Lake City, UT 84112-0090
USA
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: beebe@math.utah.edu, beebe@acm.org, beebe@computer.org (Internet)
URL: http://www.math.utah.edu/~beebe

COPYRIGHT
bibclean: prettyprint and syntax check BibTeX and Scribe bibliography data base files

Copyright (C) 1990--2016 Nelson H. F. Beebe

This program is covered by the GNU General Public License (GPL), version 2 or later, available as the file COPYING in the program source distribution, and on the Internet at

    ftp://ftp.gnu.org/gnu/GPL

    http://www.gnu.org/copyleft/gpl.html

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA