dictfmt - formats a DICT protocol dictionary database
dictfmt -c5|-t|-e|-f|-h|-j|-p [options] basename
dictfmt -i|-I [options]
dictfmt takes a file, FILE, on stdin, and creates a dictionary
database named basename.dict, that conforms to the DICT protocol. It
also creates an index file named basename.index. By default, the index
is sorted according to the C locale, and only alphanumeric characters and
spaces are used in sorting; this may be changed with the --locale and
--allchars options. (basename is commonly chosen to correspond to the
basename of FILE, but this is not mandatory.)
Unless the database is extremely small, it is highly recommended
that basename.dict be compressed with /usr/bin/dictzip to
create basename.dict.dz. (dictzip is included in the dictd
source package.)
FILE may be in any of the several formats described by the format
options -c5, -t, -e, -f, -h, -j, -p, -i or -I. Exactly one of these options
must be given.
dictfmt prepends several headers to the .dict file. The
00-database-url header gives the value of the -u option as the URL of the
site from which the original database was obtained. The 00-database-short
header gives the value of the -s option as the short name of the dictionary.
(This "short name" is the identifying name given by the
"dict -D" option.) If the -u and/or -s options are omitted, these
values will be shown as "unknown", which is undesirable for a
publicly distributed database.
The date of conversion (formatting) is given in the
00-database-info header. All text in the input file prior to the first
headword (as defined by the appropriate formatting option) is appended to
this header. All text in the input file following a headword, up to the next
headword, is copied unchanged to the .dict file.
- -c5
- FILE is formatted with headwords preceded by 5 or more
underscore characters (_) and a blank line. All text until the next
headword is considered the definition. Any leading `@' characters
are stripped out, but the file is otherwise unchanged. This option was
written to format the CIA WORLD FACTBOOK 1995.
- -t
- Implies the -c5, --without-info and --without-headword options. Use this
option when the input database comes from the dictunformat utility.
- -e
- FILE is in html format, with the headword tagged as bold.
(<B>headword - </B>)
This option was written to format EASTON'S 1897 BIBLE
DICTIONARY. A typical entry from Easton is:
<A NAME="T0000005">
<B>Abagtha - </B>
one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).
This is converted to:
Abagtha
one of the seven eunuchs in Ahasuerus's court (Esther 1:10; 2:21).
The heading <A NAME="T0000005"> is omitted,
and the headword `Abagtha' is indexed.
NOTE: This option should be used with caution. It removes
several html tags (enough to format Easton properly), but not all. The
Makefile that was originally written to format dict-easton uses sed scripts
to modify certain cross reference tags. It may be necessary to pipe the
input file through a sed script, or hack the source of dictfmt in order to
properly format other html databases.
- -f
- FILE is formatted with the headwords starting in column 0,
with the definition indented at least one space (or tab character) on
subsequent lines. The third line starting in column 0 is taken as the
first headword, and the first two lines starting in column 0 are
treated as part of the 00-database-info header. This option was written to
format the F.O.L.D.O.C.
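The -f layout described above can be illustrated with a small Python sketch. This is not dictfmt's own code; it is one reading of the rules as stated (the first two column-0 lines belong to the header, every later column-0 line is a headword, and indented lines belong to the current definition). The function name and the sample text are made up for illustration.

```python
def split_f_format(text):
    """Split -f style input into (header_lines, entries).

    Each entry is a (headword, definition_lines) pair. Hypothetical helper,
    written only from the description of the -f option.
    """
    header, entries = [], []
    col0_seen = 0  # number of lines seen so far that start in column 0
    for line in text.splitlines():
        starts_col0 = bool(line) and line[0] not in " \t"
        if starts_col0:
            col0_seen += 1
        if col0_seen < 3:
            # Everything up to the third column-0 line is header text.
            header.append(line)
        elif starts_col0:
            entries.append((line.strip(), []))
        else:
            entries[-1][1].append(line)
    return header, entries


# Made-up sample input, not taken from any real database.
sample = (
    "Title line\n"
    "Second info line\n"
    "foo\n"
    "\tfirst definition\n"
    "bar\n"
    "  second definition\n"
)
header, entries = split_f_format(sample)
headwords = [hw for hw, _ in entries]
print(headwords)
```

Running a check like this on a few lines of FILE before calling dictfmt -f can reveal whether the header lines and headwords fall where expected.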
- -h
- FILE is formatted with the headwords starting in column 0,
followed by a comma, with the definition continuing on the same line. All
text before the first single-character line is included in the
00-database-info header, and lines with only one character are omitted
from the .dict file. The first headword is on the line following the
first single character line. The headword is indexed; the text
of the file is not changed. This option was written to format HITCHCOCK'S
BIBLE NAMES DICTIONARY.
- -j
- FILE is formatted with headwords starting in column 0, enclosed
in colons, followed by the definition. The colons surrounding the
headword are removed, and the headword is indexed. Lines
beginning with '*', '=', or '-' are also removed. All text before the
first headword is included in the headers. This option was written to
format the JARGON FILE.
NOTE: Some recent versions of the JARGON FILE had
three blanks inserted before the first colon at each headword. These must be
removed before processing with dictfmt. (sed scripts have been used for this
purpose. ed, awk, or perl scripts are also possible.)
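As an alternative to the sed approach mentioned in the note, the clean-up can be sketched in Python. The sketch assumes the extra blanks are exactly three spaces at the start of the line, immediately before the first colon of the headword; the function name and the sample lines are hypothetical.

```python
import re


def strip_headword_indent(line):
    """Drop three leading blanks, but only when a colon follows directly."""
    return re.sub(r"^ {3}(?=:)", "", line)


# Made-up sample lines in the style of the -j input format.
lines = [
    "   :hacker: n. A person who enjoys exploring systems.",
    "   continuation text",
    ":kludge: n. A clumsy fix.",
]
cleaned = [strip_headword_indent(line) for line in lines]
for line in cleaned:
    print(line)
```

Lines that do not start with three blanks followed by a colon pass through unchanged, so indented definition text is left alone.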
- -p
- FILE is formatted with `%h' in column 0, followed by a blank,
followed by the headword, optionally followed by a line containing
`%d' in column 0. The definition starts on the following line. The first
line beginning '%h' and any lines beginning '%d' are
stripped from the .dict file, and '%h ' is stripped from in front of the
headword. All text before the first headword is included in the headers.
The second line beginning '%h' is taken as the first headword. This
option was written to format Jay Kominek's elements database.
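The -p rules above can be read as the following Python sketch. It is an interpretation of the description rather than dictfmt's implementation: the first '%h' line is dropped, '%d' lines are dropped, text before the second '%h' line counts as header, and each later '%h' line starts a new entry. The function name and the sample input are invented for illustration.

```python
def parse_p_format(text):
    """Split -p style input into (header_lines, entries).

    Each entry is a (headword, definition_lines) pair. Hypothetical helper.
    """
    header, entries = [], []
    h_seen = 0  # number of '%h' lines seen so far
    for line in text.splitlines():
        if line.startswith("%h"):
            h_seen += 1
            if h_seen == 1:
                continue  # the first '%h' line is stripped entirely
            entries.append((line[2:].strip(), []))
        elif line.startswith("%d"):
            continue  # '%d' marker lines are stripped
        elif h_seen < 2:
            header.append(line)
        else:
            entries[-1][1].append(line)
    return header, entries


# Made-up sample input.
sample = (
    "%h Elements database\n"
    "%d\n"
    "About this file\n"
    "%h hydrogen\n"
    "%d\n"
    "Symbol: H\n"
    "%h helium\n"
    "%d\n"
    "Symbol: He\n"
)
header, entries = parse_p_format(sample)
print(header)
print(entries)
```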
- -i -I
- These two options differ from all other formatting options. They
re-sort an .index file given on stdin, as dictd requires. No .dict
file is generated; only the index is re-sorted. The input is expected
to look like an .index file with three or four columns.
-i expects decimal offset and length, while -I expects them
in base64 format.
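As a rough illustration of the difference between the -i and -I inputs, the sketch below converts between decimal numbers and the base64 numerals found in dictd .index files. It assumes (this is not stated above) that each number is written as a base-64 numeral using the digits A-Z, a-z, 0-9, + and /, most significant digit first, and that index columns are separated by tabs. The helper names and the sample line are hypothetical.

```python
B64_DIGITS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(value):
    """Write a non-negative integer as a base-64 numeral (assumed dictd style)."""
    if value == 0:
        return B64_DIGITS[0]
    digits = []
    while value > 0:
        value, rem = divmod(value, 64)
        digits.append(B64_DIGITS[rem])
    return "".join(reversed(digits))


def b64_decode(text):
    """Inverse of b64_encode."""
    value = 0
    for ch in text:
        value = value * 64 + B64_DIGITS.index(ch)
    return value


def decimal_line_to_b64(line):
    """Convert 'headword<TAB>offset<TAB>length' from decimal to base64 columns."""
    columns = line.split("\t")
    columns[1] = b64_encode(int(columns[1]))
    columns[2] = b64_encode(int(columns[2]))
    return "\t".join(columns)


# Made-up index line with decimal offset and length, as -i would expect.
converted = decimal_line_to_b64("apple\t0\t120")
print(converted)
```

Note that this is a positional numeral system, not the byte-oriented base64 encoding provided by Python's base64 module.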
- -u url
- Specifies the URL of the site from which the raw database was obtained. If
this option is specified, any 00-database-url headword in the input,
together with its definition, is ignored.
- -s name
- Specifies the name and, optionally, the version and date, of the database.
(If this contains spaces, it must be quoted.) If this option is specified,
any 00-database-short headword in the input, together with its
definition, is ignored.
- -L
- display license and copyright information
- -V
- display version information
- -D
- output debugging information
- --help
- display a help message
- --locale locale
- Specifies the locale used for sorting. If no locale is specified, the
"C" locale is used. To use UTF-8 mode, --utf8 is also required.
- --8bit
- generates the database in 8-bit mode; see also the --locale option.
Note: This option is deprecated. Use it only for
creating 8-bit (non-UTF-8) dictionaries. To create a UTF-8
dictionary, use the --utf8 option instead.
- --utf8
- If specified, a UTF-8 database is created.
- --allchars
- Specifies that all characters should be used for the search. By default,
only alphabetic characters, numeric characters and spaces are put into the
.index file and are therefore used in searching. Creates the special entry
00-database-allchars.
- --case-sensitive
- makes the search case sensitive. Creates the special entry
00-database-case-sensitive.
- --headword-separator sep
- sets the headword separator, which allows several words to have the same
definition. For example, if '--headword-separator %%%' is given,
and the input file contains 'autumn%%%fall', both 'autumn' and
'fall' will be indexed as headwords, with the same definition.
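The effect of the headword separator can be pictured with a short Python sketch. It only models the splitting step described above and does not reflect how dictfmt is implemented. The function name is hypothetical, and the offset and length values are placeholders.

```python
def split_headwords(headword_line, separator):
    """Return the individual headwords found on one headword line."""
    return [part for part in headword_line.split(separator) if part]


headwords = split_headwords("autumn%%%fall", "%%%")

# Every headword points at the same definition; the offset and length
# below are placeholder values, not real index data.
shared_location = (0, 42)
index = {word: shared_location for word in headwords}
print(headwords)
print(index)
```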
- --index-data-separator sep
- sets the index/data separator, which makes it possible to set the first and
fourth columns of the .index file independently. The first column is
treated as the index column (the one the MATCH command searches) and the
fourth column as the result column (the one whose contents MATCH
returns); the two columns are completely independent of
each other. The default value for this separator is the ASCII character
"\034".
- --break-headwords
- multiple headwords will be written on separate lines in the .dict file.
For use with --headword-separator.
- --index-keep-orig
- When --utf8 is specified, headwords are lowercased and non-alphanumeric
characters are removed from them before they are saved to the .index file,
in order to simplify searching. When the --index-keep-orig option is used, a
fourth column is created (if necessary) in the .index file; it contains the
original headword, which is returned by the MATCH command. This option may
be useful to prevent converting "AT&T" to "ATT" or to keep
proper nouns with an uppercased first letter.
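The relationship between the first and fourth index columns can be sketched in Python as follows. This is only a model of the behaviour described above, under several assumptions: spaces are kept along with alphanumeric characters, "if necessary" means the fourth column is written only when the normalized key differs from the original headword, and the offset and length strings are placeholders. The function names are hypothetical.

```python
def index_key(headword):
    """Lowercase the headword and keep only alphanumeric characters and spaces."""
    kept = "".join(ch for ch in headword if ch.isalnum() or ch == " ")
    return kept.lower()


def index_columns(headword, offset, length, keep_orig):
    """Build the columns of one .index line (assumed layout)."""
    key = index_key(headword)
    columns = [key, offset, length]
    if keep_orig and key != headword:
        # Fourth column: the original headword, for MATCH to return.
        columns.append(headword)
    return columns


print(index_columns("AT&T", "A", "B", keep_orig=False))
print(index_columns("AT&T", "A", "B", keep_orig=True))
print(index_columns("fall", "C", "D", keep_orig=True))
```

In this model the lowercased key for "AT&T" is "att", and the original spelling survives only in the fourth column.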
- --without-headword
- headwords will not be included in .dict file
- --without-header
- header will not be copied to DB info entry
- --without-url
- URL will not be copied to DB info entry
- --without-time
- time of creation will not be copied to DB info entry
- --without-ver
- By default dictfmt creates a special entry
00-database-dictfmt-X.Y.Z that contains (in .dict file) dictfmt version in
format dictfmt-X.Y.Z. This option suppresses this.
- --without-info
- DB info entry will not be created. This may be useful if 00-database-info
headword is expected from stdin (dictunformat outputs it).
- --columns columns
- By default dictfmt wraps strings read from stdin to 72 columns.
This option changes this default. If it is set to zero or a negative value,
wrapping is turned off.
- --default-strategy strategy
- Sets the default search strategy for the database. It will be used instead
of strategy '.'. Special entry 00-database-default-strategy is
created for this purpose. This option may be useful, for example, for
dictionaries containing mainly phrases rather than single words. In any
case, use this option only if you are absolutely sure of what you are doing.
- --mime-header mime_header
- When a client sends the OPTION MIME command to dictd,
definitions found in this database are preceded by the specified MIME
header. Creates the special entry 00-database-mime-header.
dictfmt was written by Rik Faith (faith@cs.unc.edu) as part of the
dict-misc package. dictfmt is distributed under the terms of the GNU
General Public License. If you need to distribute under other terms, write to
the author.
This manual page was written by Robert D. Hilliard <hilliard@debian.org>.
dict(1), dictd(8), dictzip(1), dictunformat(1),
http://www.dict.org, RFC 2229