ezmlm-archive(1)    FreeBSD General Commands Manual    ezmlm-archive(1)
ezmlm-archive - create thread and author index for a mailing list archive
ezmlm-archive [ -cCFTvV ][ -f msg1 ][ -t msg2 ] dir
ezmlm-archive reads the index files from a message archive, and creates a
subject index, a collection of subject files, and a collection of author
files. These files are suitable as an index for WWW access to, and navigation
through, a mailing list archive by ezmlm-cgi(1).
The index files read are created by ezmlm-idx(1) on a
per-list basis and by ezmlm-send(1) on a per-message basis for an
indexed list.
The output files created are:
- dir/archive/threads/yyyymm
- The thread index. It contains one line per subject, starting with the
number of the first message with that subject within the set investigated,
``:'', a 20 character subject hash, blank, ``[n]'' where ``n'' is the
number of messages in the thread, blank, and the subject. The file
``yyyymm'' contains entries for all threads that have messages in the
month ``yyyymm'' or that have messages both before and after that month.
The subject hash is a key to the subject files; the message number is a
key to the index file. The lines are in ascending order by message number
when the index is created de novo on an existing archive. When
messages are added one by one, as in normal archive operation, ``n'' is the
number of messages in the thread for the particular month, and the lines
are ordered by latest message, i.e. the most recently extended thread is
listed last. The message number accompanying a thread is always a message
within the thread. It is the first message in archives created on existing
lists, and the last message in incrementally created archives. Use the
corresponding subject index file to get a list of all messages in the
thread in ascending order.
- dir/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
- A subject file. The first line is the subject hash, a space, and the
subject. This is followed by one line per message with this subject, in
the format message number, ``:'', date (yyyymm), ``:'', author hash,
blank, author from line. The lines are sorted by message number. The
author hash is a key to the author files; the message number is a key to
the index file. The file in the example would be for the subject hash
``xxyyyyyyyyyyyyyyyyyy''.
- dir/archive/authors/xx/yyyyyyyyyyyyyyyyyy
- An author file. The first line is the author hash, a space, and the author
from line. This is followed by one line per message with this author, in
the format message number, ``:'', date (yyyymm), ``:'', subject hash,
blank, subject. The lines are sorted by message number. The subject hash
is a key to the subject files; the message number is a key to the index
file. The file in the example would be for the author hash
``xxyyyyyyyyyyyyyyyyyy''.
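The line formats described above can be read back with a few regular expressions. The following is a minimal Python sketch, not part of ezmlm: the names ThreadEntry, MessageEntry, parse_thread_line, parse_message_line and hash_to_path are hypothetical, and it assumes that each ``blank'' is a single space and that author hashes, like subject hashes, are 20 characters long.

```python
import re
from typing import NamedTuple


class ThreadEntry(NamedTuple):
    msgnum: int        # key to the index file
    subject_hash: str  # 20-character key to the subject files
    count: int         # the n in [n]
    subject: str


class MessageEntry(NamedTuple):
    msgnum: int        # key to the index file
    month: str         # yyyymm
    key_hash: str      # author hash in a subject file, subject hash in an author file
    text: str          # author from line, or subject


# Assumption: fields are separated by exactly one space where the text says "blank".
THREAD_RE = re.compile(r"^(\d+):(\S{20}) \[(\d+)\] (.*)$")
MESSAGE_RE = re.compile(r"^(\d+):(\d{6}):(\S{20}) (.*)$")


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse one line of dir/archive/threads/yyyymm."""
    m = THREAD_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a thread index line: {line!r}")
    return ThreadEntry(int(m.group(1)), m.group(2), int(m.group(3)), m.group(4))


def parse_message_line(line: str) -> MessageEntry:
    """Parse one per-message line (not the first line) of a subject or author file."""
    m = MESSAGE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a per-message line: {line!r}")
    return MessageEntry(int(m.group(1)), m.group(2), m.group(3), m.group(4))


def hash_to_path(list_dir: str, kind: str, key_hash: str) -> str:
    """Map a 20-character hash to its file; kind is 'subjects' or 'authors'.

    The first two hash characters name the subdirectory and the remaining
    18 name the file, as in the xx/yyyyyyyyyyyyyyyyyy example.
    """
    return f"{list_dir}/archive/{kind}/{key_hash[:2]}/{key_hash[2:]}"
```

A thread line supplies the subject hash, hash_to_path locates the matching subject file, and parse_message_line then yields the messages of that thread in ascending order.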
dir/archnum keeps track of the last message
processed. Normally, ezmlm-archive will process entries for
messages from one above the contents of this file up to and including the
message number in dir/num.
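To see which messages a normal run would pick up, the two counters can be compared directly. This is a rough Python sketch under stated assumptions, not ezmlm code: the helper names leading_int and pending_messages are hypothetical, each file is assumed to begin with a decimal message number (anything after it is ignored), and a missing dir/archnum is assumed to mean that nothing has been processed yet.

```python
import re
from pathlib import Path


def leading_int(path: Path, default: int = 0) -> int:
    """Return the integer at the start of a file, or default if absent."""
    try:
        text = path.read_text()
    except FileNotFoundError:
        return default
    m = re.match(r"\s*(\d+)", text)
    return int(m.group(1)) if m else default


def pending_messages(list_dir: str) -> range:
    """Message numbers from one above dir/archnum up to and including dir/num."""
    base = Path(list_dir)
    last_done = leading_int(base / "archnum")  # assumption: missing file means 0
    last_msg = leading_int(base / "num")
    return range(last_done + 1, last_msg + 1)
```

An empty range means the index is already up to date with the archive.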
ezmlm-archive writes messages in a crash-proof manner when run in normal
mode. When overriding the normal message range with any of the options listed,
the normal sync(3) of the output files is suppressed for efficiency.
Should the computer crash during this time, the state of the indices is
undefined. Use the -s option in the (extremely rare) cases where this
would be a problem.
- -c
- Create a new index. This overrides dir/archnum
causing ezmlm-archive to start with the first message in the
archive. Synonym for -f0. NOTE:
ezmlm-archive does not remove files in the index. While it will
overwrite/update old files it will not remove files that are obsolete for
other reasons.
- -C
- (Default.) Process entries starting with the message after the message
listed in dir/archnum.
- -f msg1
- Process messages from the archive section (set of 100 messages) containing
message msg1. This is useful if you have removed part of the
archive, as it will shorten processing time and decrease memory use.
NOTE: ezmlm-archive does not remove files in the index.
While it will overwrite/update old files it will not remove files that are
obsolete for other reasons. The number of messages per thread will be
incorrect when use of the -f and -t switches leads to
partial re-indexing of already indexed messages.
- -F
- (Default.) Do not change the starting message from the default (see
-C).
- -s
- Always sync files.
- -S
- (Default.) Sync files, except when one of the message range modifying
options is used.
- -t msg2
- Process messages to message msg2 instead of the last message in the
archive. Again, files written are corrected, but other files are not
explicitly removed.
- -T
- (Default.) Process entries for messages up to the last message in the
archive.
- -v
- Display ezmlm-archive version info.
- -V
- Display ezmlm-archive version info.
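How the range options combine can be summarized in a few lines. The Python sketch below is an illustration only, with a hypothetical function name effective_range. It assumes that archive sections are aligned on multiples of 100, that message numbers start at 1, and that msg2 is used as given without being capped at the last message in the archive.

```python
from typing import Optional, Tuple

SECTION_SIZE = 100  # "archive section (set of 100 messages)"


def effective_range(archnum: int, num: int,
                    msg1: Optional[int] = None,
                    msg2: Optional[int] = None) -> Tuple[int, int]:
    """Return (first, last) message numbers for one run.

    archnum and num are the values from dir/archnum and dir/num;
    msg1 and msg2 are the arguments of -f and -t (None when not given).
    """
    if msg1 is None:
        # Default (-C, -F): start after the last message processed.
        first = archnum + 1
    else:
        # -f: start of the section containing msg1 (assumed 100-aligned).
        # -c is a synonym for -f0, which gives the first message.
        first = max(1, (msg1 // SECTION_SIZE) * SECTION_SIZE)
    # Default (-T): up to the last message in the archive; -t overrides.
    last = num if msg2 is None else msg2
    return first, last
```

For example, with 120 in dir/archnum and 125 in dir/num, a run without options covers messages 121 to 125, while -c covers 1 to 125.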
ezmlm-archive stores its linked lists in memory. On a 32-bit
architecture, it uses 12 bytes per message, 28 bytes per thread (plus one copy
of the subject), and 20 bytes per author (plus one copy of the author from
line).
In normal list use, it processes at most a few messages at a
time, but for initial processing of a large archive, considerable amounts of
memory may be used. Assuming 40 bytes for subject/from line, 5 messages per
thread, 100,000 messages, and 1000 authors, this is 2.5 MB. For 1,000,000
messages this is about 20 MB.
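The estimate can be reproduced from the per-record sizes given above. This is a back-of-the-envelope Python sketch with a hypothetical function name estimate_bytes. It assumes the 32-bit record sizes, one 40-byte subject per thread and one 40-byte from line per author, and (as an assumption not stated in the text) the same 1000 authors for the larger archive.

```python
def estimate_bytes(messages: int, msgs_per_thread: int = 5,
                   authors: int = 1000, text_bytes: int = 40) -> int:
    """Rough in-memory size of the linked lists on a 32-bit architecture."""
    threads = messages // msgs_per_thread
    return (12 * messages                    # 12 bytes per message
            + (28 + text_bytes) * threads    # 28 bytes per thread plus subject
            + (20 + text_bytes) * authors)   # 20 bytes per author plus from line


print(estimate_bytes(100_000))    # 2620000, about 2.6 MB
print(estimate_bytes(1_000_000))  # 25660000, about 25.7 MB
```

Under these assumptions the totals come out somewhat above the rounded figures quoted in the text, but in the same order of magnitude.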
Thus, for large archives, it may be useful to use the -t
switch to process the archive in multiple subsets, starting with e.g. the
first 100,000, then the next, and so on.
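One way to plan such a multi-pass run is to generate the command lines up front. The Python sketch below only builds argument lists and does not execute anything; the function name chunked_commands is hypothetical. It assumes that chunk boundaries on multiples of 100 coincide with archive section boundaries, so that -f does not re-index messages from the previous pass. Whether thread counts after several passes match those of a single pass has not been verified here.

```python
from typing import List


def chunked_commands(list_dir: str, last_msg: int,
                     chunk: int = 100_000) -> List[List[str]]:
    """Build one ezmlm-archive command line per subset of the archive."""
    if chunk <= 0 or chunk % 100 != 0:
        # Keep -f on a section boundary (assumed to be multiples of 100).
        raise ValueError("chunk must be a positive multiple of 100")
    commands = []
    first = 0
    while first <= last_msg:
        end = min(first + chunk - 1, last_msg)
        # The first pass creates a new index; later passes start at a section.
        start_opt = ["-c"] if first == 0 else ["-f", str(first)]
        commands.append(["ezmlm-archive", *start_opt, "-t", str(end), list_dir])
        first = end + 1
    return commands


for cmd in chunked_commands("dir", 250_000):
    print(" ".join(cmd))
```

For an archive of 250,000 messages this prints three command lines, covering messages up to 99999, 100000 to 199999, and 200000 to 250000.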
ezmlm-cgi(1), ezmlm-idx(1), ezmlm-send(1), ezmlm(5)