bzz - DjVu general purpose compression utility.
bzz -e[blocksize] inputfile outputfile
bzz -d inputfile outputfile
The first form of the command line (option -e) compresses the data from
file inputfile and writes the compressed data into outputfile.
The second form of the command line (option -d) decompresses the file
inputfile and writes the output to outputfile.
- -d
- Decoding mode.
- -e[blocksize]
- Encoding mode. The optional argument blocksize specifies the size
of the input file blocks processed by the Burrows-Wheeler transform
expressed in kilobytes. The default block size is 2048 KB.
The maximal block size is 4096 KB. Specifying a larger
block size usually produces higher compression ratios and increases the
memory requirements of both the encoder and decoder. Specifying a block
size larger than the input file brings no benefit.
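The two command forms and the block size limit described above can be sketched as a small Python helper that builds the argument list for bzz. This is only an illustration: the names bzz_args and run_bzz are hypothetical, the lower bound of 1 KB is an assumption (this page states only the 4096 KB maximum), and run_bzz assumes a bzz executable is available on the PATH.

```python
import subprocess

# Maximal block size in kilobytes, as stated in this page.
MAX_BLOCK_KB = 4096


def bzz_args(mode, infile, outfile, blocksize_kb=None):
    """Build the argument list for one bzz invocation (hypothetical helper).

    mode is "e" (encode) or "d" (decode). blocksize_kb is only
    meaningful when encoding and is appended directly to -e.
    """
    if mode == "d":
        if blocksize_kb is not None:
            raise ValueError("block size applies to encoding only")
        flag = "-d"
    elif mode == "e":
        if blocksize_kb is None:
            flag = "-e"  # let bzz use its default block size
        else:
            # Lower bound of 1 is an assumption, not taken from this page.
            if not 1 <= blocksize_kb <= MAX_BLOCK_KB:
                raise ValueError("block size must be between 1 and 4096 KB")
            flag = "-e%d" % blocksize_kb
    else:
        raise ValueError("mode must be 'e' or 'd'")
    return ["bzz", flag, infile, outfile]


def run_bzz(mode, infile, outfile, blocksize_kb=None):
    """Run bzz and raise CalledProcessError if it exits with an error."""
    subprocess.run(bzz_args(mode, infile, outfile, blocksize_kb), check=True)
```

For example, run_bzz("e", "page.txt", "page.bzz", 4096) would compress page.txt with the largest block size, and run_bzz("d", "page.bzz", "page.out") would restore it.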
The Burrows-Wheeler transform is performed using a combination of the
Karp-Miller-Rosenberg and the Bentley-Sedgewick algorithms. This is comparable
to (Sadakane, DCC 98) with a slightly more flexible ranking scheme. Symbols
are then ordered according to a running estimate of their occurrence
frequencies. The symbol ranks are then coded using a simple fixed tree and the
ZP binary adaptive coder (Bottou, DCC 98).
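To make the role of the Burrows-Wheeler transform concrete, here is a minimal Python sketch of the transform and its inverse on a single block. It deliberately uses a naive sort of all rotations, not the Karp-Miller-Rosenberg and Bentley-Sedgewick combination used by bzz, and it omits the symbol ranking and ZP coding stages. The function names and the "primary index" convention are assumptions made for this illustration and do not reflect the bzz file format.

```python
def bwt(block: bytes):
    """Return (last_column, primary_index) for one block.

    Naive version: sorts all rotations of the block, which is far
    slower than the suffix sorting used by real compressors.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # order[k] is the start offset of the k-th smallest rotation.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in order)
    # The primary index is the row holding the unrotated block.
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original block from the last column and primary index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by symbol maps each row to the row
    # whose rotation starts one position later in the original block.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = nxt[primary]
    for _ in range(n):
        out.append(last[row])
        row = nxt[row]
    return bytes(out)
```

The transform groups identical symbols that share a following context, which is what makes the later ranking and adaptive coding stages effective. The block size option bounds the length of each block passed to this stage.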
The Burrows-Wheeler transform is also used in the well-known
compressor bzip2. The originality of bzz is the use of the ZP
adaptive coder. The adaptation noise can cost up to 5 percent in file size,
but this penalty is usually offset by the benefits of adaptation.
The following table shows comparative results (in bits per character) on the
Canterbury Corpus ( http://corpus.canterbury.ac.nz ). The very good
bzz performance on the spreadsheet file excl puts the weighted
average ahead of much more sophisticated compressors such as fsmx.
Compression performance

         | text | fax  | csrc | excl | sprc | tech | poem | html | lisp | man  | play | Weighted | Average
compress | 3.27 | 0.97 | 3.56 | 2.41 | 4.21 | 3.06 | 3.38 | 3.68 | 3.90 | 4.43 | 3.51 | 2.55     | 3.31
gzip -9  | 2.85 | 0.82 | 2.24 | 1.63 | 2.67 | 2.71 | 3.23 | 2.59 | 2.65 | 3.31 | 3.12 | 2.08     | 2.53
bzip2 -9 | 2.27 | 0.78 | 2.18 | 1.01 | 2.70 | 2.02 | 2.42 | 2.48 | 2.79 | 3.33 | 2.53 | 1.54     | 2.23
ppmd     | 2.31 | 0.99 | 2.11 | 1.08 | 2.68 | 2.19 | 2.48 | 2.38 | 2.43 | 3.00 | 2.53 | 1.65     | 2.20
fsmx     | 2.10 | 0.79 | 1.89 | 1.48 | 2.52 | 1.84 | 2.21 | 2.24 | 2.29 | 2.91 | 2.35 | 1.63     | 2.06
bzz      | 2.25 | 0.76 | 2.13 | 0.78 | 2.67 | 2.00 | 2.40 | 2.52 | 2.60 | 3.19 | 2.52 | 1.44     | 2.16
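The figures above are in bits per character, that is, eight times the compressed size divided by the original size, both in bytes. The following Python sketch shows that computation and checks that the Average column is consistent with the plain mean of the eleven per-file figures, using the compress row. The function names are hypothetical. The Weighted column is presumably weighted by file size, but this page does not define it, so it is not reproduced here.

```python
def bits_per_character(original_size: int, compressed_size: int) -> float:
    """Compressed bits spent per input byte (sizes given in bytes)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return 8.0 * compressed_size / original_size


def unweighted_average(values):
    """Plain arithmetic mean of per-file bits-per-character figures."""
    return sum(values) / len(values)


# Per-file figures for compress, copied from the table (text ... play).
COMPRESS_ROW = [3.27, 0.97, 3.56, 2.41, 4.21, 3.06,
                3.38, 3.68, 3.90, 4.43, 3.51]

# Rounds to the 3.31 shown in the Average column for compress.
compress_average = round(unweighted_average(COMPRESS_ROW), 2)
```

A file of 1000 bytes compressed to 250 bytes, for instance, corresponds to 2.0 bits per character.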
Note that DjVu contributors have several entries in this table.
Program compress was written some time ago by Joe Orost. Program
ppmd is an improvement of the PPM-C method invented by
Paul Howard.
Program bzz was written by Léon Bottou
<leonb@users.sourceforge.net> and was then improved by Andrei Erofeev
<andrew_erofeev@yahoo.com>, Bill Riemers <docbill@sourceforge.net>
and many others.
djvu(1), compress(1), gzip(1), bzip2(1)