NAME
dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS
dictzip [options] name

DESCRIPTION
dictzip compresses files using the gzip(1) algorithm (LZ77) in a manner which is completely compatible with the gzip file format. An extension to the gzip file format (Extra Field, described in 2.3.1.1 of RFC 1952) allows extra data to be stored in the header of a compressed file. Programs like gzip and zcat will ignore this extra data. However, dictd(8), the DICT protocol dictionary server, will make use of this data to perform pseudo-random access on the file. Files in the dictzip format should end in ".dz" so that they may be distinguished from common gzip files that do not contain the special header information.

From RFC 1952, the extra field is specified as follows:

    If the FLG.FEXTRA bit is set, an "extra field" is present in the header, with total length XLEN bytes. It consists of a series of subfields, each of the form:

    +---+---+---+---+==================================+
    |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
    +---+---+---+---+==================================+

    SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value. Jean-Loup Gailly <gzip@prep.ai.mit.edu> is maintaining a registry of subfield IDs; please send him any subfield ID you wish to use. Subfield IDs with SI2 = 0 are reserved for future use.

    LEN gives the length of the subfield data, excluding the 4 initial bytes.

The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random Access"). After the LEN field, the data is arranged as follows:

    +---+---+---+---+---+---+===============================+
    |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
    +---+---+---+---+---+---+===============================+

As per RFC 1952, all data is stored least-significant byte first. For VER 1 of the data, all values are 16 bits long (2 bytes), and are unsigned integers.

XLEN (which is specified earlier in the header) is a two-byte integer, so the extra field can be 0xffff bytes long, 2 bytes of which are used for the subfield ID (SI1 and SI2), and 2 bytes of which are used for the subfield length (LEN). This leaves 0xfffb bytes (0x7ffd 2-byte entries or 0x3ffe 4-byte entries). Given that the zip output buffer must be 10% + 12 bytes larger than the input buffer, we can store 58969 bytes per entry, or about 1.8GB if the 2-byte entries are used. If this becomes a limiting factor, another format version can be selected and defined for 4-byte entries.

For compression, the file is divided up into "chunks" of data; each chunk is less than 64kB, and can be compressed into an area that is also less than 64kB long (taking incompressible data into account -- usually the data is compressed into a block that is much smaller than the original). The CHLEN field specifies the length of a "chunk" of data.
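
The subfield layout and the size limits above can be checked with a short sketch. The following Python is illustrative only and is not part of dictzip; the function names (max_entries, pack_ra_subfield, parse_ra_subfield) are hypothetical. Like the text, the capacity figure does not subtract the 6 bytes taken by VER, CHLEN and CHCNT.

```python
import struct

XLEN_MAX = 0xFFFF        # XLEN is a 2-byte unsigned integer
SUBFIELD_HEADER = 4      # SI1, SI2 and the 2-byte LEN
CHUNK_BYTES = 58969      # uncompressed bytes per entry, as given in the text


def max_entries(entry_size):
    """Whole entries of entry_size bytes that fit after SI1, SI2 and LEN.

    Follows the text's estimate: VER, CHLEN and CHCNT are not subtracted.
    """
    return (XLEN_MAX - SUBFIELD_HEADER) // entry_size


def pack_ra_subfield(chlen, words, ver=1):
    """Build an 'RA' subfield: SI1, SI2, LEN, VER, CHLEN, CHCNT, then the words.

    All integers are 16-bit, unsigned, least-significant byte first.
    """
    data = struct.pack("<HHH", ver, chlen, len(words))
    data += struct.pack("<%dH" % len(words), *words)
    return b"RA" + struct.pack("<H", len(data)) + data


def parse_ra_subfield(buf):
    """Return (ver, chlen, words) from a subfield laid out as above."""
    si1, si2, length = struct.unpack_from("<BBH", buf, 0)
    if (si1, si2) != (ord("R"), ord("A")):
        raise ValueError("not an RA subfield")
    ver, chlen, chcnt = struct.unpack_from("<HHH", buf, 4)
    if length != 6 + 2 * chcnt:
        raise ValueError("LEN does not match CHCNT")
    words = struct.unpack_from("<%dH" % chcnt, buf, 10)
    return ver, chlen, list(words)
```

Here max_entries(2) gives 0x7ffd and max_entries(4) gives 0x3ffe, matching the figures above, and CHUNK_BYTES * max_entries(2) is 1932119285 bytes, which is about 1.8GB when 1GB is taken as 2^30 bytes.
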
The CHCNT field specifies how many chunks are present, and the CHCNT words of data specify how long each chunk is after compression (i.e., in the current compressed file).

To perform random access on the data, the offset and length of the data are provided to library routines. These routines determine the chunk in which the desired data begins, and decompress that chunk. Consecutive chunks are decompressed as necessary.

TRADEOFFS
OPTIONS
CREDITS
dictzip was written by Rik Faith (faith@cs.unc.edu) and is distributed under the terms of the GNU General Public License. If you need to distribute under other terms, write to the author.

The main libraries used by this program (zlib, regex, libmaa) are distributed under different terms, so you may be able to use the libraries for applications which are incompatible with the GPL -- please see the copyright notices and license information that come with the libraries for more information, and consult with your attorney to resolve these issues.

SEE ALSO
dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)