extract - determine meta-information about a file
extract [ -bgihLmnvV ] [ -l library ] [ -p type ] [ -x type ] file ...
This manual page documents version 1.0.0 of the extract command.
extract tests each file specified in the argument list in
an attempt to infer meta-information from it. Each file is subjected to the
meta-data extraction libraries from libextractor.
libextractor classifies meta-information (also referred to as
keywords) into types. A list of all types can be obtained with the -L
option.
- -b
- Display the output in BibTeX format.
- -g
- Use grep-friendly output (all keywords on a single line for each file).
Use the verbose option to print the filename first, followed by the
keywords. Use the verbose option twice to also display the keyword types.
This option will not print keyword types or non-textual metadata.
- -h
- Print a brief summary of the options.
- -i
- Run plugins in-process (for debugging). By default, each plugin is run in
its own process.
- -l libraries
- Use the specified libraries to extract keywords. The general format of
libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*], where LIBRARYNAME is a
libextractor-compatible library, typically of the form jpeg. A minus sign
before the library name indicates that this library should be removed
from the existing list. To run only a few selected plugins, use -l in
combination with -n.
- -L
- Print a list of all known keyword types.
- -m
- Load the file into memory and perform extraction from memory (for
debugging).
- -n
- Do not use the default set of extractors (typically all standard
extractors, currently mp3, ogg, jpg, gif, png, tiff, real, html, pdf and
mime-types); use only the extractors specified with the -l option.
- -p type
- Print only the keywords matching the specified type. By default, all
keywords that are found and not removed as duplicates are printed.
- -v
- Print the version number and exit.
- -V
- Be verbose. This option can be specified multiple times to increase
verbosity further.
- -x type
- Exclude keywords of the specified type from the output. By default, all
keywords that are found and not removed as duplicates are printed.
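The LIBRARYNAME list format accepted by -l can be assembled in a script. The following is a minimal sketch, not part of extract itself: build_library_spec is a hypothetical helper, and the plugin names png and jpeg are placeholders that assume plugins with those names are installed.

```shell
# Hypothetical helper (not shipped with extract): join plugin names with ':'
# to form the [[-]LIBRARYNAME[:[-]LIBRARYNAME]*] argument expected by -l.
# The body runs in a subshell so the IFS change does not leak to the caller.
build_library_spec() (
    IFS=:
    printf '%s\n' "$*"
)

# Placeholder plugin names; a leading '-' asks for removal from the list.
spec=$(build_library_spec png -jpeg)
printf '%s\n' "$spec"
```

The resulting string could then be passed as extract -l "$spec" file; combine it with -n when only the listed plugins should run.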
libextractor(3) - description of the libextractor library
$ extract test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg
$ extract -V -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg
$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
$ extract -nV -l png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction
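Scripts that consume the default output shown above can split each line at the " - " separator. This is a minimal sketch that assumes the "type - value" layout of the examples; keyword_value is a hypothetical helper, and the sample text is copied from the example output rather than produced by running extract.

```shell
# Hypothetical helper: print the value of every keyword of the given type,
# assuming each input line has the form "type - value" as in the examples.
keyword_value() {
    sed -n "s/^$1 - //p"
}

# Sample output copied from the first example in this manual page.
sample='comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg'

printf '%s\n' "$sample" | keyword_value mimetype
```

In practice the input would come from a pipeline such as extract file | keyword_value mimetype. The type name is inserted into the sed pattern unescaped, so this sketch is only suitable for plain alphanumeric type names.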
libextractor and the extract tool are released under the GPL. libextractor is a
GNU package.
A couple of file formats (on the order of 10^3) are not recognized...
extract was originally written by Christian Grothoff
<christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>.
Use <libextractor@gnu.org> to contact the current maintainer(s).
You can obtain the original author's latest version from
http://www.gnu.org/software/libextractor/