|
NAMEpdfgrep - search PDF files for a regular expressionSYNOPSISpdfgrep [OPTION...] PATTERN [FILE...] pdfgrep [OPTION...] [-e PATTERN | -f FILE] [FILE...] DESCRIPTIONSearch for PATTERN in each PDF FILE and print matching lines. By default, PATTERN is an extended regular expression.pdfgrep tries to be mostly compatible with GNU grep with some PDF-specific distinctions and additional options. Most notably, -n prints page instead of line numbers. OPTIONSGeneral Information--helpPrint a short summary of the options.
-V, --version Show version information.
Pattern Interpretation-F, --fixed-stringsInterpret PATTERN as a list of fixed strings
separated by newlines, any of which is to be matched.
-P, --perl-regexp Interpret PATTERN as a Perl compatible regular
expression (PCRE). See pcresyntax(3) for a quick overview.
Matching Control-e PATTERN, --regexp=PATTERNUse PATTERN as the pattern to search for. If this
option is specified multiple times or combined with --file, all
patterns are tried in turn until one of them matches.
-f FILE, --file=FILE Read patterns from FILE, one per line. If
FILE contains multiple patterns or if this option is applied multiple
times or combined with -e, all patterns are tried in turn until one of
them matches. An empty pattern list matches nothing.
-i, --ignore-case Ignore case distinctions in both the PATTERN and
the input files.
General Output Control-c, --countSuppress normal output. Instead print the number of
matches for each input file. Note that unlike grep, multiple matches on the
same page will be counted individually.
-p, --page-count Like -c, but prints the number of matches per
page. Implies -n.
--color WHEN Surround file names, page numbers and matched text with
escape sequences to display them in color on the terminal. WHEN can be:
-L, --files-without-match Suppress normal output. Instead print the name of each
input file that doesn’t contain a match. This works well with
-Z, but many other output options like -n or -c are
ignored when -L is specified.
-l, --files-with-matches Suppress normal output. Instead print the name of each
input file that contains a match. This works well with -Z, but many
other output options like -n or -c are ignored when -l is
specified.
-m, --max-count NUM Stop reading a file after NUM matches. When the -c
or --count option is also used, pdfgrep does not output a count greater than
NUM.
-o, --only-matching Print only the matched part of a line without any
surrounding context.
-q, --quiet Suppress all normal output to stdout. Exit immediately
with exit status 0 if a match is found, even in case of errors. Use this if
you only care about the presence of matches, not their number or
content.
Line Prefix Control-H, --with-filenamePrint the file name for each match. This is the default
setting when there is more than one file to search.
-h, --no-filename Suppress the prefixing of file name on output. This is
the default setting when there is only one file to search.
-n, --page-number Prefix each match with the number of the page where it
was found.
-Z, --null Output a null byte (called NUL in ASCII and '\0'
in C) instead of the colon that usually separates a filename from the rest of
the line. This option makes the output unambiguous in the presence of colons,
spaces or newlines in the filename. It can be used in conjunction with
commands such as xargs -0 or perl -0.
--match-prefix-separator SEP Changes the colon used to separate filename, line number
and text in the output to SEP, which can be an arbitrary string. This
is useful when filenames contain colons, but only for interactive usage. For
scripting, --null should be used.
Context Control-A NUM, --after-context=NUMPrint NUM lines of context after matching lines.
Contiguous groups of matches are separated by a line containing --.
With -o, this option has no effect.
-B NUM, --before-context=NUM Print NUM lines of context before matching lines.
Contiguous groups of matches are separated by a line containing --.
With -o, this option has no effect.
-C NUM, --context=NUM Print NUM lines of context before and after
matching lines. Contiguous groups of matches are separated by a line
containing --. With -o, this option has no effect.
File Selection-r, --recursiveRecursively search all files (restricted by
--include and --exclude) under each directory, following
symlinks only if they are on the command line.
-R, --dereference-recursive Same as -r, but follows all symlinks.
--exclude=GLOB Skip files whose base name matches GLOB. See
glob(7) for wildcards you can use. You can use this option multiple
times to exclude more patterns. It takes precedence over --include.
Note, that in- and excludes apply only to files found via --recursive
and not to the argument list.
--include=GLOB Only search files whose base name matches GLOB.
See --exclude for details. The default is *.pdf.
Other Options--cacheUse a cache for the rendered text to speed up the
operation on large files.
--password=PASSWORD Use PASSWORD to decrypt the PDF-files. Can be specified
multiple times; all passwords will be tried on all PDFs. Note that this
password will show up in your command history and the output of ps(1).
So please do not use this if the security of PASSWORD is
important.
--page-range=RANGE Limit search to a specified set of pages. RANGE is
a comma separated list of either a single page number or a range expression of
the form PAGE1-PAGE2. Example: 2-3,5,7-10.
--debug Enable debug output. Note: Due to limitations of
poppler before version 0.30.0, some debug output is also printed without
--debug when using such a poppler version.
--warn-empty Print a warning to stderr if a PDF contains no
searchable text. This is the case for PDFs that consist only of images, for
example scanned documents.
--unac Remove accents and ligatures from both the search pattern
and the PDF documents. This is useful if you want to search for a word
containing "ae", but the PDF uses the single character
"æ" instead. See unac(3) and unaccent(1) for
details.
This option is experimental and only available if pdfgrep is compiled with unac support. EXIT STATUSNormally, the exit status is 0 if at least one match is found, 1 if no match is found and 2 if an error occurred. But if the --quiet or -q option is used and a match was found, pdfgrep will return 0 regardless of errors.ENVIRONMENT VARIABLESThe behavior of pdfgrep is affected by the following environment variable.GREP_COLORS Specifies the colors and other attributes used to
highlight various parts of the output. The syntax and values are like
GREP_COLORS of grep. See grep(1) for more details.
Currently only the capabilities mt, ms, mc, fn,
ln and se are used by pdfgrep, where mt, ms
and mc have the same effect.
FILES${XDG_CACHE_HOME}/pdfgrep/*Cache files written and used when --cache is
enabled. At most 200 cache entries older than a day are retained.
EXAMPLESPrint the first ten lines matching pattern and print their page number:pdfgrep -n --max-count 10 pattern foo.pdf Search all .pdf files whose names begin with foo recursively in the current directory: pdfgrep -r --include "foo*.pdf" pattern Search all PDFs in the current directory for foo that also contain bar: pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo Search all .pdf files that are smaller than 12M recursively in the current directory: find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern Note that in contrast to the previous examples, this task could not be solved with pdfgrep alone, but the Unix tools find(1) and xargs(1) had to be used. That’s because pdfgrep itself doesn’t include options to exclude files by their size. But as you see, it doesn’t have to! BUGSReporting BugsBugs can either be reportet to the mailing list (pdfgrep-users@pdfgrep.org) or to the bugtracker on gitlab (https://gitlab.com/pdfgrep/pdfgrep/issues).AUTHORSpdfgrep is maintained by Hans-Peter Deifel.See the AUTHORS file in the source for a full list of contributors. SEE ALSOgrep(1), pcre(3), regex(7)See pdfgrep’s website https://pdfgrep.org for more information, downloads, git repository and more.
Visit the GSP FreeBSD Man Page Interface. |