NAME
fastq-trim - Trim adapters and low-quality bases from FASTQ files

SYNOPSIS
fastq-trim [--help]
fastq-trim [--verbose] [--exact-match] [--3p-adapter1 seq] [--3p-adapter2 seq] [--min-match N] [--max-mismatch-percent N] [--min-qual N] [--min-length N] [--polya-min-length N] [--phred-base N] [infile1.fastq[.gz|.bz2|.xz]] [outfile1.fastq[.gz|.bz2|.xz]] [infile2.fastq[.gz|.bz2|.xz] outfile2.fastq[.gz|.bz2|.xz]]

OPTIONS AND ARGUMENTS

DESCRIPTION
Fastq-trim removes adapters and low-quality bases from the ends of each read in a FASTQ file. Reads shorter than the specified or default minimum length are not output.

Note that adapter matching (a.k.a. alignment) is not an exact science. Most bioinformatics data contain errors, so the processing must be probabilistic. It is possible that an adapter sequence occurs naturally in a given sample. The longer the adapter, the less often this will occur, which is why adapters are typically 12 or more bases long.

Read errors can occur in adapters as well as in the insert (the real DNA/RNA sequence between the adapters). This is rare, and an exact-match algorithm like memcmp(3) will generally find more than 99% of adapters. Tolerating some slop in adapter matching leaves fewer adapters in the data but raises the risk of false positives (removing natural sequences that resemble adapters). Neither situation is catastrophic. If a fraction of a percent of reads fail to align to a genome properly because of adapter contamination, the end results of the downstream analysis won't tell a different story. The same is true of reads that were shortened by removing a falsely identified adapter.

The default "smart" adapter matching allows roughly 10% of the bases to be substituted, measured against the adapter length (or the remaining bases at the end of the read if that's shorter). This tolerance can be increased using --max-mismatch-percent. Insertions and deletions are not currently handled for adapter matching. Exact matching can be selected using --exact-match. In either case, a minimum of min_match bases (default 3, controlled by --min-match) must be matched in order to decide that the sequence is indeed an adapter.

Low-quality 3' end removal uses the same algorithm as BWA and Cutadapt: scan backward from the 3' end, accumulating the sum of (score - min-qual), until the sum becomes > 0, then trim at the location of the minimum sum.
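
The two steps described above can be sketched in a few lines of Python. This is an illustration of the described behavior under stated assumptions, not the actual fastq-trim source: the function names quality_trim, find_adapter, and trim_read are hypothetical, quality scores are assumed to be already decoded to integers, and "min_match bases must be matched" is interpreted here as requiring an overlap of at least min_match compared bases.

```python
def quality_trim(quals, min_qual):
    """Return how many bases to keep from the 5' end (BWA/Cutadapt style)."""
    running = 0            # running sum of (score - min_qual), from the 3' end
    lowest = 0             # smallest running sum seen so far
    keep = len(quals)      # default: keep the whole read
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - min_qual
        if running > 0:    # stop scanning once the sum becomes positive
            break
        if running < lowest:
            lowest = running
            keep = i       # trim at the location of the minimum sum
    return keep


def find_adapter(read, adapter, min_match=3, max_mismatch_percent=10):
    """Return the index where the adapter starts, or len(read) if not found.

    Substitutions only (no insertions or deletions). Assumption: the mismatch
    budget is computed on the number of bases actually compared.
    """
    for start in range(len(read) - min_match + 1):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_match:
            continue
        allowed = overlap * max_mismatch_percent // 100
        mismatches = 0
        for r, a in zip(read[start:start + overlap], adapter[:overlap]):
            if r != a:
                mismatches += 1
                if mismatches > allowed:
                    break
        else:
            return start   # inner loop finished within the mismatch budget
    return len(read)


def trim_read(seq, quals, adapter, min_qual, min_match=3, max_mismatch_percent=10):
    """Quality-trim first, then remove the adapter from what remains."""
    keep = quality_trim(quals, min_qual)
    seq, quals = seq[:keep], quals[:keep]
    cut = find_adapter(seq, adapter, min_match, max_mismatch_percent)
    return seq[:cut], quals[:cut]
```

Passing max_mismatch_percent=0 to find_adapter mimics exact matching in this sketch. A caller would then drop any read whose trimmed length falls below the chosen minimum length.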
Quality trimming is done before adapter matching so that likely misread bases do not contribute to the adapter match scoring.

Input and output files may be compressed using gzip(1), bzip2(1), or xz(1). Support for this is provided by xt_fopen(3), which automatically determines the file type from the filename extension and pipes input or output as needed. Compression level and other options can be passed to gzip(1), bzip2(1), and xz(1) via the environment variables GZIP, BZIP2, BZIP, and XZ_OPT. This is often useful for adjusting the compression level of output files in order to tune performance.

ENVIRONMENT
GZIP, BZIP2, BZIP, XZ_OPT: Fine-tune output compression.

SEE ALSO
fastq-scum(1), biolibc(3)

BUGS
Please report bugs to the author and send patches in unified diff format. (man diff for more information)

AUTHOR
J. Bacon