abyss-pe(1) User Commands abyss-pe(1)
abyss-pe - assemble reads into contigs
abyss-pe [OPTION]... [PARAMETER=VALUE]... [MAKE_TARGET]...
Assemble the reads of the input files into contigs. The reads may be in FASTA,
FASTQ, qseq, export, SRA, SAM or BAM format. The input files may be compressed
with gz, bz2 or xz, and may be tarred.
abyss-pe is a Makefile script, so any option of make may also be used with
abyss-pe.
- name, JOB_NAME
- The name of this assembly. The resulting scaffolds will be stored in
${name}-scaffolds.fa.
- in
- input files. Use this variable when assembling data from a single
library.
- lib
- a quoted list of whitespace-separated paired-end library names. Use this
variable when assembling data from multiple paired-end libraries. For each
library name in lib, the user must define a variable on the command line
with the same name, which indicates the read files for that library. See
EXAMPLES below for a concrete example of usage.
- pe
- list of paired-end libraries that will be used only for merging unitigs
into contigs and will not contribute toward the consensus sequence.
- mp
- list of mate-pair libraries that will be used for scaffolding. Mate-pair
libraries do not contribute toward the consensus sequence.
- long
- list of long sequence libraries that will be used for rescaffolding. Long
sequence libraries do not contribute toward the consensus sequence.
- se
- files containing single-end reads
- a
- maximum number of branches of a bubble [2]
- b
- maximum length of a bubble (bp) [""]
abyss-pe has two bubble popping stages. The default limits are 3*k bp for
ABYSS and 10000 bp for PopBubbles.
- c
- minimum mean k-mer coverage of a unitig [sqrt(median)]
- d
- allowable error of a distance estimate (bp) [6]
- e
- minimum erosion k-mer coverage [round(sqrt(median))]
- E
- minimum erosion k-mer coverage per strand [1 if sqrt(median) > 2 else
0]
- j
- number of threads [2]
- k
- size of a k-mer (when K is not set) or the span of a k-mer pair (when K is
set)
- K
- size of a single k-mer in a k-mer pair (bp)
- l
- minimum alignment length of a read (bp) [40]
- m
- minimum overlap of two unitigs (bp) [k-1]
- n
- minimum number of pairs required for building contigs [10]
- N
- minimum number of pairs required for building scaffolds [n]
- p
- minimum sequence identity of a bubble [0.9]
- q
- minimum base quality when trimming [3]
Trim bases from the ends of reads whose quality is less than q.
- Q
- minimum base quality [0]
Mask all bases of reads whose quality is less than Q as `N'.
- s
- minimum unitig size required for building contigs (bp) [1000]
The seed length s should be at least twice the value of k. If more sequence is
assembled than the expected genome size, try increasing s.
- S
- minimum contig size required for building scaffolds (bp) [1000-10000]
- SS
- SS=--SS to assemble in strand-specific mode
Requires that all libraries are strand-specific RNA-Seq libraries. Assumes
that the first read in a read pair is reversed with respect to the transcripts
sequenced.
- t
- maximum length of blunt contigs to trim [k]
- v
- v=-v to enable verbose logging
- np, NSLOTS
- the number of processes of an MPI assembly
- mpirun
- the path to mpirun
- aligner
- The program to use to align the reads to the contigs [map].
Permitted values are: map, kaligner, bwa, bwasw, bowtie, bowtie2, dida. See
the DIDA section below for further info on the dida option.
- cs
- convert colour-space contigs to nucleotide contigs following assembly
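The coverage-dependent defaults of c, e and E listed above can be worked out by hand. The sketch below assumes that "median" means the median k-mer coverage and that rounding is to the nearest integer. The function name coverage_defaults and the coverage value 30 are illustrative only and are not part of ABySS.

```shell
# Print the default values of c, e and E for a given median k-mer coverage.
# Assumption: "median" is the median k-mer coverage; 30 is a made-up value.
coverage_defaults() {
    LC_ALL=C awk -v median="$1" 'BEGIN {
        r = sqrt(median)
        # c = sqrt(median), e = round(sqrt(median)), E = 1 if sqrt(median) > 2 else 0
        printf "c=%.2f e=%d E=%d\n", r, int(r + 0.5), (r > 2 ? 1 : 0)
    }'
}

coverage_defaults 30   # prints: c=5.48 e=5 E=1
```

Passing c, e or E explicitly on the command line replaces the corresponding computed default.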
- -n, --dry-run
- Print the commands that would be executed, but do not execute them.
- default
- Equivalent to `scaffolds scaffolds-dot stats'.
- unitigs
- Assemble unitigs.
- unitigs-dot
- Output the unitig overlap graph.
- pe-sam
- Map paired-end reads to the unitigs and output a SAM file. The SAM file
will only contain reads mapping to different contigs, and the read ID,
sequence and quality strings will be replaced with '*' characters.
- pe-bam
- Map paired-end reads to the unitigs and output a BAM file. The BAM file
will only contain reads mapping to different contigs, and the read ID,
sequence and quality strings will be replaced with '*' characters.
- pe-index
- Generate an index of the unitigs used by abyss-map.
- contigs
- Assemble contigs.
- contigs-dot
- Output the contig overlap graph.
- mp-sam
- Map mate-pair reads to the contigs and output a SAM file. The SAM file
will only contain reads mapping to different contigs, and the read ID,
sequence and quality strings will be replaced with '*' characters.
- mp-bam
- Map mate-pair reads to the contigs and output a BAM file. The BAM file
will only contain reads mapping to different contigs, and the read ID,
sequence and quality strings will be replaced with '*' characters.
- mp-index
- Generate an index of the contigs used by abyss-map.
- scaffolds
- Assemble scaffolds.
- scaffolds-dot
- Output the scaffold overlap graph.
- scaftigs
- Break scaffolds and generate AGP file.
- long-scaffs
- Rescaffold using RNA-Seq assembled contigs.
- long-scaffs-dot
- Output the RNA scaffold overlap graph.
- stats
- Display assembly contiguity statistics.
- clean
- Remove intermediate files.
- version
- Display the version of abyss-pe.
- versions
- Display the versions of all programs used by abyss-pe.
- help
- Display a helpful message.
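Because each target names a stage of the pipeline, an assembly can be built one stage at a time. The sketch below only prints the command for each stage so that it can be reviewed first. The helper stage_cmd and the read file names are hypothetical, and the sketch assumes that, as with any Makefile, a later target reuses the output of earlier ones.

```shell
# Print (not run) the abyss-pe command that builds a single target.
# stage_cmd is a hypothetical helper; the read file names are placeholders.
stage_cmd() {
    printf "abyss-pe k=64 name=ecoli in='reads1.fa reads2.fa' %s\n" "$1"
}

for target in unitigs contigs scaffolds stats; do
    stage_cmd "$target"
done
```

Piping the output of the loop to sh would run the stages in order.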
ABySS supports the use of DIDA (Distributed Indexing Dispatched Alignment), an
MPI-based alignment framework for computing sequence alignments across
multiple machines. To use DIDA with ABySS, first download and install DIDA
from http://www.bcgsc.ca/platform/bioinfo/software/dida, then specify `dida`
as the value of the aligner parameter to abyss-pe.
- DIDA_MPIRUN
- The `mpirun` command used to run DIDA jobs.
- DIDA_RUN_OPTIONS
- Runtime options such as number of threads per MPI rank and values for
environment variables (e.g. PATH, LD_LIBRARY_PATH). Run `abyss-dida
--help` for a list of available options.
- DIDA_OPTIONS
- Options that are passed directly to the DIDA binary. For example, this can
be used to control the minimum alignment length threshold. Run
`dida-wrapper --help` for a list of available options.
Due to its use of multi-threading, DIDA has known deadlocking issues with
OpenMPI. Using the MPICH MPI library is strongly recommended when running
assemblies with DIDA. Testing was done with MPICH 3.1.3, compiled with
--enable-threads=funneled.
The recommended runtime configuration for DIDA is 1 MPI rank per machine and 1
thread per CPU core. For example, to run an assembly across 3 cluster nodes
with 12 cores each, run:
abyss-pe k=64 name=ecoli in='reads1.fa reads2.fa' aligner=dida \
DIDA_RUN_OPTIONS='-j12' DIDA_MPIRUN='mpirun -np 3 -ppn 1 -bind-to board'
This example uses the MPICH command line options for `mpirun`.
Here, `-np 3` indicates the number of MPI ranks, `-ppn 1` indicates the
number of MPI ranks per "node", and `-bind-to board` defines a
"node" to be a motherboard (i.e. a full machine).
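The same rule (one MPI rank per machine, one thread per core) can be applied to other cluster sizes. The sketch below derives the two DIDA parameters from a node count and a core count. The helper dida_args is hypothetical, and the mpirun flags are the MPICH ones used in the example above.

```shell
# Print the DIDA parameters for NODES machines with CORES cores each.
# Usage: dida_args NODES CORES   (dida_args is a hypothetical helper)
dida_args() {
    printf "DIDA_RUN_OPTIONS='-j%s' DIDA_MPIRUN='mpirun -np %s -ppn 1 -bind-to board'\n" "$2" "$1"
}

dida_args 3 12
# prints: DIDA_RUN_OPTIONS='-j12' DIDA_MPIRUN='mpirun -np 3 -ppn 1 -bind-to board'
```

The printed text can be appended to an abyss-pe command line that sets aligner=dida.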
Any parameter that may be specified on the command line may also be specified in
an environment variable.
- PATH
- must contain the directory where the ABySS executables are installed. Use
`abyss-pe versions` to check that PATH is configured correctly.
- TMPDIR
- specifies a directory to use for temporary files
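As a minimal sketch of the environment-based setup described above, the following exports two parameters, sets a temporary directory and checks that abyss-pe can be found. The fallback directory /tmp is only an illustration, and the `abyss-pe versions` check is skipped when abyss-pe is not installed.

```shell
# Parameters exported in the environment act like PARAMETER=VALUE arguments.
export k=64 name=ecoli
# Illustrative scratch location; keep any TMPDIR that is already set.
export TMPDIR="${TMPDIR:-/tmp}"

# Confirm that the ABySS executables are reachable before starting a run.
if command -v abyss-pe >/dev/null 2>&1; then
    abyss-pe versions
else
    echo "abyss-pe not found on PATH" >&2
fi
```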
ABySS integrates well with cluster job schedulers, such as:
* Sun Grid Engine (SGE)
* Portable Batch System (PBS)
* Load Sharing Facility (LSF)
* IBM LoadLeveler
The SGE environment variables JOB_NAME, SGE_TASK_ID and NSLOTS may
be used to specify the parameters name, k and np, respectively, and
similarly for other schedulers.
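To illustrate the mapping, the sketch below prints the parameters that abyss-pe would take from the SGE environment. The helper sge_params is hypothetical, and the variables are set by hand here to stand in for a job submitted with `qsub -N ecoli -t 64 -pe openmpi 8`.

```shell
# Print the abyss-pe parameters implied by the SGE environment variables.
# sge_params is a hypothetical helper, not part of ABySS.
sge_params() {
    printf 'name=%s k=%s np=%s\n' "$JOB_NAME" "$SGE_TASK_ID" "$NSLOTS"
}

# Values that SGE would export for: qsub -N ecoli -t 64 -pe openmpi 8
JOB_NAME=ecoli
SGE_TASK_ID=64
NSLOTS=8
sge_params   # prints: name=ecoli k=64 np=8
```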
Assemble a single paired-end library:
abyss-pe k=64 name=ecoli in='reads1.fa reads2.fa'
Assemble multiple paired-end libraries together with single-end reads:
abyss-pe k=64 name=ecoli lib='lib1 lib2' \
lib1='lib1_1.fa lib1_2.fa' lib2='lib2_1.fa lib2_2.fa' \
se='se1.fa se2.fa'
Assemble paired-end and mate-pair libraries:
abyss-pe k=64 name=ecoli lib='pe1 pe2' mp='mp1 mp2' \
pe1='pe1_1.fa pe1_2.fa' pe2='pe2_1.fa pe2_2.fa' \
mp1='mp1_1.fa mp1_2.fa' mp2='mp2_1.fa mp2_2.fa' \
se='se1.fa se2.fa'
Include a long sequence library for rescaffolding:
abyss-pe k=64 name=ecoli lib=pe1 mp=mp1 long=long1 \
pe1='pe1_1.fa pe1_2.fa' mp1='mp1_1.fa mp1_2.fa' \
long1=long1.fa
Run an MPI assembly with 8 processes:
abyss-pe np=8 k=64 name=ecoli in='reads1.fa reads2.fa'
Submit an MPI assembly to SGE:
qsub -N ecoli -t 64 -pe openmpi 8 \
abyss-pe n=10 in='reads1.fa reads2.fa'
Written by Shaun Jackman.
Report bugs to <abyss-users@googlegroups.com>.
Copyright 2015 Canada's Michael Smith Genome Sciences Centre