|
Name

abyss-sealer - Close gaps within scaffolds

Synopsis

    abyss-sealer -b <Bloom filter size> -k <kmer size> -k <kmer size>... -o <output_prefix> -S <path to scaffold file> [options]... <reads1> [reads2]...

For example:

    abyss-sealer -b20G -k64 -k80 -k96 -k112 -k128 -o test -S scaffold.fa read1.fa read2.fa

Description

Sealer is an application of Konnector that closes intra-scaffold gaps. It performs three sequential functions.

First, regions with Ns are identified from an input scaffold. Flanking nucleotides (2 x 100 bp) are extracted from those regions while respecting the strand (5’ to 3’) direction on the sequence immediately downstream of each gap.

In the second step, flanking sequence pairs are used as input to Konnector along with a set of reads with a high level of coverage redundancy. Ideally, the reads should represent the original dataset from which the draft assembly was generated, or further whole genome shotgun (WGS) sequencing data generated from the same sample. Within Konnector, the input WGS reads are used to populate a Bloom filter, tiling the reads with a sliding window of length k, thus generating a probabilistic representation of all the k-mers in the reads. Konnector also uses crude error-removal and correction algorithms, eliminating singletons (k-mers that are observed only once) and fixing base mismatches in the flanking sequence pairs. Sealer launches Konnector processes using a user-supplied range of k-mer lengths.

In the third and final operation, successfully merged sequences are inserted into the gaps of the original scaffolds, and Sealer outputs a new gap-filled scaffold file.

Installation

See ABySS installation instructions.

How to run as stand-alone application

    abyss-sealer [-b bloom filter size] [-k values...] [-o outputprefix] [-S assembly file] [options...] [reads...]

Sealer requires the following information to run:
- draft assembly
- user-supplied k values (>0)
- output prefix
- WGS reads (for building Bloom filters)

Sample commands

Without pre-built Bloom filters:

    abyss-sealer -b20G -k64 -k96 -o run1 -S test.fa read1.fq.gz read2.fq.gz

With pre-built Bloom filters:

    abyss-sealer -k64 -k96 -o run1 -S test.fa -i k64.bloom -i k96.bloom read1.fq.gz read2.fq.gz

Reusable Bloom filters can be pre-built with abyss-bloom build, e.g.:

    abyss-bloom build -vv -k64 -j12 -b20G -l2 k64.bloom read1.fq.gz read2.fq.gz

Note: when using pre-built Bloom filters generated by abyss-bloom build, Sealer must be compiled with the same maxk value that abyss-bloom was compiled with. For example, if a Bloom filter was built with a maxk of 64, Sealer must be compiled with a maxk of 64 as well. If the maxk values of the pre-built Bloom filter and Sealer differ, any sequences generated will be nonsensical and incorrect.

Output files
The log file contains results of each Konnector run. The structure of one run is as follows:
The scaffold.fa file is a gap-filled version of the draft assembly supplied to Sealer. The merged.fa file contains every newly generated sequence that was inserted into a gap, including the flanking sequences. A negative size for a new sequence indicates that Konnector collapsed the pair of flanking sequences. For example:

    >[scaffold ID]_[original start position of gap on scaffold]_[size of new sequence]
    ACGCGACGAGCAGCGAGCACGAGCAGCGACGAGCGACGACGAGCAGCGACGAGCG

If the --print-flanks option is enabled, Sealer outputs the flanking sequences that were passed to Konnector. This may be useful should users wish to double-check that the tool is extracting the correct sequences surrounding gaps. The structure of these files is as follows:

    >[scaffold ID]_[original start position of gap on scaffold]_[size of gap]/[1 or 2 indicating whether left or right flank]
    GCTAGCTAGCTAGCTGATCGATCGTAGCTAGCTAGCTGACTAGCTGATCAGTCGA

How to optimize for gap closure

To optimize Sealer, users can inspect the log files generated after a run and adjust parameters accordingly. If k runs are showing gaps with too many paths or branches, consider increasing the -P or -B parameters, respectively. Also consider increasing the number of k values used. Generally, large k-mers are better able to address highly repetitive genomic regions, while smaller k-mers are better able to resolve areas of low coverage.

Runtime and memory usage

More k values mean more Bloom filters are required, which increases runtime because each Bloom filter takes time to build or load at the beginning of each k run. Memory usage is not affected by using more Bloom filters. Larger values for parameters such as -P, -B or -F will also increase runtime.

Options

Parameters of abyss-sealer
k is the size of k-mer for the de Bruijn graph. You may specify multiple values of k, which will increase the number of gaps closed at the cost of increased run time. Multiple values of k ought to be specified in increasing order, as lower values of k have fewer coverage gaps and are less likely to misassemble.

P is the threshold for the number of paths allowed to be traversed. When set to 10, Konnector will attempt to close gaps even when 10 different paths are found, and it will attempt to create a consensus sequence from these paths. The default setting is 2.

AUTHORS

Daniel Paulino.
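
Example

The first step in the Description (finding runs of Ns and extracting up to 100 bp on each side) can be illustrated with a short Python sketch. This is not Sealer's implementation. The function name extract_flanks is hypothetical, the 0-based gap start position is an assumption, and both flanks are left on the forward strand, whereas Sealer's own strand handling may differ. Headers follow the --print-flanks layout described under Output files.

```python
import re

FLANK_LEN = 100  # the Description mentions 2 x 100 bp flanks


def extract_flanks(scaffold_id, seq, flank_len=FLANK_LEN):
    """Return (header, flank) pairs for every run of Ns in seq.

    Hypothetical helper for illustration only. Gap starts are reported
    0-based (an assumption), and flanks are cut short where they would
    run into a neighbouring gap or the end of the scaffold.
    """
    gaps = [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]
    records = []
    for i, (start, end) in enumerate(gaps):
        prev_end = gaps[i - 1][1] if i > 0 else 0
        next_start = gaps[i + 1][0] if i + 1 < len(gaps) else len(seq)
        left = seq[max(prev_end, start - flank_len):start]
        right = seq[end:min(next_start, end + flank_len)]
        if not left or not right:
            continue  # gap touches a scaffold end, so there is no flank pair
        name = f"{scaffold_id}_{start}_{end - start}"
        records.append((f">{name}/1", left))   # left flank
        records.append((f">{name}/2", right))  # right flank
    return records


if __name__ == "__main__":
    demo = "ACGT" * 30 + "N" * 5 + "TTGCA" * 30
    for header, flank in extract_flanks("scaffold1", demo):
        print(header, len(flank))
```

Comparing such output against the files written with --print-flanks is one way to check which gaps Sealer is targeting, bearing in mind that the coordinate and strand conventions above are assumptions.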
|