NAME
samtools fasta / fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ

SYNOPSIS
samtools fastq [options] in.bam
samtools fasta [options] in.bam

DESCRIPTION
Converts a BAM or CRAM into either FASTQ or FASTA format depending on the command invoked. The files will be automatically compressed if the file names have a .gz or .bgzf extension.

If the input contains read-pairs which are to be interleaved or written to separate files in the same order, then the input should be first collated by name. Use samtools collate or samtools sort -n to ensure this.

For each different QNAME, the input records are categorised according to the state of the READ1 and READ2 flag bits. The three categories used are:

1 : Only READ1 is set.
2 : Only READ2 is set.
0 : Either both READ1 and READ2 are set; or neither is set.

The exact meaning of these categories depends on the sequencing technology used. It is expected that ordinary single and paired-end sequencing reads will be in categories 1 and 2 (in the case of paired-end reads, one read of the pair will be in category 1, the other in category 2). Category 0 is essentially a “catch-all” for reads that do not fit into a simple paired-end sequencing model.

For each category only one sequence will be written for a given QNAME. If more than one record is available for a given QNAME and category, the first in input file order that has quality values will be used. If none of the candidate records has quality values, then the first in input file order will be used instead.

Sequences will be written to standard output unless one of the -1, -2, -o, or -0 options is used, in which case sequences for that category will be written to the specified file. The same filename may be specified with multiple options, in which case the sequences will be multiplexed in order of occurrence.
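The READ1/READ2 categorisation described above can be sketched as a small shell function. This is only an illustration, not the samtools implementation: read_category is a hypothetical helper name, and the bit values 64 (READ1, 0x40) and 128 (READ2, 0x80) are taken from the SAM specification rather than from this page.

```shell
# Hypothetical helper (not part of samtools): map a SAM FLAG value to the
# category used by samtools fasta/fastq, as described above.
# Assumption: READ1 is bit 64 (0x40) and READ2 is bit 128 (0x80).
read_category() {
    flag=$1
    read1=$(( (flag & 64) != 0 ))    # 1 if READ1 is set, else 0
    read2=$(( (flag & 128) != 0 ))   # 1 if READ2 is set, else 0
    if [ "$read1" -eq 1 ] && [ "$read2" -eq 0 ]; then
        echo 1                       # only READ1 is set
    elif [ "$read1" -eq 0 ] && [ "$read2" -eq 1 ]; then
        echo 2                       # only READ2 is set
    else
        echo 0                       # both set, or neither set
    fi
}

read_category 77    # READ1 only: prints 1
read_category 141   # READ2 only: prints 2
read_category 4     # neither bit set: prints 0
read_category 192   # both bits set: prints 0
```

The function only looks at the two flag bits; choosing which record to emit for a given QNAME and category (the first with quality values, otherwise the first in file order) is a separate step not shown here.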
If a singleton file is specified using the -s option then only paired sequences will be output for categories 1 and 2; paired meaning that for a given QNAME there are sequences for both category 1 and 2. If there is a sequence for only one of categories 1 or 2 then it will be diverted into the specified singletons file. This can be used to prepare fastq files for programs that cannot handle a mixture of paired and singleton reads.

The -s option only affects category 1 and 2 records. The output for category 0 will be the same irrespective of the use of this option.

OPTIONS
EXAMPLES
Starting from a coordinate sorted file, output paired reads to separate files, discarding singletons, supplementary and secondary reads. The resulting files can be used with, for example, the bwa aligner.
samtools collate -u -O in_pos.bam | \
samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n
Starting with a name collated file, output paired and singleton reads in a single file, discarding supplementary and secondary reads. To get all of the reads in a single file, it is necessary to redirect the output of samtools fastq. The output file is suitable for use with bwa mem -p which understands interleaved files containing a mixture of paired and singleton reads.
samtools fastq -0 /dev/null in_name.bam > all_reads.fq
Output paired reads in a single file, discarding supplementary and secondary reads. Save any singletons in a separate file. Append /1 and /2 to read names. This format is suitable for use by NextGenMap when using its -p and -q options. With this aligner, paired reads must be mapped separately to the singletons.
samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq
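When paired reads are written to two separate files, as in the first example above, a quick sanity check is to confirm that both files hold the same number of records. The sketch below is not part of samtools: fastq_count and same_record_count are hypothetical helper names, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not part of samtools): count the records in an
# uncompressed FASTQ file, assuming exactly four lines per record.
fastq_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical check: succeeds (exit status 0) only if both files
# contain the same number of records.
same_record_count() {
    [ "$(fastq_count "$1")" -eq "$(fastq_count "$2")" ]
}
```

For instance, same_record_count paired1.fq paired2.fq || echo "record counts differ" would flag a mismatch. Equal counts do not prove that the read names line up pair by pair; they only catch gross truncation or unbalanced output.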
BUGS
AUTHOR
Written by Heng Li, with modifications by Martin Pollard and Jennifer Liddle, all from the Sanger Institute.

SEE ALSO
samtools(1), samtools-faidx(1), samtools-fqidx(1), samtools-import(1)

Samtools website: <http://www.htslib.org/>