srf2fastq - Converts SRF files to Sanger fastq format
srf2fastq [options] srf_archive ...
srf2fastq extracts sequences and qualities from one or more SRF archives
and writes them in Sanger fastq format to stdout.
Note that Illumina also has its own fastq format (used in the GERALD
directories), which differs slightly in that it uses log-odds scores for
the quality values. The format produced here uses the traditional
Phred style of quality encoding.
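The difference between the two quality scales can be illustrated with a short Python sketch. It assumes the commonly cited definitions, Phred Q = -10 log10(p) and log-odds Q = -10 log10(p / (1 - p)), and the Sanger convention of storing each Phred value as the ASCII character with code Q + 33. The function names are illustrative only and are not part of srf2fastq.

```python
import math


def logodds_to_phred(q_logodds):
    """Convert a log-odds quality score to the equivalent Phred score.

    Assumes Q_logodds = -10*log10(p / (1 - p)) and Q_phred = -10*log10(p),
    where p is the estimated probability that the base call is wrong.
    """
    return 10.0 * math.log10(1.0 + 10.0 ** (q_logodds / 10.0))


def phred_to_sanger_char(q_phred):
    """Encode an integer Phred score as a Sanger fastq quality character."""
    return chr(q_phred + 33)


def sanger_quality_line(phred_scores):
    """Build a fastq quality line from a list of integer Phred scores."""
    return "".join(phred_to_sanger_char(q) for q in phred_scores)


if __name__ == "__main__":
    # The two scales diverge at low quality and converge at high quality.
    for q in (-5, 0, 10, 40):
        print(q, round(logodds_to_phred(q), 2))
    print(sanger_quality_line([0, 10, 20, 40]))
```

The two scales agree closely for high-quality bases and differ mainly at the low end, which is why the formats are described as differing only slightly.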
- -c
- Outputs calibrated confidence values using the ZTR CNF1 chunk type,
giving a single quality per base. Without this option, srf2fastq uses the
original Illumina _prb.txt data, which consists of four quality values per
base stored in the ZTR CNF4 chunks.
- -C
- Masks out sequences tagged as bad quality.
- -s root
- Generates files on disk with filenames starting with root, one file per
non-explicit element in the SRF/ZTR region (REGN) chunk. Typically this
results in two files for paired end runs. The filename suffixes come from
the names listed in the SRF region chunks. This option conflicts with the
-S parameter.
- -S
- Splits sequences into regions, but sequentially lists each sequence region
to stdout instead of splitting to separate files on disk. This option
conflicts with the -s parameter.
- -n
- When using -s the filename suffixes are simply numbered (starting with 1)
instead of using the names listed in the SRF region chunks.
- -a
- Appends the region index to the sequence names, i.e. generates "name/1"
and "name/2" for a paired read.
- -e
- Includes any explicit sequence (ZTR region chunk of type 'E') in the
sequence output. The explicit sequence is also included in the quality
line. Currently this is utilised by ABI SOLiD to store the last base
of the primer.
- -r region list
- Reverse complements the sequence and reverses the quality values for all
regions in the region list. This is a comma-separated list of
integer values enumerating the regions, starting from 1. Note that this
option only works when either -s or -S is specified.
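As a rough illustration of what -r and -a do to a single record, the following Python sketch reverse complements a sequence, reverses its quality string, and appends a region index to the read name. It models the behaviour described above under simple assumptions (plain A/C/G/T/N bases, one quality character per base) and is not the code used by srf2fastq; all function names are hypothetical.

```python
# Complement table for plain nucleotide codes (assumption: no IUPAC
# ambiguity codes other than N appear in the input).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_region(seq, qual):
    """Reverse complement the bases and reverse the quality string.

    Qualities are only reversed, not altered, so that each quality
    value stays attached to the base it describes.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return reverse_complement(seq), qual[::-1]


def indexed_name(name, region_index):
    """Append a 1-based region index to a read name, e.g. 'name/2'."""
    return f"{name}/{region_index}"


if __name__ == "__main__":
    seq, qual = reverse_region("ACGTN", "!+5I#")
    print("@" + indexed_name("name", 2))
    print(seq)
    print("+")
    print(qual)
```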
To extract only the good quality sequences from all SRF files in the current
directory, using calibrated confidence values (if available):
srf2fastq -c -C *.srf > runX.fastq
To extract a paired end run into two separate files, with sequences
named name/1 and name/2:
srf2fastq -s runX -a -n runX.srf
To extract a paired end run as a single file, alternating forward
and reverse sequences, with the second read being reverse complemented:
srf2fastq -S -r 2 runX.srf > runX.fastq
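Downstream tools sometimes need the alternating records from a file like runX.fastq grouped back into pairs. A minimal Python sketch of that step follows; it assumes every fastq record occupies exactly four lines (no wrapped sequence or quality lines) and that each read has exactly two regions. The helper names are hypothetical and not part of srf2fastq.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from four-line fastq records."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # end of input
        if (len(record) < 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError("malformed fastq record")
        yield record[0][1:], record[1], record[3]


def pair_alternating(records):
    """Group consecutive records into (first, second) pairs."""
    iterator = iter(records)
    for first in iterator:
        second = next(iterator, None)
        if second is None:
            raise ValueError("odd number of records; cannot pair")
        yield first, second


if __name__ == "__main__":
    # Small in-memory stand-in for an alternating fastq file.
    example = io.StringIO(
        "@read1\nACGT\n+\n!!!!\n"
        "@read1\nTTGA\n+\n####\n"
    )
    for forward, reverse in pair_alternating(read_fastq(example)):
        print(forward[0], forward[1], reverse[1])
```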
James Bonfield, Steven Leonard - Wellcome Trust Sanger Institute