SSEARCH(1)    FreeBSD General Commands Manual    SSEARCH(1)
ssearch - scan a protein or DNA sequence library for similar sequences
ssearch [-a -b # -d # -E # -f # -g # -h -i -l FASTLIBS -L -r STATFILE -m # -O filename -Q -s SMATRIX -w # -z] query-sequence-file library-file
ssearch [-QabdEfghilmOrswz] query-file @library-name-file
ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"
ssearch [-aEfghilmrsw] - interactive mode
ssearch compares a protein or DNA sequence to all of the entries in a
sequence library using the rigorous Smith-Waterman algorithm (Smith and
Waterman, J. Mol. Biol. (1981) 147:195-197). For example, ssearch can
compare a protein sequence to all of the sequences in the NBRF PIR protein
sequence database. ssearch will automatically decide whether the query
sequence is DNA or protein by reading the query sequence as protein and
determining whether the `amino-acid composition' is more than 85% A+C+G+T. The
program can be invoked either with command line arguments or in interactive
mode. ssearch compares a query sequence to a sequence library, which
consists of sequence data interspersed with comments (see below). The
fasta programs, including ssearch, use a standard text format for
sequence files. Lines beginning with '>' hold a comment describing the
sequence that follows; blanks, tabs and unrecognizable characters in the
sequence are ignored. ssearch expects protein sequences to use
the single-letter amino acid codes (see protcodes(5)). Library files
for ssearch should have the form shown below.
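The two input conventions described above (comment lines beginning with '>' and the 85% A+C+G+T test used to decide whether a query is DNA) can be illustrated with a short Python sketch. This is not ssearch's own code: the functions read_fasta and looks_like_dna are hypothetical, keeping only letters and upper-casing them is an assumption, and the sequences are made-up toy data.

```python
def read_fasta(text):
    """Split FASTA-format text into (comment, sequence) pairs.

    Assumptions: a line starting with '>' begins a new entry and holds its
    comment; on other lines only letters are kept (blanks, tabs, digits and
    punctuation are dropped) and letters are upper-cased.
    """
    entries = []
    comment, residues = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if comment is not None:
                entries.append((comment, "".join(residues)))
            comment, residues = line[1:].strip(), []
        elif comment is not None:
            residues.extend(ch.upper() for ch in line if ch.isalpha())
    if comment is not None:
        entries.append((comment, "".join(residues)))
    return entries


def looks_like_dna(seq, threshold=0.85):
    """Return True if more than `threshold` of the residues are A, C, G or T."""
    if not seq:
        return False
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return acgt / len(seq) > threshold


# Made-up toy library, for illustration only.
library = """>prot1 hypothetical protein
MKTAYIAKQRQISFVKSHFSRQ
>dna1 hypothetical DNA fragment
ACGTACGTTG CAGTACGNAC
"""

for comment, seq in read_fasta(library):
    kind = "DNA" if looks_like_dna(seq) else "protein"
    print(comment.split()[0], len(seq), kind)
# prints:
# prot1 22 protein
# dna1 20 DNA
```

The DNA fragment scores 19/20 = 95% A+C+G+T (the 'N' is the only other residue), so it is classified as DNA; the protein has only 3 of 22 such residues.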
ssearch can be directed to change the scoring matrix, search parameters,
output format, and default search directories by entering options on the
command line (preceded by a `-'). All of the options should precede the file
name arguments. Alternatively, these options can be changed by setting
environment variables. The options and environment variables are:
- -a
- (SHOWALL) Modifies the display of the two sequences in alignments.
Normally, both sequences are shown only where they overlap (SHOWALL=0). If
-a is given, or the environment variable SHOWALL is set to 1, both sequences
are shown in their entirety.
- -b #
- The number of similarity scores to be shown when the -Q option is
used. This value is usually calculated based on the actual scores.
- -d #
- The number of alignments to be shown. Normally, ssearch shows the
same number of alignments as similarity scores. With ssearch -Q
-b 200 -d 50, one would see the 200 top-scoring sequences and the
alignments for the 50 best scores.
- -E #
- The expectation value threshold for displaying similarity scores and
sequence alignments. ssearch -Q -E 2.0 would show all library
sequences with scores expected to occur no more than 2 times by chance in
a search of the library.
- -f #
- Penalty for the first residue in a gap (-12 by default).
- -g #
- Penalty for additional residues in a gap (-2 by default).
- -h
- Do not display histogram of similarity scores.
- -l file
- (FASTLIBS) The name of the library menu file. Normally this will be
determined by the environment variable FASTLIBS. However, a library
menu file can also be specified with -l.
- -L
- Display more information about the library sequence in the alignment.
- -m #
- (MARKX) = 0, 1, 2, or 3. Alternate display of matches and mismatches in
alignments. MARKX=0 uses ":", ".", and " " for
identities, conservative replacements, and non-conservative replacements,
respectively. MARKX=1 uses " ", "x", and "X".
MARKX=2 does not show the second sequence, but uses the second alignment
line to display matches with a "." for identity, or with the
mismatched residue for mismatches. MARKX=2 is useful for aligning large
numbers of similar sequences. MARKX=3 writes out a file of library
sequences in FASTA format. MARKX=3 should always be used with the
"SHOWALL" (-a) option, but this does not completely ensure that
all of the sequences output will be aligned.
- -O filename
- Sends a copy of the results to "filename".
- -Q
- Quiet option. This allows ssearch to search a database and report
the results without asking any questions. ssearch -Q file library >
output can be put in the background or run at a later time with the Unix
'at' command. The number of similarity scores and alignments displayed
with the -Q option can be modified with the -b (scores) and
-d (alignments) options.
- -r STATFILE
- Causes ssearch to write out the sequence
identifier, superfamily number (if available), and similarity scores to
STATFILE for every sequence in the library. These results are not
sorted.
- -s str
- (SMATRIX) The filename of an alternative scoring matrix file. For
protein sequences, BLOSUM50 is used by default; PAM250 can be used with
the command line option -s 250.
- -w #
- (LINLEN) Output line length for sequence alignments (normally 60;
can be set up to 200).
- -z
- Do not perform the statistical significance calculation.
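The core computation behind ssearch, a Smith-Waterman local alignment score with the two-part gap penalty set by -f and -g, can be sketched as follows. This is a minimal illustration of the Gotoh formulation, assuming (as the -f and -g descriptions suggest) that a gap of k residues costs f + (k - 1) * g. The hypothetical sw_score function uses a toy match/mismatch score (+5/-4) in place of BLOSUM50 and omits the statistical calculation, so its scores will not match real ssearch output.

```python
def sw_score(a, b, match=5, mismatch=-4, first=-12, extra=-2):
    """Best local (Smith-Waterman) alignment score of sequences a and b.

    Gaps use the two-part penalty of the -f/-g options: a gap of k residues
    scores first + (k - 1) * extra. Substitutions use a toy match/mismatch
    score instead of a real matrix such as BLOSUM50 (assumption).
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1] (or 0).
    # E[i][j]: best score ending with b[j-1] aligned to a gap in a.
    # F[i][j]: best score ending with a[i-1] aligned to a gap in b.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (first residue) or extend an existing one.
            E[i][j] = max(H[i][j - 1] + first, E[i][j - 1] + extra)
            F[i][j] = max(H[i - 1][j] + first, F[i - 1][j] + extra)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below 0, so a new
            # alignment may start at any position.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# 8 identities and one 1-residue gap: 8 * 5 - 12 = 28
print(sw_score("ACGTACGT", "ACGTTACGT"))
```

Because the first gap residue costs far more than each additional one, a single 2-residue gap (-14 here) is preferred over two separate 1-residue gaps (-24).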
- (1)
- ssearch musplfm.aa $AABANK
Compare the amino acid sequence in the file musplfm.aa with the
complete PIR protein sequence library. This is extremely slow and should
almost never be done. ssearch is designed to search very small
libraries of sequences.
>LCBO bovine preprolactin
WILLLSQ ...
>LCHU human ...
...
- (2)
- ssearch -a -w 80 musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with the
sequences in the file lcbo.aa. Show both sequences in
their entirety, with 80 residues on each output line.
- (3)
- ssearch
Run the ssearch program in interactive mode. The program
will prompt for the file name for the query sequence, list alternative
libraries to be searched (if FASTLIBS is set), and prompt for the
ktup.
You can use your own sequence files with ssearch; just be certain that
the first line before the sequence is a '>' followed by a comment.
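A query file in this form can also be produced programmatically. The sketch below uses a hypothetical to_fasta helper and an assumed wrap width of 60 residues per line (a convenience, not a documented ssearch requirement); it emits the required '>' comment line followed by the sequence. The sequence shown is made-up test data.

```python
def to_fasta(comment, sequence, width=60):
    """Return one sequence entry: a '>' comment line, then the residues.

    Assumption: residues are wrapped at `width` characters per line.
    """
    lines = [">" + comment]
    # Split the sequence into consecutive chunks of at most `width` residues.
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical query; save this text to a file (e.g. query.aa) for ssearch.
    print(to_fasta("query1 hypothetical test protein",
                   "MKTAYIAKQRQISFVKSHFSRQ" * 3), end="")
```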
rss(1), align(1), fasta(1), rdf2(1), protcodes(5), dnacodes(5)
Bill Pearson
wrp@virginia.EDU