GSP
Quick Navigator

Search Site

Unix VPS
A - Starter
B - Basic
C - Preferred
D - Commercial
MPS - Dedicated
Previous VPSs
* Sign Up! *

Support
Contact Us
Online Help
Handbooks
Domain Status
Man Pages

FAQ
Virtual Servers
Pricing
Billing
Technical

Network
Facilities
Connectivity
Topology Map

Miscellaneous
Server Agreement
Year 2038
Credits
 

USA Flag

 

 

Man Pages
RSS(1) FreeBSD General Commands Manual RSS(1)

prss - test a protein sequence similarity for significance

prss [-Q -f # -g # -h -O file -s SMATRIX -w # ] sequence-file-1 sequence-file-2 [ #-of-shuffles ]

prss [-fghsw] - interactive mode

prss is used to evaluate the significance of a protein sequence similarity score by comparing two sequences and calculating optimal similarity scores, and then repeatedly shuffling the second sequence, and calculating optimal similarity scores using the Smith-Waterman algorithm. An extreme value distribution is then fit to the shuffled-sequence scores. The characteristic parameters of the extreme value distribution are then used to estimate the probability that each of the unshuffled sequence scores would be obtained by chance in one sequence, or in a number of sequences equal to the number of shuffles. This program is derived from rdf2, which was described by Pearson and Lipman, PNAS (1988) 85:2444-2448, and Pearson (Meth. Enz. 183:63-98). Use of the extreme value distribution for estimating the probabilities of similarity scores was described by Altshul and Karlin, PNAS (1990) 87:2264-2268. The 'z-values' calculated by rdf2 are not as informative as the P-values and expectations calculated by prdf. prss uses calculates optimal scores using the same rigorous Smith-Waterman algorithm (Smith and Waterman, J. Mol. Biol. (1983) 147:195-197) used by the ssearch program.

prss also allows a more sophisticated shuffling method: residues can be shuffled within a local window, so that the order of residues 1-10, 11-20, etc, is destroyed but a residue in the first 10 is never swapped with a residue outside the first ten, and so on for each local window.

(1)
prss -w 10 musplfm.aa lcbo.aa

Compare the amino acid sequence in the file musplfm.aa with that in lcbo.aa, then shuffle lcbo.aa 100 times using a local shuffle with a window of 10. Report the significance of the unshuffled musplfm/lcbo comparison scores with respect to the shuffled scores.

(2)
prss musplfm.aa lcbo.aa

Compare the amino acid sequence in the file musplfm.aa with the sequences in the file lcbo.aa.

(3)
prss

Run prss in interactive mode. The program will prompt for the file name of the two query sequence files and the number of shuffles to be used. 100 shuffles are calculated by default; 250 - 500 shuffles should provide more accurate probability estimates.

prss can be directed to change the scoring matrix, gap penalties, and shuffle parameters by entering options on the command line (preceeded by a `-'). All of the options should preceed the file names number of shuffles.
-f #
Penalty for the first residue in a gap (-12 by default).
-g #
Penalty for additional residues in a gap (-2 by default).
-h
Do not display histogram of similarity scores.
-Q -q
"quiet" - do not prompt for filename.
-O filename
send copy of results to "filename."
-s str
(SMATRIX) the filename of an alternative scoring matrix file. For protein sequences, BLOSUM50 is used by default; PAM250 can be used with the command line option -s 250(or with -s pam250.mat).

ssearch(1), prdf(1), fasta(1), lfasta(1), protcodes(5)

Bill Pearson
wrp@virginia.EDU

The curve fitting routines in rweibull.c were provided by Phil Green, Washington U., St. Louis.

local

Search for    or go to Top of page |  Section 1 |  Main Index

Powered by GSP Visit the GSP FreeBSD Man Page Interface.
Output converted with ManDoc.