NAME

MIGRATE - estimate population parameters: migration rate and population size

SYNOPSIS

migrate-n

DESCRIPTION

Migrate estimates population parameters (effective population size and migration rates) using genetic data (electrophoretic markers, microsatellite markers, sequence data, and single nucleotide polymorphism data). It is a maximum likelihood or Bayesian estimator that uses a coalescent approach, taking into account the history of mutations and the uncertainty of the genealogy.

A copy of the manual in PDF format is available from http://popgen.scs.fsu.edu

OPTIONS

There are no options on the command line, but you can specify options in a parmfile or in the menu.

PARMFILE OPTIONS

The parmfile options are split into Datatype, Input/Output, Start parameters, and Search strategy.

DATATYPE

datatype=<Allele | Microsatellites | Brownian | Sequences | Nucleotide-polymorphisms | Panel-SNP | Genealogies>: specifies the datatype used for the analyses. If the data do not match the chosen type, the program will crash.

Allele: infinite allele model, suitable for electrophoretic markers; perhaps the "best" guess for codominant markers for which we do not know the mutation model.

Microsatellites: a simple electrophoretic ladder model is used for the change along the branches in the genealogy.

Brownian: a Brownian motion approximation to the stepwise mutation model for microsatellites is used. This is MUCH faster than the exact model, but it is not a good approximation if population sizes are small (say below 10).

Sequences: data are DNA or RNA sequences, and the mutation model used is F84, first used by Felsenstein 1984 (the same as in dnaml, PHYLIP version 3.5); a description of this model can be found in Swofford et al. 1996.

Nucleotide-polymorphisms: [SNP] the data likelihood is corrected for sampling only variable sites. We assume that the data were used to find the SNPs.

Panel-SNP: the data likelihood is corrected for using a panel of SNP sites that were polymorphic. The panel has to be population 1.

Genealogies: reads the sumfile of a previous run. With this option the genealogy sampling step is not done, and the genealogies provided in the sumfile are analyzed. This datatype makes it easy to rerun the program for different likelihood ratio tests or different settings for the profile likelihood printouts.

Sequence data specific options

freq-from-data=< Yes | No:freqA freqG freqC freqT>

ttratio=< r1 r2 .....>

interleaved=<Yes | No >

categories=<Yes | No>

If you specify Yes you need a file named catfile in the same directory. For each locus it contains a line with number_of_categories cat1 cat2 cat3 .., followed by a line with the category label for each site. A # in the first column starts a comment line. Example for a data set with 2 loci and 20 base pairs each:

    # Example catfile for two loci
    # in migrate you can use # as comments
    2 1 10
    11111111112222222222
    5 0.1 2 5 23 3
    11111122223333445555
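The catfile layout described above can be checked before a run with a small script. This is only a sketch, not part of migrate: the function name parse_catfile is hypothetical, and it assumes that category lines and label lines strictly alternate and that each site label is a single digit.

```python
def parse_catfile(text):
    """Parse catfile text into a list of (rates, labels) pairs, one per locus."""
    # Drop blank lines and comment lines that have '#' in the first column.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    loci = []
    # Assumption: lines alternate between a category line and a label line.
    for cat_line, labels in zip(lines[0::2], lines[1::2]):
        fields = cat_line.split()
        n_cat = int(fields[0])
        rates = [float(x) for x in fields[1:]]
        if len(rates) != n_cat:
            raise ValueError(f"expected {n_cat} rates, got {len(rates)}")
        # Assumption: labels are single digits 1..n_cat, one per site.
        if any(not ch.isdigit() or not 1 <= int(ch) <= n_cat for ch in labels):
            raise ValueError(f"label outside 1..{n_cat} in {labels}")
        loci.append((rates, labels))
    return loci


example = """\
# Example catfile for two loci
# in migrate you can use # as comments
2 1 10
11111111112222222222
5 0.1 2 5 23 3
11111122223333445555
"""
loci = parse_catfile(example)
for rates, labels in loci:
    print(len(rates), "categories,", len(labels), "sites")
```

Running the check on the example file confirms that both loci carry 20 site labels and that every label refers to a declared category.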

rates=< n : r1 r2 r3 ..rn>

prob-rates=< n : p1 p2 p3 ... pn>

autocorrelation=<Yes:value | No>

weights=<Yes | No>

If you specify Yes you need a file weightfile with weights for each site. The weights can be the numbers 0-9 and the letters A-Z, so you have 35 possible weights available. Example:

    # Example weightfile for two loci
    11111111112222222222
    1111112222AAAA445XXXX5

distfile=<Yes | No>

You can supply a distance file for each locus (using PHYLIP syntax). The order of individuals must be the same as in the infile. This option appears in the menu when you choose

0 Start genealogy is estimated using a UPGMA topology

The distance file is then used to create a UPGMA tree with a minimal number of migration events. For large trees this option helps to get better starting trees than the automatic tree generation, which uses a rather unsophisticated distance method (differences).

usertree=<Yes | No>

If you specify Yes you need a file intree. In this file you have starting trees for each locus. BUT these trees need to have migration events in them!

Microsatellite data

micro-threshold=value: specifies the window in which probabilities of change are calculated. For example, if we have allele 34, then only probabilities of a change from 34 to 35-44 and 24-34 are considered. The higher this value is, the longer you wait for your result; choosing it too small will produce wrong results. Default is micro-threshold=10.
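The window described for micro-threshold can be illustrated with a short calculation. This is a sketch of the arithmetic only, not of how migrate computes the probabilities; the function name allele_window is hypothetical.

```python
def allele_window(allele, threshold=10):
    """Return the inclusive (low, high) range of allele states considered."""
    return allele - threshold, allele + threshold


low, high = allele_window(34)
# Number of states inside the window; it grows linearly with the threshold,
# which is one way to see why larger thresholds lengthen the run.
n_states = high - low + 1
print(low, high, n_states)
```

With the default threshold of 10 and allele 34 the window spans 24 to 44, matching the example in the text.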

Electrophoretic data

No special variables.

Nucleotide polymorphism

Similar to sequence data.

INPUT/OUTPUT

infile=filename: Default is infile
random-seed=<Auto | Noauto | Own:seedvalue>: The random number seed guarantees that you can reproduce a run exactly. Good random number seeds have the form (value * 4) + 1. If you do not specify the random number seed (random-seed=Auto) the program will use the system clock. With random-seed=Noauto the program expects to find a file named seedfile with the random number seed. With random-seed=Own:seedvalue you can specify the seed value in the parmfile (or in the menu).
title=titletext
progress=<Yes|No|Verbose>: The default is progress=Yes
outfile=filename: The default is outfile=outfile
print-data=<Yes|No>: Print the data in the outfile. Default is print-data=No.
print-fst=<Yes|No>: Print a table of an FST estimate for comparison (Beerli and Felsenstein 1999, Beerli 1998) [not recommended].
plot=<No | Yes>[:<Outfile|Both>[:<std|log>:{mig-axis-start,mig-axis-end,theta-axis-start,theta-axis-end}<:printpos<M | Nm>>]]: If plot=No then no plot of the parameter space is shown in the outfile. If Yes, you can specify whether you want only the ASCII-graphics plot in the outfile, or also the accurate numbers in a separate file (mathfile) using printpos "pixels" in each direction. The last option (M or N) lets you define whether you want the plot in M=m/mu or (default) 4Nm units. Default is plot=Yes:Outfile. The statement plot=Yes:Both:std:0,10,0,0.025:100N is an example of a more complicated setting. For the syntax of the mathfile see the documentation.
profile=<No|Yes<:<Fast|Percentile|Spline|Discrete|Quick >><:M | Nm >: Print profile likelihood. See section Likelihood ratio tests and profile likelihood. Default is profile=Yes:Fast:N.
l-ratio=<None | <Mean|Loci>:testparam> (N-POP): Likelihood ratio tests. See section Likelihood ratio tests and profile likelihood. Default is l-ratio=None.
print-trees=<All | None | Last | Best>: Default is print-trees=None
mathfile=filename
sumfile=<No | Yes | Yes:filename>: Intermediate results of the genealogy sampling process are saved into a file named sumfile or into a file whose name you specify. You can use this sumfile to rerun the program for further analysis, e.g. calculating likelihood ratios or profile likelihoods; see datatype=Genealogies.
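The random-seed entry above recommends seeds of the form (value * 4) + 1. A minimal sketch of producing such a seed and the matching parmfile line follows; the helper names make_seed and seed_line are hypothetical and not part of migrate.

```python
def make_seed(value):
    """Turn any non-negative integer into a seed of the form (value * 4) + 1."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value * 4 + 1


def seed_line(value):
    """Build the parmfile setting for a user-supplied seed."""
    return f"random-seed=Own:{make_seed(value)}"


print(seed_line(25))
```

Recording the generated line alongside the results makes it possible to reproduce the run later.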

START VALUES FOR THE PARAMETERS

theta=<Fst | Own:{value1,value2 ,...}>

With Fst the program tries to use an FST-based measure (Maynard Smith 1970, Nei and Feldman 1972). Own:{value1, value2, ...} defines arbitrary start values.

migration=<Fst|Own:Migration matrix > (N-POP)

The migration matrix is an n by n table with - on the diagonal. For four populations it can look like this

migration=OWN:{ - 1.0 1.1 1.2 0.9 - 0.8 0.7 2.1 2.2 - 2.3 1.4 1.5 1.6 - }

or like this

migration=OWN:{ - 1.0 1.1 1.2
0.9 - 0.8 0.7
2.1 2.2 - 2.3
1.4 1.5 1.6 - }
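For larger numbers of populations the migration=OWN setting is tedious to type by hand. The following sketch builds the string from a square table; the function name format_migration is hypothetical, and diagonal entries in the input are ignored and written as - as described above.

```python
def format_migration(matrix):
    """Format a square table of start values as a migration=OWN:{...} setting."""
    n = len(matrix)
    rows = []
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("migration matrix must be n by n")
        # The diagonal is always written as '-', whatever the input holds.
        cells = ["-" if i == j else str(value) for j, value in enumerate(row)]
        rows.append(" ".join(cells))
    return "migration=OWN:{ " + " ".join(rows) + " }"


start_values = [
    [None, 1.0, 1.1, 1.2],
    [0.9, None, 0.8, 0.7],
    [2.1, 2.2, None, 2.3],
    [1.4, 1.5, 1.6, None],
]
print(format_migration(start_values))
```

The printed line reproduces the four-population example given in the text.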

mutation=<Gamma | NoGamma>

The default is mutation=NoGamma.

fst-type=<Theta | Migration >

custom-migration=<NONE | migration-matrix>

The migration matrix contains the migration rates from population j to i on row i, and the Theta values are on the diagonal. Each entry of the matrix can be one of the following:

*: no restriction

0: not estimated

m: mean value of either 4Nm or M.

s: symmetric migration [only for M]

c: constant value (together with migration=OWN.. or theta=OWN..)

The values can be separated by blanks or newlines. A few examples for 4 populations:

Full model: custom-migration={**** **** **** ****}

N-island model: custom-migration={m m m m mm mm m mmm mmmm}

Stepping Stone model: with symmetric migrations, and unrestricted estimates: custom-migration={*s00 s*s0 0s*s 00s*}

Source-Sink: (the first population is the source): custom-migration={*000 **00 *0*0 *00*}
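Because a custom-migration string is easy to mistype, it can help to expand it into rows before a run. The sketch below is not migrate's own parser: the function name parse_custom_migration is hypothetical, it accepts only the five symbols listed above, and it does not handle custom-migration=NONE.

```python
import math

ALLOWED = set("*0msc")


def parse_custom_migration(setting):
    """Expand a custom-migration={...} setting into an n by n list of symbols."""
    body = setting.split("=", 1)[1].strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("matrix must be enclosed in braces")
    # Blanks and newlines only separate values, so they are dropped.
    symbols = [ch for ch in body[1:-1] if not ch.isspace()]
    bad = set(symbols) - ALLOWED
    if bad:
        raise ValueError(f"unknown symbols: {sorted(bad)}")
    n = math.isqrt(len(symbols))
    if n * n != len(symbols):
        raise ValueError(f"{len(symbols)} entries do not form a square matrix")
    return [symbols[i * n:(i + 1) * n] for i in range(n)]


stepping_stone = parse_custom_migration("custom-migration={*s00 s*s0 0s*s 00s*}")
for row in stepping_stone:
    print(" ".join(row))
```

Printing the rows makes it easy to see which entry sits on the diagonal and which connections are switched off.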

SEARCH STRATEGY

Please read the documentation; these settings are important and will influence the accuracy of your results.

short-chains=value: Default is 10.
short-inc=value: Default is 20.
short-sample=value: Default is 500.
long-chains=value: Default is 2.
long-inc=value: Default is 20.
long-sample=value: Default is 5000.
burn-in=value: Default is 10000.
replicate=<NO | YES<:LONGCHAINS | number>>
heating=<NO | YES<:{1,1.1,1.2,1.3}>>
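The search-strategy defaults listed above can be collected in one place and written out as parmfile settings. This is a sketch under the assumption that each option occupies its own line in the parmfile; the names SEARCH_DEFAULTS and parmfile_lines are hypothetical.

```python
# Defaults as listed in this man page.
SEARCH_DEFAULTS = {
    "short-chains": 10,
    "short-inc": 20,
    "short-sample": 500,
    "long-chains": 2,
    "long-inc": 20,
    "long-sample": 5000,
    "burn-in": 10000,
}


def parmfile_lines(overrides=None):
    """Return option=value lines, with any overrides applied to the defaults."""
    settings = dict(SEARCH_DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in settings:
            raise KeyError(f"unknown search option: {key}")
        settings[key] = value
    return [f"{key}={value}" for key, value in settings.items()]


print("\n".join(parmfile_lines({"long-sample": 10000})))
```

Rejecting unknown option names catches typos such as long-samples before they reach the program.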

Obscure options

See documentation.

BUGS

This man page is not up to date and misses the Bayesian inference section, but see documentation.

MAIN DISTRIBUTION WEBSITE

http://popgen.csit.fsu.edu

AUTHOR

Peter Beerli <beerli@csit.fsu.edu>

[if you use this man page, please let me know]