NAME
    Bio::Phylo::EvolutionaryModels - Evolutionary models for phylogenetic trees and methods to sample these

    Klaas Hartmann, September 2007

SYNOPSIS
     # For convenience we import the sample routine (so we can write sample(...) instead of
     # Bio::Phylo::EvolutionaryModels::sample(...)).
     use Bio::Phylo::EvolutionaryModels qw(sample);

     #Example#A######################################################################
     # Simulate a single tree with ten species from the constant rate birth model with parameter 0.5
     my $tree = Bio::Phylo::EvolutionaryModels::constant_rate_birth(birth_rate => .5,
                                                                    tree_size  => 10);

     #Example#B######################################################################
     # Sample 5 trees with ten species from the constant rate birth model using the b algorithm
     my ($sample,$stats) = sample(sample_size       => 5,
                                  tree_size         => 10,
                                  algorithm         => 'b',
                                  algorithm_options => {rate => 1},
                                  model             => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth,
                                  model_options     => {birth_rate => .5});

     # Print a newick string for the 4th sampled tree
     print $sample->[3]->to_newick."\n";

     #Example#C######################################################################
     # Sample 5 trees with ten species from the constant rate birth and death model using
     # the bd algorithm and two threads (useful for dual core processors)
     # NB: we must specify an nstar here; an appropriate choice will depend on the birth_rate
     #     and death_rate we are giving the model
     my ($sample,$stats) = sample(sample_size       => 5,
                                  tree_size         => 10,
                                  threads           => 2,
                                  algorithm         => 'bd',
                                  algorithm_options => {rate => 1, nstar => 30},
                                  model             => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth_death,
                                  model_options     => {birth_rate => 1, death_rate => .8});

     #Example#D######################################################################
     # Sample 5 trees with ten species from the constant rate birth and death model using
     # incomplete taxon sampling
     #
     # sampling_probability is set so that the true tree has 10 species with 50% probability,
     # 11 species with 30% probability and 12 species with 20% probability
     #
     # NB: we must specify an mstar here; this will depend on the model parameters and the
     #     incomplete taxon sampling parameters
     my $algorithm_options = {rate                 => 1,
                              nstar                => 30,
                              mstar                => 12,
                              sampling_probability => [.5, .3, .2]};

     my ($sample,$stats) = sample(sample_size       => 5,
                                  tree_size         => 10,
                                  algorithm         => 'incomplete_sampling_bd',
                                  algorithm_options => $algorithm_options,
                                  model             => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth_death,
                                  model_options     => {birth_rate => 1, death_rate => .8});

     #Example#E######################################################################
     # Sample 5 trees with ten species from a Yule model using the memoryless_b algorithm

     # First we define the random function for the shortest pendant edge for a Yule model:
     # an exponential deviate with rate birth_rate * tree_size (1 - rand lies in (0, 1],
     # which avoids taking log(0))
     my $random_pendant_function = sub {
         my %options = @_;
         return -log(1 - rand)/$options{birth_rate}/$options{tree_size};
     };

     # Then we produce our sample
     my ($sample,$stats) = sample(sample_size       => 5,
                                  tree_size         => 10,
                                  algorithm         => 'memoryless_b',
                                  algorithm_options => {pendant_dist => $random_pendant_function},
                                  model             => \&Bio::Phylo::EvolutionaryModels::constant_rate_birth,
                                  model_options     => {birth_rate => 1});

     #Example#F#######################################################################
     # Sample 5 trees with ten species from a constant birth death rate model using the
     # constant_rate_bd algorithm
     my ($sample) = sample(sample_size   => 5,
                           tree_size     => 10,
                           algorithm     => 'constant_rate_bd',
                           model_options => {birth_rate => 1, death_rate => .8});

DESCRIPTION
    This package contains evolutionary models for phylogenetic trees and algorithms for sampling from these models. It is a non-OO module that optionally exports the 'sample', 'constant_rate_birth' and 'constant_rate_birth_death' subroutines into the caller's namespace, using the "use Bio::Phylo::EvolutionaryModels qw(sample constant_rate_birth constant_rate_birth_death);" directive.
    Alternatively, you can call the subroutines by their fully qualified names, as in the synopsis.

    The initial set of algorithms available in this package corresponds to those in:

        Sampling trees from evolutionary models
        Klaas Hartmann, Dennis Wong, Tanja Gernhard
        Systematic Biology, in press

    Some comments and code refer back to this paper. Further algorithms and evolutionary models are encouraged and welcome.

    To make this code as straightforward as possible to read, some of the algorithms have been implemented in a less than optimal manner. The code also follows the structure of an earlier version of the manuscript, so there is some redundancy (e.g. the birth algorithm is just a specific instance of the birth_death algorithm).

SAMPLING
    All sampling algorithms should be accessed through the generic sample interface.

  Generic sampling interface: sample()
     Type    : Interface
     Title   : sample
     Usage   : see SYNOPSIS
     Function: Samples phylogenetic trees from an evolutionary model
     Returns : A sample of phylogenetic trees and statistics from the sampling algorithm
     Args    : Sampling parameters in a hash

    This method acts as a gateway to the various sampling algorithms. The argument is a single hash containing the options for the sampling run.
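    The single-hash calling convention can be illustrated with a minimal, self-contained sketch of a gateway in the style of sample(). It is not the module's code: the stub algorithms sample_b and sample_bd, the dispatch table, the function name gateway_sample and the set of mandatory options are all hypothetical. Only the default of one thread is taken from the sampling parameters documented in this section.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real sampling routines; each only reports
# which algorithm ran and what it was asked for.
sub sample_b  { my %o = @_; return "b: $o{sample_size} trees of size $o{tree_size}" }
sub sample_bd { my %o = @_; return "bd: $o{sample_size} trees of size $o{tree_size}" }

# Hypothetical dispatch table keyed by the short algorithm name
# (the name without the sample_ prefix, as the algorithm option expects).
my %ALGORITHMS = (b => \&sample_b, bd => \&sample_bd);

# Gateway in the style of sample(): one hash of options in, one algorithm called.
sub gateway_sample {
    my %options = (
        threads => 1,    # documented default; caller-supplied values override it
        @_,
    );
    # Options this sketch treats as mandatory (an assumption).
    for my $key (qw(sample_size tree_size algorithm)) {
        die "missing required option '$key'\n" unless defined $options{$key};
    }
    for my $key (qw(sample_size tree_size threads)) {
        die "$key must be a positive integer\n"
            unless $options{$key} =~ /\A\d+\z/ && $options{$key} > 0;
    }
    my $algorithm = $ALGORITHMS{ $options{algorithm} }
        or die "unknown algorithm '$options{algorithm}'\n";
    return $algorithm->(%options);
}

print gateway_sample(sample_size => 5, tree_size => 10, algorithm => 'b'), "\n";
# prints "b: 5 trees of size 10"
```

    Listing the defaults before @_ when building the hash lets every caller-supplied value override a default without any extra code.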
    Sampling parameters (* denotes optional parameters):

        sample_size        The number of trees to return (more trees may be returned)
        tree_size          The number of extant species each returned tree should have
        model              The evolutionary model (a reference to a function)
        model_options      A hash reference of model options (see individual models)
        algorithm          The algorithm to use (omit the preceding sample_)
        algorithm_options  A hash reference of options for the algorithm (see individual algorithms for details)
        threads*           The number of threads to use (default is 1)
        output_format*     Set to newick for newick trees (default is Bio::Phylo::Forest::Tree)
        remove_extinct     Set to true to remove extinct species

    Available algorithms (algorithm names in the paper are given in brackets):

        b                       For all pure birth models (simplified GSA)
        bd                      For all birth and death models (GSA)
        incomplete_sampling_bd  As above, with incomplete taxon sampling (extended GSA)
        memoryless_b            For memoryless pure birth models (PBMSA)
        constant_rate_bd        For birth and death models with constant rates (BDSA)

    Model
        If you create your own model, it must accept an options hash as its input. This options hash can contain any parameters you desire. Your model should simulate a tree until it becomes extinct or until the size or age limit given in the options has been reached; these limits are the tree_size and tree_age options respectively.

    Multi-threading
        Multi-thread support is very simplistic. The specified number of threads is created, and each thread is assigned the task of finding sample_size/threads samples. I had problems using Bio::Phylo::Forest::Tree in a multi-threaded setting, so the sampled trees are returned to the main routine as newick strings, from which Tree objects are recreated if required. For most applications this overhead seems negligible compared with the sampling times.
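    The per-thread workload can be sketched as below. The documentation only states that each thread finds sample_size/threads samples; rounding that quotient up with POSIX::ceil is an assumption of this sketch, chosen because it is one simple way to guarantee at least sample_size trees overall, which is consistent with the note that more trees may be returned. The helper name per_thread_quota is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: how many trees each thread is asked to find.
# Rounding up (an assumption of this sketch) means threads * quota is
# never less than sample_size, so the total can exceed the request.
sub per_thread_quota {
    my ($sample_size, $threads) = @_;
    $threads = 1 unless $threads && $threads > 0;   # fall back to one thread
    return ceil($sample_size / $threads);
}

my ($sample_size, $threads) = (5, 2);
my $quota = per_thread_quota($sample_size, $threads);
printf "%d threads x %d trees = %d trees (requested %d)\n",
    $threads, $quota, $threads * $quota, $sample_size;
# prints "2 threads x 3 trees = 6 trees (requested 5)"
```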
    From a code perspective, this function (sample):

        Checks input arguments
        Handles multi-threading
        Calls the individual algorithms to perform sampling
        Reformats data

  Sampling algorithms
    These algorithms should be accessed through the sampling interface (sample()). Additional parameters need to be passed to these algorithms as described for each algorithm.
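    As an example of such additional parameters, incomplete_sampling_bd takes sampling_probability and mstar in algorithm_options, as in Example D of the SYNOPSIS. The minimal sketch below shows how the sampling_probability array of Example D can be read as a distribution over the size of the true (complete) tree, and how one size could be drawn from it by inverting the cumulative distribution. Both helper names are hypothetical and this is not the module's implementation. In Example D the largest possible true size, 12, coincides with the mstar => 12 passed there.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: entry $i of sampling_probability is read as the
# probability that the true tree has tree_size + $i species (Example D).
sub true_size_distribution {
    my %args  = @_;
    my $n     = $args{tree_size};
    my @probs = @{ $args{sampling_probability} };
    die "sampling_probability must sum to 1\n" if abs(sum(@probs) - 1) > 1e-9;
    my %dist;
    $dist{ $n + $_ } = $probs[$_] for 0 .. $#probs;
    return \%dist;
}

# Hypothetical helper: draw one true tree size using a single uniform
# random number and the cumulative probabilities.
sub draw_true_size {
    my ($dist) = @_;
    my @sizes = sort { $a <=> $b } keys %$dist;
    my $u     = rand;
    my $cum   = 0;
    for my $size (@sizes) {
        $cum += $dist->{$size};
        return $size if $u < $cum;
    }
    return $sizes[-1];    # guard against round-off in the cumulative sum
}

my $dist = true_size_distribution(tree_size => 10,
                                  sampling_probability => [.5, .3, .2]);
for my $size (sort { $a <=> $b } keys %$dist) {
    print "P(true size = $size) = $dist->{$size}\n";
}
print "one draw: ", draw_true_size($dist), "\n";
```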
EVOLUTIONARY MODELS
    All evolutionary models take an options hash as their input argument and return a Bio::Phylo::Forest::Tree. This tree may contain extinct lineages (lineages that end prior to the end of the tree).

    The options hash contains any model-specific parameters (see the individual model descriptions) and one or both terminating conditions:

        tree_size => the number of extant species at which to terminate the tree
        tree_age  => the age of the tree at which to terminate the process

    Note that if the model stops due to the tree_size condition, the tree ends immediately after the speciation event that created the last species.
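    The two terminating conditions can be illustrated with a minimal sketch of a constant-rate pure-birth (Yule) process. This is not the module's constant_rate_birth: the hypothetical yule_lineage_history starts from a single lineage (an assumption), records only [time, lineage count] pairs instead of building a Bio::Phylo::Forest::Tree, and applies tree_size and tree_age as described above.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Bio::Phylo's constant_rate_birth: it records only
# [time, lineage count] pairs of a constant-rate pure-birth (Yule) process.
# A real model would build and return a Bio::Phylo::Forest::Tree instead.
sub yule_lineage_history {
    my %options = @_;
    my $rate = $options{birth_rate};
    die "birth_rate must be positive\n" unless defined $rate && $rate > 0;
    die "give tree_size and/or tree_age\n"
        unless defined $options{tree_size} || defined $options{tree_age};

    # This sketch starts from a single lineage at time 0 (an assumption).
    my ($time, $lineages) = (0, 1);
    my @events = ([$time, $lineages]);
    return \@events
        if defined $options{tree_size} && $lineages >= $options{tree_size};

    while (1) {
        # With n lineages the next speciation is Exp(n * birth_rate) away;
        # 1 - rand lies in (0, 1], so the logarithm is always defined.
        my $wait = -log(1 - rand) / ($lineages * $rate);

        # tree_age: if the next event would fall at or beyond the age limit,
        # the process ends exactly at tree_age.
        if (defined $options{tree_age} && $time + $wait >= $options{tree_age}) {
            push @events, [$options{tree_age}, $lineages];
            last;
        }
        $time += $wait;
        $lineages++;
        push @events, [$time, $lineages];

        # tree_size: stop immediately after the speciation event that
        # created the last species.
        last if defined $options{tree_size} && $lineages >= $options{tree_size};
    }
    return \@events;
}

my $history = yule_lineage_history(birth_rate => .5, tree_size => 10);
my ($end_time, $n) = @{ $history->[-1] };
printf "reached %d species at time %.3f\n", $n, $end_time;
```

    Because a pure-birth process never goes extinct, this sketch always ends on one of the two limits; a birth-death model would also need to stop when the lineage count reaches zero.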