Bio::Phylo::Matrices::MatrixRole(3)    User Contributed Perl Documentation
Bio::Phylo::Matrices::MatrixRole - Extra behaviours for a character state matrix
use Bio::Phylo::Factory;
use Bio::Phylo::Matrices::Matrix;
my $fac = Bio::Phylo::Factory->new;
# instantiate taxa object
my $taxa = $fac->create_taxa;
for ( 'Homo sapiens', 'Pan paniscus', 'Pan troglodytes' ) {
    $taxa->insert( $fac->create_taxon( '-name' => $_ ) );
}
# instantiate matrix object, 'standard' data type. All categorical
# data types follow semantics like this, though with different
# symbols in lookup table and matrix
my $standard_matrix = $fac->create_matrix(
    '-type'   => 'STANDARD',
    '-taxa'   => $taxa,
    '-lookup' => {
        '-' => [],
        '0' => [ '0' ],
        '1' => [ '1' ],
        '?' => [ '0', '1' ],
    },
    '-labels' => [ 'Opposable big toes', 'Opposable thumbs', 'Not a pygmy' ],
    '-matrix' => [
        [ 'Homo sapiens'    => '0', '1', '1' ],
        [ 'Pan paniscus'    => '1', '1', '0' ],
        [ 'Pan troglodytes' => '1', '1', '1' ],
    ],
);
# note: complicated constructor for mixed data!
my $mixed_matrix = Bio::Phylo::Matrices::Matrix->new(
    # if you want to create 'mixed', value for '-type' is array ref...
    '-type' => [
        # ...with first field 'mixed'...
        'mixed',
        # ...second field is an array ref...
        [
            # ...with _ordered_ key/value pairs...
            'dna'      => 10, # value is length of type range
            'standard' => 10, # value is length of type range
            # ... or, more complicated, value is a hash ref...
            'rna'      => {
                '-length' => 10, # value is length of type range
                # ...value for '-args' is an array ref with args
                # as can be passed to 'unmixed' datatype constructors,
                # for example, here we modify the lookup table for
                # rna to allow both 'U' (default) and 'T'
                '-args' => [
                    '-lookup' => {
                        'A' => [ 'A' ],
                        'C' => [ 'C' ],
                        'G' => [ 'G' ],
                        'U' => [ 'U' ],
                        'T' => [ 'T' ],
                        'M' => [ 'A', 'C' ],
                        'R' => [ 'A', 'G' ],
                        'S' => [ 'C', 'G' ],
                        'W' => [ 'A', 'U', 'T' ],
                        'Y' => [ 'C', 'U', 'T' ],
                        'K' => [ 'G', 'U', 'T' ],
                        'V' => [ 'A', 'C', 'G' ],
                        'H' => [ 'A', 'C', 'U', 'T' ],
                        'D' => [ 'A', 'G', 'U', 'T' ],
                        'B' => [ 'C', 'G', 'U', 'T' ],
                        'X' => [ 'G', 'A', 'U', 'T', 'C' ],
                        'N' => [ 'G', 'A', 'U', 'T', 'C' ],
                    },
                ],
            },
        ],
    ],
);
# prints 'mixed(Dna:1-10, Standard:11-20, Rna:21-30)'
print $mixed_matrix->get_type;
This module defines a container object that holds Bio::Phylo::Matrices::Datum
objects. The matrix object inherits from Bio::Phylo::Listable, so the methods
defined there apply here.
- new()
- Matrix constructor.
Type : Constructor
Title : new
Usage : my $matrix = Bio::Phylo::Matrices::Matrix->new;
Function: Instantiates a Bio::Phylo::Matrices::Matrix
object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args : -type => optional, but if used must be FIRST argument,
defines datatype, one of dna|rna|protein|
continuous|standard|restriction|[ mixed => [] ]
-taxa => optional, link to taxa object
-lookup => character state lookup hash ref
-labels => array ref of character labels
-matrix => two-dimensional array, first element of every
row is label, subsequent are characters
- new_from_bioperl()
- Matrix constructor from Bio::Align::AlignI argument.
Type : Constructor
Title : new_from_bioperl
Usage : my $matrix =
Bio::Phylo::Matrices::Matrix->new_from_bioperl(
$aln
);
Function: Instantiates a
Bio::Phylo::Matrices::Matrix object.
Returns : A Bio::Phylo::Matrices::Matrix object.
Args : An alignment that implements Bio::Align::AlignI
- set_special_symbols()
- Sets three special symbols in one call
Type : Mutator
Title : set_special_symbols
Usage : $matrix->set_special_symbols(
-missing => '?',
-gap => '-',
-matchchar => '.'
);
Function: Assigns the missing, gap and matchchar symbols.
Returns : $self
Args : Three args (with distinct $x, $y and $z):
-missing => $x,
-gap => $y,
-matchchar => $z
Notes : This method is here to ensure
you don't accidentally use the
same symbol for missing AND gap
- set_charlabels()
- Sets argument character labels.
Type : Mutator
Title : set_charlabels
Usage : $matrix->set_charlabels( [ 'char1', 'char2', 'char3' ] );
Function: Assigns character labels.
Returns : $self
Args : ARRAY reference, or nothing (to reset).
- set_raw()
- Set contents using two-dimensional array argument.
Type : Mutator
Title : set_raw
Usage : $matrix->set_raw( [ [ 'taxon1' => 'acgt' ], [ 'taxon2' => 'acgt' ] ] );
Function: Syntax sugar to define $matrix data contents.
Returns : $self
Args : A two-dimensional array; first dimension contains matrix rows,
second dimension contains taxon name / character string pair.
- get_special_symbols()
- Retrieves hash ref for missing, gap and matchchar symbols
Type : Accessor
Title : get_special_symbols
Usage : my %syms = %{ $matrix->get_special_symbols };
Function: Retrieves special symbols
Returns : HASH ref, e.g. { -missing => '?', -gap => '-', -matchchar => '.' }
Args : None.
- get_charlabels()
- Retrieves character labels.
Type : Accessor
Title : get_charlabels
Usage : my @charlabels = @{ $matrix->get_charlabels };
Function: Retrieves character labels.
Returns : ARRAY reference
Args : None.
- get_nchar()
- Calculates number of characters.
Type : Accessor
Title : get_nchar
Usage : my $nchar = $matrix->get_nchar;
Function: Calculates number of characters (columns) in matrix (if the matrix
is non-rectangular, returns the length of the longest row).
Returns : INT
Args : none
- get_ntax()
- Calculates number of taxa (rows) in matrix.
Type : Accessor
Title : get_ntax
Usage : my $ntax = $matrix->get_ntax;
Function: Calculates number of taxa (rows) in matrix
Returns : INT
Args : none
- get_raw()
- Retrieves a 'raw' (two-dimensional array) representation of the matrix's
contents.
Type : Accessor
Title : get_raw
Usage : my $rawmatrix = $matrix->get_raw;
Function: Retrieves a 'raw' (two-dimensional array) representation
of the matrix's contents.
Returns : A two-dimensional array; first dimension contains matrix rows,
second dimension contains taxon name and characters.
Args : NONE
- get_ungapped_columns()
- Retrieves the zero-based indices of columns that contain no gaps.
Type : Accessor
Title : get_ungapped_columns
Usage : my @ungapped = @{ $matrix->get_ungapped_columns };
Function: Retrieves the zero-based column indices of columns without gaps
Returns : An array reference with zero or more indices (i.e. integers)
Args : NONE
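To make the idea concrete, here is a minimal plain-Perl sketch that does not use Bio::Phylo. It assumes the matrix is given in the get_raw layout (taxon name first, then one state per cell) and that '-' is the gap symbol. The helper name ungapped_columns is hypothetical, and this is not the library's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: returns zero-based indices of columns that contain
# no gap symbol in any row. Rows use the get_raw layout: [ name, @states ].
sub ungapped_columns {
    my ( $raw, $gap ) = @_;
    $gap = '-' unless defined $gap;    # assumed default gap symbol
    my $nchar = 0;
    for my $row (@$raw) {
        my $n = $#$row;                # number of states (cell 0 is the name)
        $nchar = $n if $n > $nchar;
    }
    my @ungapped;
    COLUMN: for my $i ( 0 .. $nchar - 1 ) {
        for my $row (@$raw) {
            my $state = $row->[ $i + 1 ];
            next COLUMN if defined $state && $state eq $gap;
        }
        push @ungapped, $i;
    }
    return \@ungapped;
}

my $raw = [
    [ 'taxon1', 'A', '-', 'C', 'T' ],
    [ 'taxon2', 'A', 'G', '-', 'T' ],
];
print join( ',', @{ ungapped_columns($raw) } ), "\n";    # prints 0,3
```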
- get_invariant_columns()
- Retrieves the zero-based indices of invariant columns.
Type : Accessor
Title : get_invariant_columns
Usage : my @invariant = @{ $matrix->get_invariant_columns };
Function: Retrieves the zero-based column indices of invariant columns
Returns : An array reference with zero or more indices (i.e. integers)
Args : Optional:
-gap => if true, counts the gap symbol (probably '-') as a variant
-missing => if true, counts the missing symbol (probably '?') as a variant
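The -gap and -missing switches can be illustrated with the following plain-Perl sketch. It assumes '-' and '?' as the gap and missing symbols and the get_raw row layout. It also assumes a column counts as invariant when it holds at most one distinct counted state. The helper invariant_columns is hypothetical and is not the Bio::Phylo code.

```perl
use strict;
use warnings;

# Hypothetical helper: a column is invariant if it holds at most one
# distinct counted state. Gap and missing cells are skipped unless
# '-gap' / '-missing' is true, in which case they count as states
# and can make the column variant.
sub invariant_columns {
    my ( $raw, %args ) = @_;
    my ( $gap, $missing ) = ( '-', '?' );    # assumed symbols
    my $nchar = 0;
    for my $row (@$raw) { $nchar = $#$row if $#$row > $nchar }
    my @invariant;
    for my $i ( 0 .. $nchar - 1 ) {
        my %seen;
        for my $row (@$raw) {
            my $s = $row->[ $i + 1 ];
            next unless defined $s;
            next if $s eq $gap     && !$args{'-gap'};
            next if $s eq $missing && !$args{'-missing'};
            $seen{$s}++;
        }
        push @invariant, $i if scalar( keys %seen ) <= 1;
    }
    return \@invariant;
}

my $raw = [
    [ 'taxon1', 'A', 'C', 'G' ],
    [ 'taxon2', 'A', '-', 'T' ],
    [ 'taxon3', 'A', 'C', 'G' ],
];
print join( ',', @{ invariant_columns($raw) } ), "\n";                  # 0,1
print join( ',', @{ invariant_columns( $raw, '-gap' => 1 ) } ), "\n";   # 0
```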
- calc_indel_sizes()
- Calculates size distribution of insertions or deletions
Type : Calculation
Title : calc_indel_sizes
Usage : my %sizes = %{ $matrix->calc_indel_sizes };
Function: Calculates the size distribution of indels.
Returns : HASH reference
Args : Optional:
-trim => if true, disregards indels at start and end
-insertions => if true, counts insertions, if false, counts deletions
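As a rough illustration of an indel size distribution, the sketch below tallies the lengths of runs of consecutive gap symbols in each row. It works on single-character states in the get_raw layout and assumes '-' as the gap symbol. It models -trim by skipping runs that touch either end of a row. The documented -insertions switch is not modelled, and the helper indel_sizes is hypothetical rather than the Bio::Phylo algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies lengths of gap runs (size => count).
# With '-trim' true, runs touching the start or end of a row are skipped.
sub indel_sizes {
    my ( $raw, %args ) = @_;
    my $gap = '-';    # assumed gap symbol
    my %sizes;
    for my $row (@$raw) {
        my $seq = join '', map { defined $_ ? $_ : '' } @{$row}[ 1 .. $#$row ];
        while ( $seq =~ /(\Q$gap\E+)/g ) {
            my $len   = length $1;
            my $end   = pos $seq;      # offset just past the run
            my $start = $end - $len;
            next if $args{'-trim'} && ( $start == 0 || $end == length $seq );
            $sizes{$len}++;
        }
    }
    return \%sizes;
}

my $raw = [
    [ 'taxon1', '-', 'A', '-', '-', 'C' ],
    [ 'taxon2', 'A', 'A', 'A', '-', '-' ],
];
my $sizes = indel_sizes($raw);
print join( ',', map { $_ . '=' . $sizes->{$_} } sort keys %$sizes ), "\n";  # 1=1,2=2
```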
- calc_prop_invar()
- Calculates proportion of invariant sites.
Type : Calculation
Title : calc_prop_invar
Usage : my $pinvar = $matrix->calc_prop_invar;
Function: Calculates proportion of invariant sites.
Returns : Scalar: a number
Args : Optional:
# if true, counts missing (usually the '?' symbol) as a state
# in the final tallies. Otherwise, missing states are ignored
-missing => 1
# if true, counts gaps (usually the '-' symbol) as a state
# in the final tallies. Otherwise, gap states are ignored
-gap => 1
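The proportion of invariant sites is the number of invariant columns divided by the number of columns. This plain-Perl sketch assumes '?' and '-' as the missing and gap symbols and skips them unless the matching flag is set. It uses the get_raw row layout. The helper prop_invar is hypothetical and not the library's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of columns whose counted states are all
# identical. '?' and '-' cells are skipped unless -missing / -gap is true.
sub prop_invar {
    my ( $raw, %args ) = @_;
    my %skip;
    $skip{'?'} = 1 unless $args{'-missing'};
    $skip{'-'} = 1 unless $args{'-gap'};
    my $nchar = 0;
    for my $row (@$raw) { $nchar = $#$row if $#$row > $nchar }
    return 0 unless $nchar;
    my $invariant = 0;
    for my $i ( 1 .. $nchar ) {    # cell 0 holds the taxon name
        my %states = map { $_ => 1 }
          grep { defined $_ && !$skip{$_} } map { $_->[$i] } @$raw;
        $invariant++ if scalar( keys %states ) <= 1;
    }
    return $invariant / $nchar;
}

my $raw = [
    [ 'taxon1', 'A', 'C', 'G', 'T' ],
    [ 'taxon2', 'A', '?', 'G', 'A' ],
];
print prop_invar($raw), "\n";                     # 0.75
print prop_invar( $raw, '-missing' => 1 ), "\n";  # 0.5
```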
- calc_state_counts()
- Calculates occurrences of states.
Type : Calculation
Title : calc_state_counts
Usage : my %counts = %{ $matrix->calc_state_counts };
Function: Calculates occurrences of states.
Returns : Hashref: keys are states, values are counts
Args : Optional - one or more states to focus on
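A minimal sketch of state counting in plain Perl, assuming the get_raw row layout. If states are passed, only those are tallied, which is one plausible reading of "one or more states to focus on". The helper state_counts is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: tallies how often each state occurs across all
# cells. If a list of states is passed, only those states are counted.
sub state_counts {
    my ( $raw, @focus ) = @_;
    my %wanted = map { $_ => 1 } @focus;
    my %counts;
    for my $row (@$raw) {
        for my $state ( @{$row}[ 1 .. $#$row ] ) {    # skip the name cell
            next if @focus && !$wanted{$state};
            $counts{$state}++;
        }
    }
    return \%counts;
}

my $raw = [ [ 'taxon1', 'A', 'C', 'A' ], [ 'taxon2', 'A', 'G', '-' ] ];
my $counts = state_counts($raw);
print join( ' ', map { $_ . '=' . $counts->{$_} } sort keys %$counts ), "\n";
# prints: -=1 A=3 C=1 G=1
```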
- calc_state_frequencies()
- Calculates the frequencies of the states observed in the matrix.
Type : Calculation
Title : calc_state_frequencies
Usage : my %freq = %{ $object->calc_state_frequencies() };
Function: Calculates state frequencies
Returns : A hash reference; keys are state symbols, values are frequencies
Args : Optional:
# if true, counts missing (usually the '?' symbol) as a state
# in the final tallies. Otherwise, missing states are ignored
-missing => 1
# if true, counts gaps (usually the '-' symbol) as a state
# in the final tallies. Otherwise, gap states are ignored
-gap => 1
Comments: Throws exception if matrix holds continuous values
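State frequencies are each state's count divided by the total number of counted cells. This plain-Perl sketch assumes '?' and '-' as the missing and gap symbols, which are skipped unless the matching flag is set. It does not model the exception for continuous data. The helper state_frequencies is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: relative frequencies of observed states. Missing
# ('?') and gap ('-') cells are ignored unless -missing / -gap is true.
sub state_frequencies {
    my ( $raw, %args ) = @_;
    my ( %counts, $total );
    for my $row (@$raw) {
        for my $s ( @{$row}[ 1 .. $#$row ] ) {
            next if $s eq '?' && !$args{'-missing'};
            next if $s eq '-' && !$args{'-gap'};
            $counts{$s}++;
            $total++;
        }
    }
    my %freq;
    %freq = map { $_ => $counts{$_} / $total } keys %counts if $total;
    return \%freq;
}

my $raw = [ [ 'taxon1', 'A', 'C', '-' ], [ 'taxon2', 'A', 'A', '?' ] ];
my $freq = state_frequencies($raw);
print join( ' ', map { $_ . '=' . $freq->{$_} } sort keys %$freq ), "\n";
# prints: A=0.75 C=0.25
```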
- calc_distinct_site_patterns()
- Identifies the distinct distributions of states for all characters and
counts their occurrences. Returns an array-of-arrays, where the first cell
of each inner array holds the occurrence count, the second cell holds the
pattern, i.e. an array of states. For example, for a matrix like this:
taxon1 GTGTGTGTGTGTGTGTGTGTGTG
taxon2 AGAGAGAGAGAGAGAGAGAGAGA
taxon3 TCTCTCTCTCTCTCTCTCTCTCT
taxon4 TCTCTCTCTCTCTCTCTCTCTCT
taxon5 AAAAAAAAAAAAAAAAAAAAAAA
taxon6 CGCGCGCGCGCGCGCGCGCGCGC
taxon7 AAAAAAAAAAAAAAAAAAAAAAA
The following data structure will be returned:
[
[ 12, [ 'G', 'A', 'T', 'T', 'A', 'C', 'A' ] ],
[ 11, [ 'T', 'G', 'C', 'C', 'A', 'G', 'A' ] ]
]
The patterns are sorted from most to least frequently
occurring; the states within each pattern follow the order of the rows in
the matrix. (In other words, the original matrix can more or less be
reconstructed by transposing the patterns and repeating each one as many
times as it occurs, although the original order of the columns is lost.)
Type : Calculation
Title : calc_distinct_site_patterns
Usage : my $patterns = $object->calc_distinct_site_patterns;
Function: Calculates distinct site patterns.
Returns : A multidimensional array, see above.
Args : NONE
Comments:
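A plain-Perl sketch of the grouping described above, assuming a rectangular matrix in the get_raw layout. Each column is read top to bottom and used as a key, and the columns sharing a key are counted. The result is sorted by descending count, with ties broken by pattern. The example matrix is rebuilt with the string repetition operator so that it has exactly the 23 columns shown above. The helper distinct_site_patterns is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [ [ count, [ states ] ], ... ], most
# frequent pattern first.
sub distinct_site_patterns {
    my ($raw) = @_;
    my $nchar = 0;
    for my $row (@$raw) { $nchar = $#$row if $#$row > $nchar }
    my ( %count, %pattern );
    for my $i ( 1 .. $nchar ) {
        my @column = map { $_->[$i] } @$raw;
        my $key = join "\0", @column;    # NUL-joined key identifies the pattern
        $pattern{$key} ||= \@column;
        $count{$key}++;
    }
    return [
        map { [ $count{$_}, $pattern{$_} ] }
        sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count
    ];
}

my @rows = (
    [ taxon1 => 'GT' x 11 . 'G' ],
    [ taxon2 => 'AG' x 11 . 'A' ],
    [ taxon3 => 'TC' x 11 . 'T' ],
    [ taxon4 => 'TC' x 11 . 'T' ],
    [ taxon5 => 'A' x 23 ],
    [ taxon6 => 'CG' x 11 . 'C' ],
    [ taxon7 => 'A' x 23 ],
);
my $raw = [ map { [ $_->[0], split //, $_->[1] ] } @rows ];
for my $p ( @{ distinct_site_patterns($raw) } ) {
    print "$p->[0]: @{ $p->[1] }\n";
}
# 12: G A T T A C A
# 11: T G C C A G A
```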
- calc_gc_content()
- Calculates the G+C content as a fraction of the total
Type : Calculation
Title : calc_gc_content
Usage : my $fraction = $obj->calc_gc_content;
Function: Calculates G+C content
Returns : A number between 0 and 1 (inclusive)
Args : Optional:
# if true, counts missing (usually the '?' symbol) as a state
# in the final tallies. Otherwise, missing states are ignored
-missing => 1
# if true, counts gaps (usually the '-' symbol) as a state
# in the final tallies. Otherwise, gap states are ignored
-gap => 1
Comments: Throws a 'BadArgs' exception if the matrix holds anything other than DNA
or RNA. The calculation also takes the IUPAC symbol S (which is C|G)
into account, but no other ambiguity symbols (such as V, for A|C|G).
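A plain-Perl sketch of the calculation, assuming that G, C and S count towards G+C and that the denominator is every counted cell. It also assumes '?' and '-' are skipped unless the matching flag is set. The data-type check that throws BadArgs is not modelled. The helper gc_content is hypothetical and not the Bio::Phylo implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: (G + C + S) / counted cells. '?' and '-' are
# skipped unless '-missing' / '-gap' is true.
sub gc_content {
    my ( $raw, %args ) = @_;
    my ( $gc, $total ) = ( 0, 0 );
    for my $row (@$raw) {
        for my $s ( map { uc($_) } @{$row}[ 1 .. $#$row ] ) {
            next if $s eq '?' && !$args{'-missing'};
            next if $s eq '-' && !$args{'-gap'};
            $total++;
            $gc++ if $s eq 'G' || $s eq 'C' || $s eq 'S';
        }
    }
    return $total ? $gc / $total : 0;
}

my $raw = [ [ 'taxon1', 'A', 'C', 'G', 'T' ], [ 'taxon2', 'S', '-', 'A', '?' ] ];
print gc_content($raw), "\n";    # 0.5
```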
- calc_median_sequence()
- Calculates the median character sequence of the matrix
Type : Calculation
Title : calc_median_sequence
Usage : my $seq = $obj->calc_median_sequence;
Function: Calculates median sequence
Returns : Array in list context, string in scalar context
Args : Optional:
-ambig => if true, uses ambiguity codes to summarize equally frequent
states for a given character. Otherwise picks a random one.
-missing => if true, keeps the missing symbol (probably '?') if this
is the most frequent for a given character. Otherwise strips it.
-gaps => if true, keeps the gap symbol (probably '-') if this is the most
frequent for a given character. Otherwise strips it.
Comments: The intent of this method is to provide a crude approximation of the most
commonly occurring sequences in an alignment, for example as a starting
sequence for a sequence simulator. This gives you something to work with if
ancestral sequence calculation is too computationally intensive and/or not
really necessary.
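A crude median sequence can be sketched as a per-column majority vote. This plain-Perl version assumes the get_raw layout and ignores '-' and '?' when voting, stripping columns that hold nothing else. Ties are broken alphabetically rather than randomly or with ambiguity codes, so -ambig, -missing and -gaps are not modelled. The helper median_sequence is hypothetical. Like the documented method, it returns a list in list context and a string in scalar context.

```perl
use strict;
use warnings;

# Hypothetical helper: most frequent non-gap, non-missing state per column.
sub median_sequence {
    my ($raw) = @_;
    my $nchar = 0;
    for my $row (@$raw) { $nchar = $#$row if $#$row > $nchar }
    my @median;
    for my $i ( 1 .. $nchar ) {
        my %count;
        for my $row (@$raw) {
            my $s = $row->[$i];
            next if !defined $s || $s eq '-' || $s eq '?';
            $count{$s}++;
        }
        next unless %count;    # only gaps/missing here: strip the column
        my ($best) = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
        push @median, $best;
    }
    return wantarray ? @median : join( '', @median );
}

my $raw = [
    [ 'taxon1', split //, 'ACGT-' ],
    [ 'taxon2', split //, 'ACTT-' ],
    [ 'taxon3', split //, 'AGGA?' ],
];
print scalar( median_sequence($raw) ), "\n";    # ACGT
```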
- keep_chars()
- Creates a cloned matrix that only keeps the characters at the supplied
(zero-based) indices.
Type : Utility method
Title : keep_chars
Usage : my $clone = $object->keep_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : Required, an array ref of integers
Comments: The columns are retained in the order in
which they were supplied.
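The column selection can be sketched in plain Perl as an array slice applied to each row. This assumes the get_raw layout and builds new rows, so the input is left untouched. Because the slice uses the indices as given, columns come out in the supplied order. The helper keep_chars here is a hypothetical stand-in, not the method itself.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the zero-based columns in @$indices,
# in the supplied order, returning a new raw matrix.
sub keep_chars {
    my ( $raw, $indices ) = @_;
    my @clone;
    for my $row (@$raw) {
        my ( $name, @states ) = @$row;
        push @clone, [ $name, @states[@$indices] ];
    }
    return \@clone;
}

my $raw = [ [ 'taxon1', 'A', 'C', 'G', 'T' ], [ 'taxon2', 'T', 'G', 'C', 'A' ] ];
print "@$_\n" for @{ keep_chars( $raw, [ 3, 1 ] ) };
# taxon1 T C
# taxon2 A G
```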
- prune_chars()
- Creates a cloned matrix that omits the characters at the supplied
(zero-based) indices.
Type : Utility method
Title : prune_chars
Usage : my $clone = $object->prune_chars([6,3,4,1]);
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : Required, an array ref of integers
Comments: The remaining columns keep their original
order.
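The complementary operation can be sketched the same way: mark the supplied indices for removal and slice out everything else. The remaining columns therefore keep their original order. This assumes the get_raw layout, and prune_chars here is a hypothetical stand-in.

```perl
use strict;
use warnings;

# Hypothetical helper: drops the zero-based columns in @$indices and keeps
# the rest in their original order, returning a new raw matrix.
sub prune_chars {
    my ( $raw, $indices ) = @_;
    my %drop = map { $_ => 1 } @$indices;
    my @clone;
    for my $row (@$raw) {
        my ( $name, @states ) = @$row;
        push @clone, [ $name, @states[ grep { !$drop{$_} } 0 .. $#states ] ];
    }
    return \@clone;
}

my $raw = [ [ 'taxon1', 'A', 'C', 'G', 'T' ], [ 'taxon2', 'T', 'G', 'C', 'A' ] ];
print "@$_\n" for @{ prune_chars( $raw, [ 2, 0 ] ) };
# taxon1 C T
# taxon2 G A
```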
- prune_invariant()
- Creates a cloned matrix that omits the characters for which all taxa have
the same state (or missing data).
Type : Utility method
Title : prune_invariant
Usage : my $clone = $object->prune_invariant;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The remaining columns keep their original
order.
- prune_uninformative()
- Creates a cloned matrix that omits all uninformative characters.
Characters are considered uninformative when all non-missing values are
either invariant or autapomorphies.
Type : Utility method
Title : prune_uninformative
Usage : my $clone = $object->prune_uninformative;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The remaining columns keep their original
order.
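The definition above amounts to the usual parsimony-informativeness test: a column is informative when at least two different states each occur in at least two taxa. This plain-Perl sketch applies that test. It assumes the get_raw layout and treats both '?' and '-' as missing, which is an assumption about how gaps are handled. The helper prune_uninformative is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps columns in which at least two states each
# occur in at least two rows; '?' and '-' are not counted.
sub prune_uninformative {
    my ($raw) = @_;
    my $nchar = 0;
    for my $row (@$raw) { $nchar = $#$row if $#$row > $nchar }
    my @keep;    # zero-based indices of informative columns
    for my $i ( 0 .. $nchar - 1 ) {
        my %count;
        for my $row (@$raw) {
            my $s = $row->[ $i + 1 ];
            $count{$s}++ if defined $s && $s ne '?' && $s ne '-';
        }
        my $shared = grep { $_ >= 2 } values %count;
        push @keep, $i if $shared >= 2;
    }
    # build a new matrix; the original rows are left untouched
    return [ map { my ( $name, @s ) = @$_; [ $name, @s[@keep] ] } @$raw ];
}

my $raw = [
    [ 't1', split //, 'AAAC' ],
    [ 't2', split //, 'ACAC' ],
    [ 't3', split //, 'AGCG' ],
    [ 't4', split //, 'ATCG' ],
];
print "@$_\n" for @{ prune_uninformative($raw) };
# t1 A C
# t2 A C
# t3 C G
# t4 C G
```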
- prune_missing_and_gaps()
- Creates a cloned matrix that omits all characters for which the invocant
only has missing and/or gap states.
Type : Utility method
Title : prune_missing_and_gaps
Usage : my $clone = $object->prune_missing_and_gaps;
Function: Creates spliced clone.
Returns : A spliced clone of the invocant.
Args : None
Comments: The remaining columns keep their original
order.
- bootstrap()
- Creates bootstrapped clone.
Type : Utility method
Title : bootstrap
Usage : my $bootstrap = $object->bootstrap;
Function: Creates bootstrapped clone.
Returns : A bootstrapped clone of the invocant.
Args : Optional, a subroutine reference that returns a random
integer between 0 (inclusive) and the argument provided
to it (exclusive). The default implementation is
sub { int( rand( shift ) ) }; a user might override this
by providing an implementation with a better random number
generator.
Comments: The bootstrapping algorithm uses Perl's random number
generator to create a new series of indices (with
replacement) of the same length as the original matrix.
These indices are first sorted, then applied to the
cloned sequences. Annotations (if present) stay connected
to the resampled cells.
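Here is a plain-Perl sketch of bootstrap resampling as described: draw nchar column indices with replacement using an injectable generator, sort them, and apply them to every row. It assumes a rectangular matrix in the get_raw layout and uses the same default generator, sub { int( rand( shift ) ) }. Annotations are not modelled, and bootstrap_columns is a hypothetical helper. Passing a deterministic generator, as in the usage below, makes the result reproducible for testing.

```perl
use strict;
use warnings;

# Hypothetical helper: resamples columns with replacement. $gen returns a
# random integer in [0, $n); indices are sorted before being applied.
sub bootstrap_columns {
    my ( $raw, $gen ) = @_;
    $gen ||= sub { int( rand(shift) ) };    # default generator from the docs
    my $nchar = $#{ $raw->[0] };             # assumes a rectangular matrix
    my @indices = sort { $a <=> $b } map { $gen->($nchar) } 1 .. $nchar;
    return [ map { my ( $name, @s ) = @$_; [ $name, @s[@indices] ] } @$raw ];
}

my $raw = [ [ 'taxon1', 'A', 'C', 'G' ], [ 'taxon2', 'T', 'G', 'C' ] ];
# a deterministic generator that always picks column 0, handy for testing
my $rep = bootstrap_columns( $raw, sub { 0 } );
print "@{ $rep->[0] }\n";    # taxon1 A A A
```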
- jackknife()
- Creates jackknifed clone.
Type : Utility method
Title : jackknife
Usage : my $jackknife = $object->jackknife(0.5);
Function: Creates jackknifed clone.
Returns : A jackknifed clone of the invocant.
Args : * Required, a number between 0 and 1, representing the
fraction of characters to jackknife.
* Optional, a subroutine reference that returns a random
integer between 0 (inclusive) and the argument provided
to it (exclusive). The default implementation is
sub { int( rand( shift ) ) }; a user might override this
by providing an implementation with a better random number
generator.
Comments: The jackknife algorithm uses perl's random number
generator to create a new series of indices of cells to keep.
These indices are first sorted, then applied to the
cloned sequences. Annotations (if present) stay connected
to the resampled cells.
- replicate()
- Creates simulated replicate.
Type : Utility method
Title : replicate
Usage : my $replicate = $matrix->replicate($tree);
Function: Creates simulated replicate.
Returns : A simulated replicate of the invocant.
Args : Tree to simulate the characters on.
Optional:
-seed => a random integer seed
-model => an object of class Bio::Phylo::Models::Substitution::Dna or
Bio::Phylo::Models::Substitution::Binary
-random_rootseq => start DNA sequence simulation from random ancestral sequence
instead of the median sequence in the alignment.
Comments: Requires Statistics::R, with the R packages 'ape', 'phylosim', 'phangorn' and 'phytools' installed.
If model is not given as argument, it will be estimated.
- insert()
- Insert argument in invocant.
Type : Listable method
Title : insert
Usage : $matrix->insert($datum);
Function: Inserts $datum in $matrix.
Returns : Modified object
Args : A datum object
Comments: This method re-implements the method by the same
name in Bio::Phylo::Listable
- compress_lookup()
- Removes unused states from lookup table
Type : Method
Title : compress_lookup
Usage : $obj->compress_lookup
Function: Removes unused states from lookup table
Returns : $self
Args : None
- check_taxa()
- Validates taxa associations.
Type : Method
Title : check_taxa
Usage : $obj->check_taxa
Function: Validates relation between matrix and taxa block
Returns : Modified object
Args : None
Comments: This method implements the interface method by the same
name in Bio::Phylo::Taxa::TaxaLinker
- make_taxa()
- Creates a taxa block from the object's contents if none exists yet.
Type : Method
Title : make_taxa
Usage : my $taxa = $obj->make_taxa
Function: Creates a taxa block from the object's contents if none exists yet.
Returns : $taxa
Args : NONE
- to_xml()
- Serializes matrix to nexml format.
Type : Format convertor
Title : to_xml
Usage : my $data_block = $matrix->to_xml;
Function: Converts matrix object into a nexml element structure.
Returns : Nexml block (SCALAR).
Args : Optional:
-compact => 1 (for compact representation of matrix)
- to_nexus()
- Serializes matrix to nexus format.
Type : Format convertor
Title : to_nexus
Usage : my $data_block = $matrix->to_nexus;
Function: Converts matrix object into a nexus data block.
Returns : Nexus data block (SCALAR).
Args : The following options are available:
# if set, writes TITLE & LINK tokens
'-links' => 1
# if set, writes block as a "data" block (deprecated, but used by mrbayes),
# otherwise writes "characters" block (default)
-data_block => 1
# if set, writes "RESPECTCASE" token
-respectcase => 1
# if set, writes "GAPMODE=(NEWSTATE or MISSING)" token
-gapmode => 1
# if set, writes "MSTAXA=(POLYMORPH or UNCERTAIN)" token
-polymorphism => 1
# if set, writes character labels
-charlabels => 1
# if set, writes state labels
-statelabels => 1
# if set, writes mesquite-style charstatelabels
-charstatelabels => 1
# by default, names for sequences are derived from $datum->get_name, if
# 'internal' is specified, uses $datum->get_internal_name, if 'taxon'
# uses $datum->get_taxon->get_name, if 'taxon_internal' uses
# $datum->get_taxon->get_internal_name, if $key, uses $datum->get_generic($key)
-seqnames => one of (internal|taxon|taxon_internal|$key)
- to_dom()
- Analog to to_xml.
Type : Serializer
Title : to_dom
Usage : $matrix->to_dom
Function: Generates a DOM subtree from the invocant
and its contained objects
Returns : an Element object
Args : Optional:
-compact => 1 : renders characters as sequences,
not individual cells
There is a mailing list at
<https://groups.google.com/forum/#!forum/bio-phylo> for any user or
developer questions and discussions.
- Bio::Phylo::Taxa::TaxaLinker
- This object inherits from Bio::Phylo::Taxa::TaxaLinker, so the methods
defined therein are also applicable to Bio::Phylo::Matrices::Matrix
objects.
- Bio::Phylo::Matrices::TypeSafeData
- This object inherits from Bio::Phylo::Matrices::TypeSafeData, so the
methods defined therein are also applicable to
Bio::Phylo::Matrices::Matrix objects.
- Bio::Phylo::Manual
- Also see the manual: Bio::Phylo::Manual and
<http://rutgervos.blogspot.com>.
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann,
Mark A Jensen and Chase Miller, 2011. Bio::Phylo -
phyloinformatic analysis using Perl. BMC Bioinformatics 12:63.
<http://dx.doi.org/10.1186/1471-2105-12-63>