Statistics::Contingency(3)
User Contributed Perl Documentation
Statistics::Contingency - Calculate precision, recall, F1, accuracy, etc.
use Statistics::Contingency;
my $s = Statistics::Contingency->new(categories => \@all_categories);
while (...something...) {
...
$s->add_result($assigned_categories, $correct_categories);
}
print "Micro F1: ", $s->micro_F1, "\n"; # Access a single statistic
print $s->stats_table; # Show several stats in table form
The "Statistics::Contingency" class helps you
calculate several useful statistical measures based on 2x2 "contingency
tables". I use these measures to help judge the results of automatic text
categorization experiments, but they are useful in other situations as well.
The general usage flow is to tally a whole bunch of results in the
"Statistics::Contingency" object, then
query that object to obtain the measures you are interested in. When all
results have been collected, you can get a report on accuracy, precision,
recall, F1, and so on, with both macro-averaging and micro-averaging over
categories.
All of the statistics offered by this module can be calculated for each category
and then averaged, or can be calculated once over all decisions pooled together.
The former is called macro-averaging (specifically, macro-averaging with
respect to category), and the latter is called micro-averaging. The two
procedures bias the results differently - micro-averaging tends to
over-emphasize the performance on the largest categories, while
macro-averaging over-emphasizes the performance on the smallest. It's often
best to look at both of them to get a good idea of how your data distributes
across categories.
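As a rough illustration of the difference, the sketch below computes precision both ways from two made-up per-category tallies. The counts, the category names, and the "precision" helper are hypothetical and are not part of the module; "tp" counts assignments that were correct and "fp" counts assignments that were not.

```perl
use strict;
use warnings;

# Hypothetical per-category tallies: tp = assigned and correct,
# fp = assigned but not correct.
my %tally = (
    big   => { tp => 90, fp => 10 },    # large category, precision 0.9
    small => { tp => 1,  fp => 4  },    # small category, precision 0.2
);

# Precision for one set of counts. Returning 0 when nothing was assigned
# is an assumption of this sketch, made only to avoid dividing by zero.
sub precision {
    my ($tp, $fp) = @_;
    return ($tp + $fp) ? $tp / ($tp + $fp) : 0;
}

# Macro-average: compute precision per category, then average the scores.
my $macro = 0;
$macro += precision($_->{tp}, $_->{fp}) for values %tally;
$macro /= scalar keys %tally;

# Micro-average: pool the counts over all categories, then compute once.
my ($tp, $fp) = (0, 0);
for my $t (values %tally) {
    $tp += $t->{tp};
    $fp += $t->{fp};
}
my $micro = precision($tp, $fp);

printf "macro precision: %.3f\n", $macro;    # 0.550
printf "micro precision: %.3f\n", $micro;    # 0.867
```

The micro-averaged figure sits close to the large category's score, while the macro-averaged figure is pulled down by the small one.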
All of the statistics are calculated based on a so-called "contingency
table", which looks like this:
              Correct=Y   Correct=N
            +-----------+-----------+
 Assigned=Y |     a     |     b     |
            +-----------+-----------+
 Assigned=N |     c     |     d     |
            +-----------+-----------+
a, b, c, and d are counts that reflect how the assigned categories
matched the correct categories. Depending on whether a macro-statistic or a
micro-statistic is being calculated, these numbers will be tallied
per-category or for the entire result set.
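To make the tallying concrete, here is a minimal sketch of how one result could be counted into per-category tables. The "tally_result" helper and the category names are hypothetical; this is not the module's internal code, only one way to arrive at the same four cells.

```perl
use strict;
use warnings;

# Hypothetical helper: update the per-category tables for one result.
# $tables maps each category name to a hash with the cells a, b, c, d.
sub tally_result {
    my ($tables, $categories, $assigned, $correct) = @_;
    my %assigned = map { $_ => 1 } @$assigned;
    my %correct  = map { $_ => 1 } @$correct;
    for my $cat (@$categories) {
        my $cell = $assigned{$cat}
            ? ($correct{$cat} ? 'a' : 'b')     # Assigned=Y row
            : ($correct{$cat} ? 'c' : 'd');    # Assigned=N row
        $tables->{$cat}{$cell}++;
    }
    return;
}

my @categories = qw(sports politics finance);
my %tables = map { $_ => { a => 0, b => 0, c => 0, d => 0 } } @categories;

# One document: sports and politics were assigned,
# but sports and finance were the correct categories.
tally_result(\%tables, \@categories,
             [qw(sports politics)], [qw(sports finance)]);

for my $cat (@categories) {
    my $t = $tables{$cat};
    print "$cat: a=$t->{a} b=$t->{b} c=$t->{c} d=$t->{d}\n";
}
```

Summing the cells of all per-category tables would give the single table used for micro-statistics.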
The following statistics are available:
- accuracy
This measures the portion of all decisions that were correct
decisions. It is defined as
"(a+d)/(a+b+c+d)". It falls in the
range from 0 to 1, with 1 being the best score.
Note that macro-accuracy and micro-accuracy will always give
the same number.
- error
This measures the portion of all decisions that were incorrect
decisions. It is defined as
"(b+c)/(a+b+c+d)". It falls in the
range from 0 to 1, with 0 being the best score.
Note that macro-error and micro-error will always give the
same number.
- precision
This measures the portion of the assigned categories that were
correct. It is defined as "a/(a+b)".
It falls in the range from 0 to 1, with 1 being the best score.
- recall
This measures the portion of the correct categories that were
assigned. It is defined as "a/(a+c)".
It falls in the range from 0 to 1, with 1 being the best score.
- F1
This measures an even combination of precision and recall. It
is defined as "2*p*r/(p+r)". In terms
of a, b, and c, it may be expressed as
"2a/(2a+b+c)". It falls in the range
from 0 to 1, with 1 being the best score.
The F1 measure is often the only simple measure that is worth
trying to maximize on its own - consider the fact that you can get a perfect
precision score by always assigning zero categories, or a perfect recall
score by always assigning every category. A truly smart system will assign
the correct categories and only the correct categories, maximizing precision
and recall at the same time, and therefore maximizing the F1 score.
Sometimes it's worth trying to maximize the accuracy score, but
accuracy (and its counterpart error) are considered fairly crude scores that
don't give much information about the performance of a categorizer.
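The definitions above can be sketched in a few lines of plain Perl. The "ratio" and "measures" helpers and all counts are made up for illustration, and returning 0 for an empty denominator is an assumption of this sketch; the module handles such edge cases in its own way.

```perl
use strict;
use warnings;

# Hypothetical helper: safe division, 0 when the denominator is 0.
sub ratio {
    my ($num, $den) = @_;
    return $den ? $num / $den : 0;
}

# Hypothetical helper: the five measures for one contingency table.
sub measures {
    my ($A, $B, $C, $D) = @_;    # the a, b, c, d cells of the table
    my $p = ratio($A, $A + $B);
    my $r = ratio($A, $A + $C);
    return {
        accuracy  => ratio($A + $D, $A + $B + $C + $D),
        error     => ratio($B + $C, $A + $B + $C + $D),
        precision => $p,
        recall    => $r,
        F1        => ratio(2 * $p * $r, $p + $r),
    };
}

# Made-up counts: 8 correct assignments, 2 spurious assignments,
# 4 missed categories, 86 correct non-assignments.
my $m = measures(8, 2, 4, 86);
printf "%-9s %.3f\n", $_, $m->{$_}
    for qw(accuracy error precision recall F1);

# F1 computed from p and r agrees with the direct form 2a/(2a+b+c).
my $f1_direct = 2 * 8 / (2 * 8 + 2 + 4);

# Assigning every category makes c = 0: recall is perfect,
# but precision and F1 suffer.
my $all = measures(12, 88, 0, 0);
printf "assign everything: precision %.3f, recall %.3f, F1 %.3f\n",
    @{$all}{qw(precision recall F1)};
```

The last call shows why recall alone is a poor target: it reaches 1 while F1 stays low.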
The general execution flow when using this class is to create a
"Statistics::Contingency" object, add a
bunch of results to it, and then report on the results.
- $e = Statistics::Contingency->new()
Returns a new
"Statistics::Contingency" object.
Expects a "categories" parameter
specifying the entire set of categories that may be assigned during this
experiment. Also accepts a "verbose"
parameter - if true, some diagnostic status information will be
displayed when certain actions are performed.
- $e->add_result($assigned_categories, $correct_categories, $name)
Adds a new result to the experiment. The lists of assigned and
correct categories can be given as an array of category names (strings),
as a hash whose keys are the category names and whose values are
anything logically true, or as a single string if there is only one
category.
If you've already got the lists in hash form, this will be the
fastest way to pass them. Otherwise, the current implementation will
convert them to hash form internally in order to make its calculations
efficient.
The $name parameter is an optional
name for this result. It will only be used in error messages or
debugging/progress output.
In the current implementation, we only store the contingency
tables per category, as well as a table for the entire result set. This
means that you can't recover information about any particular single
result from the
"Statistics::Contingency" object.
- $e->set_entries($a, $b, $c, $d)
If you don't wish to use the "add_result()"
interface, but still want to take advantage of the calculation methods and
the various edge cases they handle, you can directly set the four elements
of the contingency table with this method.
- $e->micro_accuracy
Returns the micro-averaged accuracy for the data set.
- $e->micro_error
Returns the micro-averaged error for the data set.
- $e->micro_precision
Returns the micro-averaged precision for the data set.
- $e->micro_recall
Returns the micro-averaged recall for the data set.
- $e->micro_F1
Returns the micro-averaged F1 for the data set.
- $e->macro_accuracy
Returns the macro-averaged accuracy for the data set.
- $e->macro_error
Returns the macro-averaged error for the data set.
- $e->macro_precision
Returns the macro-averaged precision for the data set.
- $e->macro_recall
Returns the macro-averaged recall for the data set.
- $e->macro_F1
Returns the macro-averaged F1 for the data set.
- $e->stats_table
Returns a string combining several statistics in one graphic
table. Because accuracy is 1 minus error, only error is reported, which
takes less space to print. An optional argument specifies the number of
significant digits to show in the data; the default is 3 significant
digits.
- $e->category_stats
Returns a hash reference whose keys are the names of each
category, and whose values contain the various statistical measures
(accuracy, error, precision, recall, or F1) about each category as a
hash reference. For example, to print a single statistic:
print $e->category_stats->{sports}{recall}, "\n";
Or to print certain statistics for all categories:
my $stats = $e->category_stats;
while (my ($cat, $value) = each %$stats) {
print "Category '$cat': \n";
print " Accuracy: $value->{accuracy}\n";
print " Precision: $value->{precision}\n";
print " F1: $value->{F1}\n";
}
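The description of add_result() says that array, hash, and single-string inputs are all converted to hash form internally. A minimal sketch of such a conversion follows. The "to_category_hash" helper is hypothetical and is not the module's actual code; dropping hash keys whose values are false is an assumption based on the "anything logically true" wording.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any of the accepted input forms into a
# hash reference whose keys are the category names.
sub to_category_hash {
    my ($cats) = @_;
    return {} unless defined $cats;

    if (ref $cats eq 'HASH') {
        # Keep only keys with true values (an assumption of this sketch).
        my %h = map { $_ => 1 } grep { $cats->{$_} } keys %$cats;
        return \%h;
    }
    if (ref $cats eq 'ARRAY') {
        my %h = map { $_ => 1 } @$cats;
        return \%h;
    }
    return { $cats => 1 };    # a single category given as a string
}

my $from_array  = to_category_hash([qw(sports finance)]);
my $from_hash   = to_category_hash({ sports => 1, finance => 'yes', politics => 0 });
my $from_string = to_category_hash('sports');

print join(',', sort keys %$_), "\n"
    for $from_array, $from_hash, $from_string;
```

With a hash in hand, checking whether a category was assigned or correct is a single lookup, which is why passing hashes directly avoids the conversion step.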
Ken Williams <kwilliams@cpan.org>
Copyright 2002-2008 Ken Williams. All rights reserved.
This distribution is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.