AI::Categorizer::Hypothesis - Embodies a set of category assignments
use AI::Categorizer::Hypothesis;
# Hypotheses are usually created by the Learner's categorize() method.
# (assume here that $learner and $document have been created elsewhere)
my $h = $learner->categorize($document);
print "Assigned categories: ", join(', ', $h->categories), "\n";
print "Best category: ", $h->best_category, "\n";
print "Assigned scores: ", join(', ', $h->scores( $h->categories )), "\n";
print "Chosen from: ", join(', ', $h->all_categories), "\n";
print +($h->in_category('geometry') ? '' : 'not '), "assigned to geometry\n";
A Hypothesis embodies a set of category assignments that a categorizer makes
about a single document. Because one may be interested in knowing different
kinds of things about the assignments (for instance, what categories were
assigned, which category had the highest score, whether a particular category
was assigned), this simple class provides methods for answering such questions.
- new(%parameters)
- Returns a new Hypothesis object. Users of "AI::Categorizer" generally don't
create a Hypothesis object directly, because one is returned by the Learner's
"categorize()" method. However, if you wish to create a Hypothesis directly
(for example, to pass it some fake data for testing purposes), you may do so
using the "new()" method.
The following parameters are accepted when creating a new
Hypothesis:
- all_categories
- A required parameter which gives the set of all categories that could
possibly be assigned to the document. The categories should be specified as
a reference to an array of category names (as strings).
- scores
- A hash reference indicating the assignment score for each category. Any
category whose score is greater than or equal to the "threshold" will be
considered to be assigned.
- threshold
- A number controlling which categories should be assigned: any category
whose score is greater than or equal to "threshold" will be assigned, and
any category whose score is lower than "threshold" will not be assigned.
- document_name
- An optional string parameter indicating the name of the document about
which this hypothesis was made.
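As an illustration of the constructor parameters above, here is a minimal
sketch that builds a Hypothesis from fake data. The category names, scores,
threshold, and document name are invented for this example, and the module is
loaded at run time only so that the script still reports something sensible
when AI::Categorizer is not installed.

```perl
use strict;
use warnings;

# Fake data for testing; every value here is made up for illustration.
my %params = (
    all_categories => [qw(geometry algebra calculus)],
    scores         => { geometry => 0.9, algebra => 0.6, calculus => 0.1 },
    threshold      => 0.5,
    document_name  => 'test-doc-1',    # optional
);

# Load the class at run time so a missing installation is reported
# instead of aborting compilation.
if ( eval { require AI::Categorizer::Hypothesis; 1 } ) {
    my $h = AI::Categorizer::Hypothesis->new(%params);
    print "Document: ", $h->document_name, "\n";
    print "Assigned: ", join( ', ', $h->categories ), "\n";
}
else {
    print "AI::Categorizer::Hypothesis is not installed; nothing to show\n";
}
```

Keeping the parameters in a hash makes it easy to reuse the same fake data
across several tests, changing only the threshold or the scores.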
- categories()
- Returns an ordered list of the categories the document was placed in, with
best matches first. Categories are returned by their string names.
- best_category()
- Returns the name of the category with the highest score in this
hypothesis. Bear in mind that this category may not actually be assigned
if no category's score reaches the threshold.
- in_category($name)
- Returns true or false depending on whether the document was placed in the
given category.
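The relationship between the threshold, the assigned categories, and the best
category can be modelled in a few lines of plain Perl. This is only a sketch
of the rule as documented here, not the module's own code: the helpers
"assigned_categories()" and "top_category()" are hypothetical names, and the
scores are invented. It shows the case where the top-scoring category is not
actually assigned.

```perl
use strict;
use warnings;

# Hypothetical helper: names whose score is >= $threshold, best first.
# Ties are broken alphabetically so the order is deterministic.
sub assigned_categories {
    my ( $scores, $threshold ) = @_;
    return sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
           grep { $scores->{$_} >= $threshold }
           keys %$scores;
}

# Hypothetical helper: the highest-scoring name, whether assigned or not.
sub top_category {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;
}

# Invented scores; none of them reaches a threshold of 0.5.
my %scores = ( geometry => 0.4, algebra => 0.3, calculus => 0.1 );

my @assigned = assigned_categories( \%scores, 0.5 );
my $top      = top_category( \%scores );

print "Assigned: ", ( @assigned ? join( ', ', @assigned ) : '(none)' ), "\n";
print "Top-scoring: $top\n";
```

With these scores nothing is assigned, yet "geometry" is still the
top-scoring category, which is why a best category should be checked against
the assigned set before it is treated as an assignment.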
- scores(@names)
- Returns a list of result scores for the given categories. Since the
interface is still changing, and since different Learners implement
scoring in different ways, not very much can officially be said about the
scores, except that a good score is higher than a bad score. Individual
Learners have their own procedures for determining scores, so you
cannot compare one Learner's scores with another Learner's. For instance,
one Learner might always give scores between 0 and 1, while another
might always return scores less than 0. Often you cannot compare scores
from a single Learner on two different categorization tasks either.
- all_categories()
- Returns the list of category names specified with the "all_categories"
constructor parameter.
- document_name()
- Returns the value of the "document_name" constructor parameter, or
"undef" if none was specified.
Ken Williams <ken@mathforum.org>
This distribution is free software; you can redistribute it and/or modify it
under the same terms as Perl itself. These terms apply to every file in the
distribution - if you have questions, please contact the author.