AI::Categorizer::Learner::Weka - Pass-through wrapper to Weka system
```perl
use AI::Categorizer::Learner::Weka;
use AI::Categorizer::Collection::Files;

# Here $k is an AI::Categorizer::KnowledgeSet object
my $nb = AI::Categorizer::Learner::Weka->new(...parameters...);
$nb->train(knowledge_set => $k);
$nb->save_state('filename');

# ... time passes ...

$nb = AI::Categorizer::Learner->restore_state('filename');
my $c = AI::Categorizer::Collection::Files->new( path => ... );
while (my $document = $c->next) {
    my $hypothesis = $nb->categorize($document);
    print "Best assigned category: ", $hypothesis->best_category, "\n";
}
```
This class doesn't implement any machine learners of its own; it merely passes
the data through to the Weka machine learning system
(http://www.cs.waikato.ac.nz/~ml/weka/). This gives you access to a
collection of machine learning algorithms not otherwise implemented in
"AI::Categorizer".
Currently this is a simple command-line wrapper that calls "java"
subprocesses. In the future it may be converted to an "Inline::Java" wrapper,
which should run faster because it would avoid starting a fresh Java process
for each call. However, if raw speed is your main concern, you're probably
looking in the wrong place: this Weka wrapper is intended more as a way to try
lots of different machine learning methods.
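To make "command-line wrapper" concrete, here is a minimal sketch of running an external command and capturing its standard output without going through a shell. The helper name run_and_capture is hypothetical and not part of AI::Categorizer; the module's own subprocess handling may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): run an external
# command such as "java ..." and return its standard output as a list of
# lines. When given more than one element, the list form of open()
# bypasses the shell, so arguments containing spaces or shell
# metacharacters are passed through verbatim. List-form pipe opens are
# not available on every platform (notably older Windows perls).
sub run_and_capture {
    my @cmd = @_;
    open(my $fh, '-|', @cmd)
        or die "Can't start '$cmd[0]': $!";
    my @lines = <$fh>;
    chomp @lines;
    # close() on a pipe waits for the child and sets $? to its exit status.
    close($fh)
        or die sprintf("'%s' exited with status %d\n", $cmd[0], $? >> 8);
    return @lines;
}
```

In a Weka setting the command would be something like ("java", "-classpath", $weka_jar, $classifier, @options), where $weka_jar and $classifier are illustrative variables; passing a list rather than one string avoids shell-quoting problems with paths that contain spaces.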
This class inherits from the
"AI::Categorizer::Learner" class, so all of
its methods are available unless explicitly mentioned here.
new(...) creates a new Weka Learner and returns it. In addition to the
parameters accepted by the "AI::Categorizer::Learner" class, the Weka subclass
accepts the following parameters:
- java_path: Specifies where the "java" executable can be found on this
system. The default is simply "java", which means the executable is looked up
in your "PATH".
- java_args: Specifies a list of any additional arguments to give to the java
process. It's often necessary to allocate more memory than the JVM's default,
using an argument like "-Xmx130m" (a 130-megabyte maximum heap).
- weka_path: Specifies the path to the "weka.jar" file containing the Weka
bytecode. If Weka is already on your Java "CLASSPATH", you needn't specify a
"weka_path".
- weka_classifier: Specifies the Weka class to use for a categorizer. The
default is "weka.classifiers.NaiveBayes". Consult your Weka documentation for
a list of other available classifiers.
- weka_args: Specifies a list of any additional arguments to pass to the Weka
classifier class when building the categorizer.
- tmpdir: A directory in which temporary files are written when training the
categorizer and categorizing new documents. The default is given by
"File::Spec->tmpdir".
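The sketch below shows one plausible way these parameters could combine into a single java command line. It is an illustration under assumptions, not the module's code: the helper weka_command and the file paths are hypothetical, java_args and weka_args are assumed to be array references, and the per-call options -t (training file) and -d (file to save the model to) are Weka's standard classifier command-line options rather than anything this module documents.

```perl
use strict;
use warnings;

# Hypothetical sketch (not AI::Categorizer's own code): build the argument
# list for a java subprocess from options named after the documented
# constructor parameters. java_args and weka_args are assumed to be array
# references. Requires Perl 5.10+ for the // operator.
sub weka_command {
    my ($params, @call_args) = @_;
    my @cmd = ($params->{java_path} // 'java');       # default: search PATH
    push @cmd, @{ $params->{java_args} || [] };        # e.g. '-Xmx130m'
    push @cmd, '-classpath', $params->{weka_path}      # else rely on CLASSPATH
        if defined $params->{weka_path};
    push @cmd, $params->{weka_classifier} // 'weka.classifiers.NaiveBayes';
    push @cmd, @{ $params->{weka_args} || [] }, @call_args;
    return @cmd;
}

# Example: train from a (hypothetical) ARFF file and save the model.
my @cmd = weka_command(
    { java_args => ['-Xmx130m'], weka_path => '/opt/weka/weka.jar' },
    '-t', '/tmp/train.arff', '-d', '/tmp/weka.model',
);
print join(' ', @cmd), "\n";
# prints "java -Xmx130m -classpath /opt/weka/weka.jar weka.classifiers.NaiveBayes -t /tmp/train.arff -d /tmp/weka.model"
```

Leaving weka_path unset simply omits -classpath, so java falls back to the CLASSPATH environment variable, which matches the behaviour described for weka_path.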
train(knowledge_set => $k) trains the categorizer, preparing it for later use
in categorizing documents. The "knowledge_set" parameter must provide an object
of the class "AI::Categorizer::KnowledgeSet" (or a subclass thereof), populated
with training documents and their categories. See
AI::Categorizer::KnowledgeSet for details on how to create such an object.
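Because the wrapper exchanges data with Weka through temporary files (see the tmpdir parameter), it may help to see what such a file could look like. The sketch below writes documents in Weka's sparse ARFF format, in which each data row lists only its non-zero "index value" pairs. The document layout ({ features => {...}, category => ... }), the helper name write_sparse_arff, and the use of ARFF by this module are all assumptions; feature and category names are assumed to be plain words that need no quoting.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical sketch: write training documents to a sparse ARFF file in
# $tmpdir and return its name. Each document is assumed to look like
# { features => { word => weight, ... }, category => 'name' }.
sub write_sparse_arff {
    my ($docs, $tmpdir) = @_;
    $tmpdir = File::Spec->tmpdir unless defined $tmpdir;

    # Fix an attribute order: one numeric attribute per distinct feature,
    # followed by a nominal class attribute listing every category.
    my (%seen_f, %seen_c);
    for my $d (@$docs) {
        $seen_f{$_} = 1 for keys %{ $d->{features} };
        $seen_c{ $d->{category} } = 1;
    }
    my @features = sort keys %seen_f;
    my %index;
    @index{@features} = 0 .. $#features;
    my $class_index = @features;    # class attribute comes last

    my ($fh, $filename) =
        tempfile('trainXXXXXX', DIR => $tmpdir, SUFFIX => '.arff');
    print $fh "\@relation training\n\n";
    print $fh "\@attribute '$_' numeric\n" for @features;
    print $fh "\@attribute class {", join(',', sort keys %seen_c), "}\n\n\@data\n";

    for my $d (@$docs) {
        # Sparse rows list "index value" pairs in increasing index order.
        my @pairs = map { "$index{$_} $d->{features}{$_}" }
                    sort { $index{$a} <=> $index{$b} } keys %{ $d->{features} };
        push @pairs, "$class_index $d->{category}";
        print $fh '{', join(',', @pairs), "}\n";
    }
    close $fh or die "Can't write $filename: $!";
    return $filename;
}
```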
categorize($document) returns an "AI::Categorizer::Hypothesis" object
representing the categorizer's "best guess" about which categories the given
document should be assigned to. See AI::Categorizer::Hypothesis for more
details on how to use this object.
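A hypothesis can be thought of as a score per category plus a decision about which categories to assign. The sketch below illustrates that idea with plain hashes; the score values, the threshold, and both helper names are hypothetical, and the real AI::Categorizer::Hypothesis class has its own interface (the synopsis uses its best_category method).

```perl
use strict;
use warnings;

# Hypothetical sketch: return the highest-scoring category, breaking ties
# alphabetically so the result is deterministic; undef if there are none.
sub best_guess {
    my (%scores) = @_;
    my ($best) = sort { $scores{$b} <=> $scores{$a} || $a cmp $b } keys %scores;
    return $best;
}

# Hypothetical sketch: every category whose score reaches $threshold,
# highest score first.
sub assigned_categories {
    my ($threshold, %scores) = @_;
    return sort { $scores{$b} <=> $scores{$a} || $a cmp $b }
           grep { $scores{$_} >= $threshold } keys %scores;
}
```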
save_state($path) saves the categorizer for later use; it can be reloaded with
"restore_state", as in the synopsis. This method is inherited from
"AI::Categorizer::Storable".
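The save-and-restore round trip can be illustrated with Perl's core Storable module. This is a sketch of the general idea only: whether AI::Categorizer::Storable uses Storable internally is an assumption here, and My::Model is a hypothetical stand-in for a trained learner.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Storable qw(nstore retrieve);

# Hypothetical stand-in for a trained learner object.
package My::Model;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

my $dir  = tempdir(CLEANUP => 1);    # removed automatically at exit
my $path = File::Spec->catfile($dir, 'model.sto');

my $model = My::Model->new(classifier => 'weka.classifiers.NaiveBayes');
nstore($model, $path) or die "Can't save to $path";

# ... time passes, possibly in a different process ...
my $restored = retrieve($path);      # blessing into My::Model is preserved
print ref($restored), " ", $restored->{classifier}, "\n";
# prints "My::Model weka.classifiers.NaiveBayes"
```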
Ken Williams, ken@mathforum.org
Copyright 2000-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.