AI::Categorizer::Learner::Weka - Pass-through wrapper to Weka system
use AI::Categorizer::Learner::Weka;
# Here $k is an AI::Categorizer::KnowledgeSet object
my $nb = new AI::Categorizer::Learner::Weka(...parameters...);
$nb->train(knowledge_set => $k);
... time passes ...
$nb = AI::Categorizer::Learner->restore_state('filename');
my $c = new AI::Categorizer::Collection::Files( path => ... );
while (my $document = $c->next) {
  my $hypothesis = $nb->categorize($document);
  print "Best assigned category: ", $hypothesis->best_category, "\n";
}
This class doesn't implement any machine learners of its own; it merely passes
the data through to the Weka machine learning system. This can give you access
to a collection of machine learning algorithms not otherwise implemented in
AI::Categorizer.
Currently this is a simple command-line wrapper that calls
"java" subprocesses. In the future this
may be converted to an "Inline::Java"
wrapper for better performance (faster running times). However, if you're
looking for really great performance, you're probably looking in the wrong
place - this Weka wrapper is intended more as a way to try lots of different
machine learning methods.
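As a rough illustration of what a command-line wrapper of this kind involves, the sketch below assembles a "java" command from the constructor parameters documented on this page. It is not taken from the module's source: the build_java_command helper, the placeholder paths and file names, and the use of Weka's "-t" (training file) option are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::Categorizer: builds the argument
# list for a "java" subprocess from the documented parameter names.
sub build_java_command {
    my (%p) = @_;

    # Fall back to plain "java" (found via PATH) when no path is given.
    my @cmd = ( defined $p{java_path} ? $p{java_path} : 'java' );

    # Extra JVM arguments, such as memory settings.
    push @cmd, @{ $p{java_args} || [] };

    # Only needed when weka.jar is not already on the CLASSPATH.
    push @cmd, '-classpath', $p{weka_path} if defined $p{weka_path};

    # The Weka class to run, followed by its own options.
    push @cmd, defined $p{weka_classifier}
        ? $p{weka_classifier}
        : 'weka.classifiers.NaiveBayes';
    push @cmd, @{ $p{weka_args} || [] };

    # Assumption: the training data is handed over with Weka's "-t" option.
    push @cmd, '-t', $p{train_file} if defined $p{train_file};

    return @cmd;
}

my @cmd = build_java_command(
    java_args  => ['-Xmx256m'],
    weka_path  => '/path/to/weka.jar',   # placeholder path
    train_file => 'train.arff',          # placeholder file name
);
print join(' ', @cmd), "\n";

# A real wrapper would then run the command without a shell, for example:
# system(@cmd) == 0 or die "java exited with status $?";
```

Keeping the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.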
This class inherits from the
"AI::Categorizer::Learner" class, so all of
its methods are available unless explicitly mentioned here.
The new() constructor creates a new Weka Learner and returns it. In addition to the parameters
accepted by the "AI::Categorizer::Learner"
class, the Weka subclass accepts the following parameters:
- java_path
- Specifies where the "java" executable
can be found on this system. The default is simply
"java", meaning that it will search your
"PATH" to find java.
- java_args
- Specifies a list of any additional arguments to give to the java process.
Commonly it's necessary to allocate more memory than the default, using an
argument like "-Xmx130m".
- weka_path
- Specifies the path to the "weka.jar"
file containing the Weka bytecode. If Weka has been installed somewhere in
your java "CLASSPATH", you needn't
specify a "weka_path".
- weka_classifier
- Specifies the Weka class to use for a categorizer. The default is
"weka.classifiers.NaiveBayes". Consult
your Weka documentation for a list of other classifiers available.
- weka_args
- Specifies a list of any additional arguments to pass to the Weka
classifier class when building the categorizer.
- tmpdir
- A directory in which temporary files will be written when training the
categorizer and categorizing new documents. The default is given by
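A constructor call combining the parameters listed above might look like the following sketch. All paths and values are placeholders, and passing the list-valued "java_args" parameter as an array reference is an assumption. The call is guarded so that the script still runs on a system where the module is not installed.

```perl
use strict;
use warnings;

# Placeholder values; adjust them for your system.
my %params = (
    java_path       => 'java',                         # documented default
    java_args       => ['-Xmx256m'],                   # assumed: array reference for a list
    weka_path       => '/path/to/weka.jar',            # placeholder path
    weka_classifier => 'weka.classifiers.NaiveBayes',  # documented default
    tmpdir          => '/tmp',                         # placeholder directory
);

# Only construct the learner if the module is actually installed.
my $learner;
if ( eval { require AI::Categorizer::Learner::Weka; 1 } ) {
    $learner = AI::Categorizer::Learner::Weka->new(%params);
    print "Created a ", ref($learner), " object\n";
}
else {
    print "AI::Categorizer::Learner::Weka is not installed; showing parameters only\n";
}
```

Parameters left out of the hash, such as "weka_args" here, are simply not passed to the constructor.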
The train() method trains the categorizer, preparing it for later use in categorizing
documents. The "knowledge_set" parameter
must provide an object of the class
"AI::Categorizer::KnowledgeSet" (or a
subclass thereof), populated with lots of documents and categories. See
AI::Categorizer::KnowledgeSet for the details of how to create such an object.
The categorize() method returns an "AI::Categorizer::Hypothesis"
object representing the categorizer's "best guess" about which
categories the given document should be assigned to. See
AI::Categorizer::Hypothesis for more details on how to use this object.
Saves the categorizer for later use. This method is inherited rather than
implemented in this class.
Ken Williams
Copyright 2000-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.