AI::Categorizer::Collection - Access stored documents
  use AI::Categorizer::Collection::Files;

  my $c = AI::Categorizer::Collection::Files->new(
    path          => '/tmp/docs/training',
    category_file => '/tmp/docs/cats.txt',
  );
  print "Total number of docs: ", $c->count_documents, "\n";
  while (my $document = $c->next) {
    ...
  }
  $c->rewind; # For further operations
This abstract class implements an iterator for accessing documents in their
natively stored format. You cannot directly create an instance of the
Collection class, because it is abstract; see the documentation for the
"Files", "SingleFile", or "InMemory" subclasses for a concrete interface.
- new()
- Creates a new Collection object and returns it. Accepts the following
parameters:
- category_hash
- A reference to a hash which maps document names to category names. The
keys of the hash are the document names; each value should be a reference
to an array containing the names of the categories to which that document
belongs.
- category_file
- Indicates a file which should be read in order to create the
"category_hash". Each line of the file should list a document's name,
followed by a list of category names, all separated by whitespace.
- stopword_file
- Specifies a file containing a list of "stopwords", which are words that
should automatically be disregarded when reading documents. The file should
contain one word per line. The file will be parsed and then fed as the
"stopwords" parameter to the Document "new()" method.
- verbose
- If true, some status/debugging information will be printed to
"STDOUT" during operation.
- document_class
- The class indicating what type of Document object should be created. This
generally specifies the format that the documents are stored in. The
default is "AI::Categorizer::Document::Text".
- next()
- Returns the next Document object in the Collection.
- rewind()
- Resets the iterator for further calls to "next()".
- count_documents()
- Returns the total number of documents in the Collection. Note that this
usually resets the iterator, because it may not be possible to resume
iterating from the position reached before the call.
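The interaction between "next()", "rewind()", and "count_documents()" described above can be illustrated with a small stand-alone sketch. ToyCollection below is a hypothetical class written only for this example; it is not part of AI::Categorizer, and it assumes one possible behaviour (counting by a full pass and then resetting) rather than showing how any real subclass is implemented.

```perl
use strict;
use warnings;

# ToyCollection is a hypothetical stand-in, NOT an AI::Categorizer class.
# It only mimics the documented next()/rewind()/count_documents() contract.
package ToyCollection;

sub new {
    my ($class, @docs) = @_;
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Return the next item, or undef once the collection is exhausted.
sub next {
    my $self = shift;
    return undef if $self->{pos} >= @{ $self->{docs} };
    return $self->{docs}[ $self->{pos}++ ];
}

# Reset the iterator so that next() starts again from the first item.
sub rewind { $_[0]{pos} = 0; return }

# Assumption: count by making a full pass, then leave the iterator reset.
sub count_documents {
    my $self = shift;
    $self->rewind;
    my $n = 0;
    $n++ while defined $self->next;
    $self->rewind;
    return $n;
}

package main;

my $c = ToyCollection->new(qw(a b c));
my $first = $c->next;             # first item
my $count = $c->count_documents;  # counts everything and resets the iterator
my $again = $c->next;             # first item again, not the second
print "count=$count first=$first again=$again\n";
```

In practice this suggests calling "count_documents()" before starting a "next()" loop, or calling "rewind()" explicitly afterwards, rather than relying on the iterator position surviving the count.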
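The relationship between the "category_file" format and the "category_hash" structure described under "new()" can be sketched in plain Perl. The helper name parse_category_lines and the sample document and category names are assumptions made for illustration; this is not the module's own parsing code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::Categorizer: turn the lines of a
# category file into the structure described for "category_hash".
sub parse_category_lines {
    my @lines = @_;
    my %category_hash;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and skips leading whitespace
        my ($doc_name, @categories) = split ' ', $line;
        next unless defined $doc_name;    # skip blank lines
        $category_hash{$doc_name} = [@categories];
    }
    return \%category_hash;
}

# Sample lines as they might appear in a category file (made-up names).
my $category_hash = parse_category_lines(
    "doc1.txt sports football\n",
    "doc2.txt politics\n",
    "\n",
);

for my $doc (sort keys %$category_hash) {
    print "$doc: @{ $category_hash->{$doc} }\n";
}
```

A hash built this way has document names as keys and array references of category names as values, which is the shape the "category_hash" parameter expects.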
Ken Williams, ken@mathforum.org
Copyright 2002-2003 Ken Williams. All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
AI::Categorizer(3), Storable(3)