AI::Categorizer::FeatureSelector::ChiSquare - ChiSquare Feature Selection class
# the recommended way to use this class is to let the KnowledgeSet
# instantiate it
use AI::Categorizer::KnowledgeSetSMART;
my $ksetCHI = AI::Categorizer::KnowledgeSetSMART->new(
    tfidf_notation    => 'Categorizer',
    feature_selection => 'chi_square', ...other parameters...);
# however it is also possible to pass an instance to the KnowledgeSet
use AI::Categorizer::KnowledgeSet;
use AI::Categorizer::FeatureSelector::ChiSquare;
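my $ksetCHI = AI::Categorizer::KnowledgeSet->new(
    # the selector must be constructed through its full package name
    feature_selector => AI::Categorizer::FeatureSelector::ChiSquare->new(
        features_kept => 2000,
        verbose       => 1,
    ),
    ...other parameters...
);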
Feature selection with the ChiSquare function.
Chi-Square(t,ci) =       N.(AD-CB)^2
                   -----------------------
                   (A+C).(B+D).(A+B).(C+D)
where t  = term
      ci = category i
      N  = number of documents in the collection
      A  = number of times t and ci co-occur
      B  = number of times t occurs without ci
      C  = number of times ci occurs without t
      D  = number of times neither ci nor t occurs
For more details, see: Yiming Yang and Jan O. Pedersen, "A
Comparative Study on Feature Selection in Text Categorization", in
Proceedings of ICML-97, 14th International Conference on Machine Learning,
1997 (available on citeseer.nj.nec.com).
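
The Chi-Square formula given above can be sketched as a small standalone Perl function. This is only an illustration and not the code used inside this class: the name chi_square and the sample counts are hypothetical, N is assumed to equal A+B+C+D (each document falls in exactly one of the four cases), and returning 0 when the denominator is zero is a choice made for this sketch.

```perl
use strict;
use warnings;

# Chi-square score of a term t for a category ci, computed from the
# four co-occurrence counts A, B, C and D.
# Hypothetical helper, not part of AI::Categorizer.
sub chi_square {
    my ($A, $B, $C, $D) = @_;

    # Assumption: every document is counted in exactly one of A, B, C, D.
    my $N = $A + $B + $C + $D;

    my $denominator = ($A + $C) * ($B + $D) * ($A + $B) * ($C + $D);

    # Assumption: score 0 when a marginal total is empty, which also
    # avoids a division by zero.
    return 0 if $denominator == 0;

    return $N * ($A * $D - $C * $B) ** 2 / $denominator;
}

# Made-up counts for one (term, category) pair.
my $score = chi_square(40, 10, 20, 130);
printf "chi-square = %.3f\n", $score;
```

A term spread evenly over the four cases, for example chi_square(10, 10, 10, 10), scores 0 because AD-CB vanishes. Larger scores indicate a stronger dependence between the term and the category.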
Francois Paradis, paradifr@iro.umontreal.ca, with inspiration from Ken
Williams's AI::Categorizer code