NAME
Text::Language::Guess - Trained module to guess a document's language

SYNOPSIS
    use Text::Language::Guess;

    my $guesser = Text::Language::Guess->new();

    my $lang = $guesser->language_guess("bill.txt");

    # prints 'en'
    print "Best fit: $lang\n";

DESCRIPTION
Text::Language::Guess guesses a document's language. Its implementation is simple: using "Text::ExtractWords" and "Lingua::StopWords" from CPAN, it counts how many of the known stopwords the document contains for each language supported by "Lingua::StopWords".

Each word in the document that is recognized as a stopword of a particular language scores one point for that language.

The "language_guess()" method takes a document (a file name, as in the SYNOPSIS) as a parameter and returns the abbreviation of the language that it is most likely written in.

Supported Languages:
Methods
EXAMPLES
    use Text::Language::Guess;

    # Guess language in a string instead of a file
    my $guesser = Text::Language::Guess->new();
    my $lang = $guesser->language_guess_string("Make love not war");
    # 'en'

    # Limit number of languages to choose from
    my $guesser = Text::Language::Guess->new(languages => ['da', 'nl']);
    my $lang = $guesser->language_guess_string(
        "Which is closer to English, danish or dutch?");
    # 'nl'

    # Show different scores
    my $guesser = Text::Language::Guess->new();
    my $scores = $guesser->scores_string(
        "This text is English, but other languages are scoring as well");

    use Data::Dumper;
    print Dumper($scores);

    # $VAR1 = {
    #           'pt' => 1,
    #           'en' => 6,
    #           'fr' => 1,
    #           'nl' => 1
    #         };

LEGALESE
Copyright 2005 by Mike Schilli, all rights reserved. This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR
2005, Mike Schilli <cpan@perlmeister.com>
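
As a rough illustration of the stopword scoring outlined in DESCRIPTION, the following self-contained sketch counts one point per stopword hit and picks the highest-scoring language. It is not the module's actual code: the names sketch_scores() and sketch_guess() are hypothetical, and the tiny hard-coded word lists are assumptions standing in for the full lists that "Lingua::StopWords" provides.

```perl
use strict;
use warnings;

# Hypothetical, hand-picked mini stopword lists, for illustration only.
# The real module takes its lists from Lingua::StopWords.
my %STOPWORDS = (
    en => { map { $_ => 1 } qw(the is and not but are as this to of) },
    nl => { map { $_ => 1 } qw(de het een en niet is van dat) },
    fr => { map { $_ => 1 } qw(le la les et est ne pas de un) },
);

# Score one point per word found in a language's stopword list.
# Returns a hash reference mapping language code to score.
sub sketch_scores {
    my ($text) = @_;
    my %scores;
    for my $word (map { lc($_) } $text =~ /(\w+)/g) {
        for my $lang (keys %STOPWORDS) {
            $scores{$lang}++ if $STOPWORDS{$lang}{$word};
        }
    }
    return \%scores;
}

# Return the language code with the highest score (ties broken
# alphabetically), or undef if no stopword matched at all.
sub sketch_guess {
    my ($text) = @_;
    my $scores = sketch_scores($text);
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                 keys %$scores;
    return $best;
}

# prints 'en'
print sketch_guess("This text is English, but the others are not"), "\n";
```

With these toy lists, the sample sentence scores 6 for 'en' and 1 for 'nl' (the word "is" appears in both lists), which mirrors how other languages can pick up a few points even for clearly English text.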