NAME

Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English

SYNOPSIS

    use Lingua::Stem::EnBroken;
    my $stems = Lingua::Stem::EnBroken::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });

DESCRIPTION

This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is an intentionally broken version of Lingua::Stem::En for people needing backwards compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you aren't one of those people.

It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

    Purpose:    Implementation of the Porter stemming algorithm documented
                in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                Program 14 (3), July 1980, pp. 130-137.
    Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g current/currency, -ings words), which I've not attempted to cure, although I have added support for the British -ise suffix.

CHANGES

    2003.09.28 - Documentation fix

    2000.09.14 - Forked from the Lingua::Stem::En.pm module to provide
                 a backward compatibly broken version for people needing
                 consistent behavior with 0.30 and 0.40 more than
                 accurate stemming.

METHODS
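As a hedged illustration of the call shown in the SYNOPSIS, the sketch below passes a word list and an exceptions hash to stem() and prints each input next to its result. It assumes the Lingua-Stem distribution is installed, that -exceptions maps a word to the stem that should be returned for it, and that the returned array reference holds one stem per input word in the same order. The word list and the exception entry are made up for illustration. No expected output is shown, because this module deliberately reproduces the older, incorrect stemming behavior.

```perl
use strict;
use warnings;

# Assumption: Lingua::Stem::EnBroken is installed (it ships with Lingua-Stem).
use Lingua::Stem::EnBroken;

# Illustrative input only; these words are not taken from the module docs.
my @words = qw(connection connected connecting currency);

# Assumption: keys are words to special-case, values are the stems to use.
my %exceptions = ( 'currency' => 'currency' );

my $stems = Lingua::Stem::EnBroken::stem({
    -words      => \@words,
    -locale     => 'en',
    -exceptions => \%exceptions,
});

# Print each input word beside the stem returned for it.
for my $i (0 .. $#words) {
    printf "%-12s -> %s\n", $words[$i], $stems->[$i];
}
```

Passing references rather than flat lists matches the SYNOPSIS, where both -words and -exceptions are given as references.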

NOTES

This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

SEE ALSO

    Lingua::Stem

AUTHOR

    Jim Richardson, University of Sydney
    jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

    Integration in Lingua::Stem by
    Jerilyn Franz, FreeRun Technologies, <cpan@jerilyn.info>

COPYRIGHT

Jim Richardson, University of Sydney
Jerilyn Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

BUGS

TODO