NAME

Lingua::Stem::En - Porter's stemming algorithm for 'generic' English

SYNOPSIS

    use Lingua::Stem::En;
    my $stems = Lingua::Stem::En::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });

DESCRIPTION

This routine applies the Porter Stemming Algorithm to its parameters, returning the stemmed words.

It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

    Purpose:    Implementation of the Porter stemming algorithm documented
                in: Porter, M.F., "An Algorithm For Suffix Stripping,"
                Program 14 (3), July 1980, pp. 130-137.
    Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen).

Porter's algorithm still has rough spots (e.g. current/currency, -ings words), which I've not attempted to cure, although I have added support for the British -ise suffix.

CHANGES

    1999.06.15 - Changed to '.pm' module, moved into Lingua::Stem namespace,
                 optionalized the export of the 'stem' routine
                 into the caller's namespace, added named parameters

    1999.06.24 - Switch core implementation of the Porter stemmer to
                 the one written by Jim Richardson <jimr@maths.usyd.edu.au>

    2000.08.25 - 2.11 Added stemming cache

    2000.09.14 - 2.12 Fixed *major* :( implementation error of Porter's algorithm
                 Error was entirely my fault - I completely forgot to include
                 rule sets 2,3, and 4 starting with Lingua::Stem 0.30.
                 -- Jerilyn Franz

    2003.09.28 - 2.13 Corrected documentation error pointed out by Simon Cozens.

    2005.11.20 - 2.14 Changed rule declarations to conform to Perl style convention
                 for 'private' subroutines. Changed Exporter invocation to more
                 portable 'require' vice 'use'.

    2006.02.14 - 2.15 Added ability to pass word list by 'handle' for in-place stemming.

    2009.07.27 - 2.16 Documentation Fix

    2020.06.20 - 2.30 Version renumber for module consistency.

    2020.09.26 - 2.31 Fix for Latin1/UTF8 issue in documentation

METHODS
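
As a rough illustration of the kind of suffix-stripping rule the Porter algorithm (see DESCRIPTION) applies, the following self-contained sketch implements only step 1a of Porter's 1980 paper: SSES becomes SS, IES becomes I, SS is left alone, and any other trailing S is dropped. This is not the module's code and covers none of the later rule sets. The name step_1a_sketch is hypothetical and exists only for this example, and the input is assumed to be a single lowercase word.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Lingua::Stem::En: applies only
# step 1a of Porter's algorithm to one lowercase word.
sub step_1a_sketch {
    my ($word) = @_;
    return $word if $word =~ s/sses$/ss/;   # caresses -> caress
    return $word if $word =~ s/ies$/i/;     # ponies   -> poni
    return $word if $word =~ /ss$/;         # caress   -> caress (unchanged)
    $word =~ s/s$//;                        # cats     -> cat
    return $word;
}

for my $word (qw(caresses ponies caress cats)) {
    printf "%-9s -> %s\n", $word, step_1a_sketch($word);
}
```

The rules are tried longest suffix first, so "caresses" is handled by the SSES rule before the plain S rule can strip only its final letter. The full algorithm chains several further steps after this one.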
NOTES

This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

SEE ALSO

Lingua::Stem

AUTHOR

Jim Richardson, University of Sydney
jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

Integration in Lingua::Stem by Jerilyn Franz, FreeRun Technologies, <cpan@jerilyn.info>

COPYRIGHT

Jim Richardson, University of Sydney
Jerilyn Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

BUGS

TODO