NAME
WordNet::Similarity - Perl modules for computing measures of semantic relatedness.

SYNOPSIS
Basic Usage Example

    use WordNet::QueryData;
    use WordNet::Similarity::path;

    my $wn = WordNet::QueryData->new;
    my $measure = WordNet::Similarity::path->new ($wn);

    my $value = $measure->getRelatedness("car#n#1", "bus#n#2");

    my ($error, $errorString) = $measure->getError();
    die $errorString if $error;

    print "car (sense 1) <-> bus (sense 2) = $value\n";

Using a configuration file to initialize the measure

    use WordNet::Similarity::path;

    my $sim = WordNet::Similarity::path->new($wn, "mypath.cfg");

    my $value = $sim->getRelatedness("dog#n#1", "cat#n#1");

    ($error, $errorString) = $sim->getError();
    die $errorString if $error;

    print "dog (sense 1) <-> cat (sense 1) = $value\n";

Printing traces

    print "Trace String -> ".($sim->getTraceString())."\n";

DESCRIPTION
Introduction
We observe that humans find it extremely easy to say if two words are related and if one word is more related to a given word than another. For example, if we come across two words, 'car' and 'bicycle', we know they are related as both are means of transport. Also, we easily observe that 'bicycle' is more related to 'car' than 'fork' is. But is there some way to assign a quantitative value to this relatedness? Some ideas have been put forth by researchers to quantify the concept of relatedness of words, with encouraging results.

Eight of these different measures of relatedness have been implemented in this software package. A simple edge counting measure and a random measure have also been provided. These measures rely heavily on the vast store of knowledge available in the online electronic dictionary -- WordNet. So, we use a Perl interface for WordNet called WordNet::QueryData to make it easier for us to access WordNet. The modules in this package REQUIRE that the WordNet::QueryData module be installed on the system before these modules are installed.

Function
The following function is defined:
Methods
The following methods are defined in this package:

Public methods
If any of these tests fails, then the error level is set to non-zero, a message is appended to the error string, and undef is returned. If the synset is well-formed and exists, then a list is returned that has the format ($word, $pos, $sense, $offset).
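The well-formedness part of that check can be illustrated with a small sketch. The helper name split_wps is hypothetical and is not part of this package; it assumes the word#pos#sense format used throughout this page (for example 'car#n#1') and the WordNet part-of-speech letters n, v, a and r. It does not consult WordNet, so it cannot confirm that the synset exists or supply an offset.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::Similarity): checks only that a
# string has the word#pos#sense shape. It does not query WordNet, so it
# cannot tell whether the synset exists and it returns no offset.
sub split_wps {
    my ($wps) = @_;
    return unless defined $wps;

    # word: no '#' or whitespace; pos: n, v, a or r; sense: positive integer
    if ($wps =~ /^([^#\s]+)#([nvar])#([1-9]\d*)$/) {
        return ($1, $2, $3);
    }
    return;    # empty list signals a malformed string
}

my @parts = split_wps('car#n#1');
print @parts ? "word=$parts[0] pos=$parts[1] sense=$parts[2]\n"
             : "malformed\n";
```

Returning an empty list on failure keeps the caller's test simple (a plain "if (@parts)"), but unlike the method described above this sketch sets no error level or error string.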
Discussion
This package consists of Perl modules along with supporting Perl programs that implement the semantic relatedness measures described by Leacock and Chodorow (1998), Jiang and Conrath (1997), Resnik (1995), Lin (1998), Wu and Palmer (1993), Hirst and St-Onge (1998), the Extended Gloss Overlaps measure by Banerjee and Pedersen (2002), and a Gloss Vector measure recently introduced by Patwardhan and Pedersen. The package contains Perl modules designed as object classes with methods that take two word senses as input. These methods return the semantic relatedness of the two word senses.

A quantitative measure of the degree to which two word senses are related has wide-ranging applications in areas such as word sense disambiguation and information retrieval. For example, to determine which sense of a given word is being used in a particular context, the sense with the highest relatedness to the senses of its context words is most likely to be the one intended. Similarly, in information retrieval, retrieving documents that contain highly related concepts is more likely to yield higher precision and recall values.

A command line interface to these modules is also present in the package. This simple interface returns the relatedness measure of two given words. A number of switches and options are provided to modify the output and to enhance it with trace information and other useful output.

Support programs for generating information content files from various corpora are also available in the package. The information content files are required by three of the measures for computing the relatedness of concepts. There is also a tool to find the depths of the taxonomies in WordNet.

Configuration files
The behavior of the measures of semantic relatedness can be controlled by using configuration files. These configuration files specify how certain parameters are initialized within the object.
A configuration file may be specified as a parameter during the creation of an object using the new method. The configuration files must follow a fixed format.

Every configuration file starts with the name of the module ON THE FIRST LINE of the file. For example, a configuration file for the res module will have on the first line 'WordNet::Similarity::res'. This is followed by the various parameters, each on a new line and having the form 'name::value'. The 'value' of a parameter is optional (in case of boolean parameters). In case 'value' is omitted, we would have just 'name::' on that line. Comments are supported in the configuration file. Anything following a '#' is ignored in the configuration file.

Sample configuration files are present in the '/samples' subdirectory of the package. Each of the modules has specific parameters that can be set/reset using the configuration files. Please read the manpages or the perldocs of the respective modules for details on the parameters specific to each of the modules. For instance, 'man WordNet::Similarity::res' or 'perldoc WordNet::Similarity::res' should display the documentation for the Resnik module.

The module parses the configuration file and recognizes the following parameters:
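To make the file format concrete (module name on the first line, then 'name::value' or 'name::' lines, with '#' starting a comment), here is a minimal sketch of a reader for it. The function read_config_text is hypothetical and is not the parser used by the modules, which may differ in details such as whitespace handling. The parameter names in the sample text are illustrative only; consult each module's documentation for the names it accepts.

```perl
use strict;
use warnings;

# Hypothetical reader for the configuration format described above.
# It is a sketch, not the modules' own parser.
sub read_config_text {
    my ($text) = @_;

    # Strip comments and surrounding whitespace, then drop blank lines.
    my @lines = grep { /\S/ }
                map  { my $l = $_; $l =~ s/#.*//; $l =~ s/^\s+|\s+$//g; $l }
                split /\n/, $text;

    # The first remaining line names the module.
    my $module = shift @lines;
    die "missing module name\n" unless defined $module;

    my %param;
    for my $line (@lines) {
        # 'name::value', or just 'name::' when the value is omitted
        my ($name, $value) = $line =~ /^(\w+)::(.*)$/
            or die "malformed line: $line\n";
        $param{$name} = $value;
    }
    return ($module, \%param);
}

# Sample text; 'trace' and 'cache' are used here purely as illustrations.
my $sample = <<'END';
WordNet::Similarity::res
trace::1      # a parameter with a value
cache::       # a parameter whose value is omitted
END

my ($module, $param) = read_config_text($sample);
print "module: $module\n";
print "$_ => '$param->{$_}'\n" for sort keys %$param;
```

An omitted value is stored here as an empty string; how a given module interprets an omitted value is described in that module's own documentation.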
Usage
The semantic relatedness modules in this distribution are built as classes. The classes define four methods that are useful in finding relatedness values for pairs of synsets.

    new()
    getRelatedness()
    getError()
    getTraceString()

Typical Usage Examples

To create an object of the Resnik measure, we would have the following lines of code in the Perl program.

    use WordNet::Similarity::res;
    $object = WordNet::Similarity::res->new($wn, '~/resnik.conf');

The reference of the initialized object is stored in the scalar variable '$object'. '$wn' contains a WordNet::QueryData object that should have been created earlier in the program. The second parameter to the 'new' method is the path of the configuration file for the resnik measure. If the 'new' method is unable to create the object, '$object' would be undefined. This, as well as any other error/warning may be tested.

    die "Unable to create resnik object.\n" unless defined $object;
    ($err, $errString) = $object->getError();
    die $errString."\n" if($err);

To create a Leacock-Chodorow measure object, using default values, i.e. no configuration file, we would have the following:

    use WordNet::Similarity::lch;
    $measure = WordNet::Similarity::lch->new($wn);

To find the semantic relatedness of the first sense of the noun 'car' and the second sense of the noun 'bus' using the resnik measure, we would write the following piece of code:

    $relatedness = $object->getRelatedness('car#n#1', 'bus#n#2');

To get traces for the above computation:

    print $object->getTraceString();

However, traces must be enabled using configuration files. By default traces are turned off.
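The pattern used in the examples above (call getRelatedness(), then check getError()) can be folded into one small helper. This is only a sketch: the name relatedness_or_die is hypothetical and not part of the package, and it treats any non-zero error level as fatal, as the examples on this page do.

```perl
use strict;
use warnings;

# Hypothetical convenience wrapper (not part of WordNet::Similarity).
# $measure is any object that provides getRelatedness() and getError()
# as described above.
sub relatedness_or_die {
    my ($measure, $wps1, $wps2) = @_;

    my $value = $measure->getRelatedness($wps1, $wps2);

    # Any non-zero error level is treated as fatal, mirroring the
    # "die $errString if $err" idiom in the usage examples.
    my ($err, $errString) = $measure->getError();
    die "$errString\n" if $err;

    return $value;
}

# Example call, assuming $object was created as in the usage examples:
#   my $score = relatedness_or_die($object, 'car#n#1', 'bus#n#2');
```

Because the helper relies only on the two method names, it works with any of the measure classes without naming a specific one.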
AUTHORS
    Ted Pedersen, University of Minnesota Duluth
    tpederse at d.umn.edu

    Siddharth Patwardhan, University of Utah, Salt Lake City
    sidd at cs.utah.edu

    Jason Michelizzi, University of Minnesota Duluth
    mich0212 at d.umn.edu

    Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
    banerjee+ at cs.cmu.edu

BUGS
None.

To submit a bug report, go to http://groups.yahoo.com/group/wn-similarity or send e-mail to tpederse at d.umn.edu.

SEE ALSO
perl(1), WordNet::Similarity::jcn(3), WordNet::Similarity::res(3), WordNet::Similarity::lin(3), WordNet::Similarity::lch(3), WordNet::Similarity::hso(3), WordNet::Similarity::lesk(3), WordNet::Similarity::wup(3), WordNet::Similarity::path(3), WordNet::Similarity::random(3), WordNet::Similarity::ICFinder(3), WordNet::Similarity::PathFinder(3), WordNet::QueryData(3)

    http://www.cs.utah.edu/~sidd
    http://wordnet.princeton.edu
    http://www.ai.mit.edu/~jrennie/WordNet
    http://groups.yahoo.com/group/wn-similarity

COPYRIGHT
Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan, Jason Michelizzi and Satanjeev Banerjee

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

    The Free Software Foundation, Inc.,
    59 Temple Place - Suite 330,
    Boston, MA 02111-1307, USA.

Note: a copy of the GNU General Public License is available on the web at <http://www.gnu.org/licenses/gpl.txt> and is included in this distribution as GPL.txt.