|
|
| |
WordNet::Similarity::PathFinder(3) |
User Contributed Perl Documentation |
WordNet::Similarity::PathFinder(3) |
WordNet::Similarity::PathFinder - module to implement path finding methods (by
node counting) for WordNet::Similarity measures of semantic relatedness
use WordNet::QueryData;
my $wn = WordNet::QueryData->new;
use WordNet::Similarity::PathFinder;
my $obj = WordNet::Similarity::PathFinder->new ($wn);
my $wps1 = 'winston_churchill#n#1';
my $wps2 = 'england#n#1';
# parseWps returns reference to an array that contains
# word1 pos1 sense1 offset1 word2 pos2 sense2 offset2
my $result = $obj->parseWps($wps1, $wps2);
print "@$result\n";
# path is a reference to an array that contains the path between
# wps1 and wps2 expressed as a series of wps values
my @paths = $obj->getShortestPath($wps1, $wps2, 'n', 'wps');
my ($length, $path) = @{shift @paths};
defined $path or die "No path between synsets";
print "shortest path between $wps1 and $wps2 is $length edges long\n";
print "@$path\n";
my $offset1 = $wn -> offset($wps1);
my $offset2 = $wn -> offset($wps2);
# path is a reference to an array that contains the path between
# offset1 and offset2 expressed as a series of offset values
my @paths = $obj->getShortestPath($offset1, $offset2, 'n', 'offset');
my ($length, $path) = @{shift @paths};
defined $path or die "No path between synsets";
print "shortest path between $offset1 and $offset2 is $length edges long\n";
print "@$path\n";
This class is derived from (i.e., is a sub-class of) WordNet::Similarity.
The methods in this module are useful for finding paths between
concepts in WordNet's 'is-a' taxonomies. Concept A is-a concept B if, and
only if, B is a hypernym of A or A is in the hypernym tree of B. N.B., only
nouns and verbs have hypernyms.
The methods that find path lengths (such as
"getShortestPath()" and
"getAllPaths()" compute the lengths using
node-counting not edge-counting. In general, the length of a path using
node-counting will always be one more than the length using edge-counting.
For example, if concept A is a hyponym of concept B, then the path length
between A and B using node-counting is 2, but the length using edge-counting
is 1. Likewise, the path between A and A is 1 using node-counting and 0
using edge-counting.
This module inherits all the methods of WordNet::Similarity. Additionally, the
following methods are also defined.
Public methods
- $measure->setPosList()
- Specifies the parts of speech that measures derived from this module
support (namely, nouns and verbs).
parameters: none
returns: true
- $self->traceOptions()
- Overrides method of same name in WordNet::Similarity. Prints
module-specific configuration options to the trace string (if tracing is
on). PathFinder supports one module specific option: rootNode.
Parameters: none
returns: nothing
- $measure->parseWps($synset1, $synset2)
- parameters: synset1, synset2 -- two synsets in wps format
returns: a reference to an array,
WordNet::Similarity::UNRELATED, or undef
Overrides the parseWps() method in WordNet::Similarity
in order to run additional checks, but calls
WordNet::Similarity::parseWps() to get those checks accomplished
as well. Thus, this method does everything that
WordNet::Similarity::parseWps does.
- quote from WordNet::Similarity::parseWps:
- This method checks the format of the two input synsets by calling
validateSynset() for each synset.
If the synsets are in wps format, a reference to an array will
be returned. This array has the form [$word1,
$pos1, $sense1,
$offset1, $word2,
$pos2, $sense2,
$offset2] where $word1
is the word part of $wps1,
$pos1, is the part of speech of
$wps1, $sense1 is the
sense from $wps.
$offset1 is the offset for
$wps1.
If an error occurs (such as a synset being poorly-formed),
then undef is returned, the error level is set to non-zero, and an error
message is appended to the error string.
In addition, if the two synsets are from different parts of
speech, then WordNet::Similarity::UNRELATED is returned, the error level is
set to 1, and a message is appended to the error string.
If either synset is not a noun or a verb, then the error level is
set to 1, a message is appended to the error string, and undef is
returned.
If the synsets are in wps format, a reference to an array will be
returned. This array has the form [$word1, $pos1,
$sense1, $offset1,
$word2, $pos2,
$sense2, $offset2].
- $measure->getShortestPath($synset1, $synset2, $pos, $mode)
- Given two input synsets, returns the shortest path between the two
synsets.
Parameters: two synsets, a part-of-speech, and a mode
indicator (i.e., the string 'offset' or 'wps'). If the mode is 'offset',
then the synsets should be WordNet offsets. If the mode is 'wps', then
the synsets should be in word#pos#sense format.
Returns: a list of references to arrays. Each array has the
form "($path_length, $path_ref)" where
$path_ref is a reference to an array whose
elements are the synsets along the shortest path between the two input
synsets. There will be as many array references returned as there are
shortest paths between the synsets. That is, there will be no arrays
returned if there is no path between the synsets, and there will be at
least one array returned if there is a path between the synsets. If
there are multiple paths tied for being shortest in length, then all
those paths are returned (hence, this is why multiple array references
can be returned).
Upon error, returns undef, sets the error level to non-zero,
and appends a message to the error string.
- $measure->getAllPaths($synset1, $synset2, $pos, $mode)
- Given two input synsets, returns all the paths between the two synsets.
Parameters: a reference to the object, two synsets, a
part-of-speech, and a mode indicator (the string 'offset' or 'wps').
If the mode is 'offset', then the synsets should be WordNet
offsets. If the mode is 'wps', then they should be strings in
word#pos#sense format.
Returns: A list of all paths, sorted by path length in
ascending order. The format for each item in the list is a reference to
an array that has the format: [$top, $length,
[@synsets_list]] where @synset_list is a list of
synsets along the path (including the two input synsets)
Returns undef on error.
- $measure->validateSynset($synset)
- parameters: synset -- a string in word#pos#sense format
returns: a list or undef on error
This method overrides the method of the same name in
WordNet::Similarity to provide additional behavior but calls
WordNet::Similarity::validateSynset to accomplish that method's
behavior. Thus, this method does everything that
WordNet::Similarity::validateSynset does.
- quote from WordNet::Similarity::validateSynset:
- This method does the following:
- 1.
- Verifies that the synset is well-formed (i.e., that it consists of three
parts separated by #s, the pos is one of {n, v, a, r} and that sense is a
natural number). A synset that matches the pattern '[^\#]+\#[nvar]\#\d+'
is considered well-formed.
- 2.
- Checks if the synset exists by trying to find the offset for the
synset
This method, however, has a slightly different return value.
Instead of merely breaking the synset into three parts, it returns the
"safe" form of the synset. That is, if a synset has multiple word
senses, this method returns the first word sense in that synset (this is so
that other path-finding methods work properly). For example, if the input to
this method is auto#n#1, the return value is ('car', 'n', 1, 2853224) since
the sense 'car#n#1' is the first member of the synset to which 'auto#n#1'
belongs.
If any of these tests fails, then the error level is set to
non-zero, a message is appended to the error string, and undef is
returned.
Private methods
- $measure->_getHypernymTrees($synset, $pos, $mode)
- This method takes as input a synset and returns a list of references to
arrays where these arrays are paths from the input synset to the top of
the taxonomy (*Root*#[nv]#1 if the root node is on).
Parameters: a synset, a part-of-speech, and a mode. The mode
must be either the string 'wps' or 'offset'. If the mode is 'wps', then
the synset must be in wps format; otherwise, it must be an offset.
Returns: a list of references to arrays. These arrays are
paths (hypernym trees).
- getLCSbyPath($synset1, $synset2, $pos, $mode)
- Given two input synsets, finds the least common subsumer (LCS) of them. If
there are multiple candidates for the LCS (due to multiple inheritance),
the LCS that results in the shortest path between in input concepts is
chosen.
Parameters: two synsets, a part of speech, and a mode.
Returns: a list of references to arrays where each array has
the from "($lcs, $pathlength)".
$pathlength is the length of the path between
the two input concepts. There can be multiple LCSs returned if there are
ties for the shortest path between the two synsets. Returns undef on
error.
- $measure->_getSubsumerFromTrees($treeref1, $treeref2, $mode)
- This subroutine returns takes two trees as produced by getHypernymTrees
and returns the most specific subsumer from them.
Parameters: two references to arrays, and a string indicating
mode ('wps' or 'offset').
Returns: the subsumer or undef
- getDepth()
- This method is non-functional and likely to be moved to a different module
soon.
Many of the methods in this module can work with either offsets or wps strings
internally. There are several interesting consequences of each mode.
- 1.
- An offset is not a unique identifier for a synset, but neither is a wps
string. An offset only indicates a byte offset in one of the WordNet data
files (data.noun, data.verb, etc. on Unix-like systems). An offset along
with a part of speech, however, does uniquely identify a synset.
A word#pos#sense string, on the other hand, is the opposite
extreme. A word#pos#sense string is an identifier for a unique word
sense. A synset can have several word senses in it (i.e., a synset is a
set of word senses that are synonymous). The synset {beer_mug#n#1,
stein#n#1} has two word senses. The wps strings 'beer_mug#n#1' and
'stein#n#1' can both be used to refer to the synset. For simplicity, we
usually just use the first wps string when referring to the synset.
N.B., the wps representation was developed by WordNet::QueryData.
- 2.
- Early versions of WordNet::Similarity::* used offsets internally for
finding paths, hypernym trees, subsumers, etc. The module
WordNet::QueryData that is used by Similarity, however, accepts only wps
strings as input to its querySense method, which is used to find
hypernyms. We have found that it is more efficient (faster) to use wps
strings internally.
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu
Jason Michelizzi, University of Minnesota Duluth
mich0212 at d.umn.edu
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd at cs.utah.edu
WordNet::Similarity(3) WordNet::Similarity::path(3)
WordNet::Similarity::lch(3) WordNet::Similarity::wup(3)
Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan and Jason Michelizzi
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.
Note: a copy of the GNU General Public License is available on the
web at <http://www.gnu.org/licenses/gpl.txt> and is included in this
distribution as GPL.txt.
Visit the GSP FreeBSD Man Page Interface. Output converted with ManDoc. |