NAME
    Lingua::Treebank - Perl extension for manipulating the Penn Treebank format

SYNOPSIS
        use strict;
        use warnings;
        use Lingua::Treebank;

        my $filename = shift;    # path to a Penn Treebank .mrg file

        my @utterances = Lingua::Treebank->from_penn_file($filename);

        foreach (@utterances) {
            # $_ is a Lingua::Treebank::Const now
            foreach ($_->get_all_terminals) {
                # $_ is a Lingua::Treebank::Const that is a terminal (word)
                print $_->word(), ' ', $_->tag(), "\n";
            }
            print "\n\n";
        }

ABSTRACT
    Modules for abstracting out the "natural" objects in the Penn Treebank format.

DESCRIPTION
    This class knows how to read two treebank formats: the Penn format and the Chomsky Normal Form (CNF) format. The two formats differ in how they represent terminal nodes. The Penn format places the pre-terminal part-of-speech tag in the left-hand position of a parenthesis-delimited pair, just as it does for non-terminal nodes. The CNF format instead attaches the pre-terminal tag to the word with an underscore. For example, the sentence "I spoke" is rendered in each format as follows:

        (S (NP (N I)) (VP (V spoke)))    Penn
        (S (NP I_N) (VP spoke_V))        Chomsky Normal Form

    Almost all of the interesting tree functionality is in the constituent package, Lingua::Treebank::Const (included in this distribution).

    PLEASE NOTE: The format expected here is the ".mrg" format, not the ".psd" format. In other words, one POS tag per word is required. (In response to CPAN bug 15079.)

Variables
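The only difference between the two formats described under DESCRIPTION is how a pre-terminal is written, so the mapping between them can be sketched with two regular-expression rewrites. This is a hypothetical illustration, not Lingua::Treebank's own reader or API: the helper names penn_to_cnf and cnf_to_penn are invented here, and the sketch assumes that no tag or word contains whitespace or parentheses and that tags contain no underscore.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; they are NOT part of the
# Lingua::Treebank API. Assumes tags and words contain no whitespace or
# parentheses, and that tags contain no underscore.

# Penn -> CNF: rewrite each pre-terminal "(TAG word)" as "word_TAG".
sub penn_to_cnf {
    my ($penn) = @_;
    (my $cnf = $penn) =~ s/\(\s*([^\s()]+)\s+([^\s()]+)\s*\)/${2}_${1}/g;
    return $cnf;
}

# CNF -> Penn: rewrite each "word_TAG" token as "(TAG word)". The tag is
# taken after the LAST underscore, so a word may itself contain "_".
sub cnf_to_penn {
    my ($cnf) = @_;
    (my $penn = $cnf) =~ s/([^\s()]+)_([^\s()_]+)/(${2} ${1})/g;
    return $penn;
}

my $penn = '(S (NP (N I)) (VP (V spoke)))';
my $cnf  = penn_to_cnf($penn);
print "$cnf\n";                   # (S (NP I_N) (VP spoke_V))
print cnf_to_penn($cnf), "\n";    # (S (NP (N I)) (VP (V spoke)))
```

Splitting on the last underscore lets a word such as New_York survive the round trip. The module itself does not rewrite strings this way; as the SYNOPSIS shows, it builds Lingua::Treebank::Const objects from the input.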
Methods

  Class methods
EXPORT
    None by default.

HISTORY
SEE ALSO
    TO DO: Where is Penn Treebank documented?

AUTHOR
    Jeremy Gillmor Kahn, <kahn@cpan.org>

COPYRIGHT AND LICENSE
    Copyright 2003-2008 by Jeremy Gillmor Kahn, with additional support and ideas from Bill McNeill.

    This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.