NAME

HTML::TableContentParser - Do interesting things with the contents of tables.

SYNOPSIS

  use HTML::TableContentParser;
  my $p = HTML::TableContentParser->new();
  my $html = read_html_from_somewhere();
  my $tables = $p->parse( $html );
  for my $t (@$tables) {
    for my $r (@{$t->{rows}}) {
      print 'Row:';
      for my $c (@{$r->{cells}}) {
        print " [$c->{data}]";
      }
      print "\n";
    }
  }

DESCRIPTION

This package parses tables out of HTML. The return from the parse is a reference to an array containing the tables found.

Tables appear in the output in the order in which they are encountered. If a table is nested inside a cell of another table, it will appear after the containing table in the output, and any connection between the two will be lost. As of version 0.200_01, the appearance of a nested table should not cause any truncation of the containing table.

The following tags are processed by this module: "<table>", "<caption>", "<tr>", "<th>", and "<td>". In the return from the parse method, each tag is represented by a hash reference, having the tag's attributes as keys, and the attribute values as values. In addition, the following keys will be provided:

METHODS

This module is a subclass of HTML::Parser. It provides only one new method, classic(), which is an accessor for the attribute of the same name. The following inherited (or overridden) methods may profitably be called by the user.

new

  my $p = HTML::TableContentParser->new();

This static method instantiates the parser object. The only supported argument is

classic

This method returns the value of the "classic" attribute, whether specified or defaulted.

parse

  my $tables = $p->parse( $html );

This method parses the given HTML. The return is a reference to an array containing all the tables found.

GLOBALS

The following global variables, properly localized, can be used to modify the behavior of this module.

$HTML::TableContentParser::CLASSIC

This variable provides the default value of the "classic" argument to new(), and is subject to the same restrictions.

$HTML::TableContentParser::DEBUG

If set to 1, causes debug output to STDERR (via "warn()"). Setting this to any true value (including 1) is unsupported in the sense that the behavior of this module in response to any true value is explicitly undocumented, and can change without notice.

EXPORTS

Nothing.

CAVEATS, BUGS, and TODO

The "rowspan" and "colspan" attributes are reported but ignored. That is,

  <tr><td colspan="2">Moe</td><td>Howard</td></tr>

occupies three columns in the HTML table, but only two entries are made in the "{cells}" value of the hash that represents this row.

SEE ALSO

This module is a very specific tool to address a very specific problem. One of the following modules may better address your needs.

HTML::Parser. This is a general HTML parser, which forms the basis for this module.

HTML::TreeBuilder. This is a general HTML parser, with methods to search and traverse the parse tree once generated.

Mojo::DOM in the Mojolicious distribution. This is a general HTML/XML DOM parser, with methods to search the parse tree using CSS selectors.

AUTHOR

Simon Drabble <sdrabble@cpan.org>

Thomas R. Wyant, III wyant at cpan dot org

COPYRIGHT AND LICENSE

Copyright (C) 2002 Simon Drabble

Copyright (C) 2017-2018 Thomas R. Wyant, III

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
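
As a supplementary illustration of the "colspan" caveat noted under CAVEATS, BUGS, and TODO, the following is a minimal sketch, not taken from the module's own documentation. It assumes the module is installed and relies only on the parse() method and the "{rows}", "{cells}", and "{data}" keys shown in the SYNOPSIS. The sample HTML string and the variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# One row whose first cell spans two columns (the markup from the caveat).
my $html = '<table><tr><td colspan="2">Moe</td><td>Howard</td></tr></table>';

my $p      = HTML::TableContentParser->new();
my $tables = $p->parse( $html );

# Cells of the first row of the first table.
my $cells = $tables->[0]{rows}[0]{cells};

# The row spans three columns in the HTML, but there is one entry per cell.
printf "cells: %d\n", scalar @$cells;

for my $c (@$cells) {
    # Tag attributes appear as keys of the cell hash. The fallback to 1
    # is done here for display only; the module does not supply it.
    my $span = defined $c->{colspan} ? $c->{colspan} : 1;
    print "[$c->{data}] colspan=$span\n";
}
```

Code that needs a column-accurate grid would have to expand spanned cells itself, for example by repeating a cell "colspan" times while walking the row.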