NAME
HTML::PrettyPrinter - generate nice HTML files from HTML syntax trees

SYNOPSIS
    use HTML::TreeBuilder;
    # generate an HTML syntax tree
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($file_name);
    # modify the tree if you want

    use HTML::PrettyPrinter;
    my $hpp = HTML::PrettyPrinter->new('linelength' => 130,
                                       'quote_attr' => 1);
    # configure
    $tree->address("0.1.0")->attr('_hpp_indent', 0);   # for an individual element
    $hpp->set_force_nl(1, qw(body head));              # for tags
    $hpp->set_force_nl(1, qw(@SECTIONS));              # as above, for a tag group
    $hpp->set_nl_inside(0, 'default!');                # for all tags

    # format the source
    my $linearray_ref = $hpp->format($tree);
    print @$linearray_ref;

    # alternative: print directly to filehandle
    use FileHandle;
    my $fh = FileHandle->new(">$filename2");
    if (defined $fh) {
        $hpp->select($fh);
        $hpp->format($tree);
        undef $fh;
        $hpp->select(undef);
    }

DESCRIPTION
HTML::PrettyPrinter produces nicely formatted HTML code from an HTML syntax tree. It is especially useful if the produced HTML file is to be read or edited manually afterwards. Various parameters let you adapt the output to different styles and requirements.

If you don't care what the HTML source looks like as long as it is valid and readable by browsers, you should use the as_HTML() method of HTML::Element instead of the pretty printer. It is about five times faster.

The pretty printer handles line wrapping, indentation and structuring by the way the whitespace in the tree is represented in the output. Furthermore, upper/lowercase markup, markup minimization, quoting of attribute values, the encoding of entities and the presence of optional end tags are configurable.

There are two types of parameters that influence the output: individual parameters, which are set on a per-element and per-tag basis, and common parameters, which are set only once for each instance of a pretty printer.

To facilitate configuration, a mechanism for handling tag groups is provided. Thus, it is possible to modify a parameter for a group of tags (e.g. all known block elements) without writing each tag name explicitly. The code for tag groups may move to another Perl module in the future.

For HTML::Elements that require special treatment, such as <PRE>, <XMP>, <SCRIPT>, comments and declarations, the pretty printer falls back to the as_HTML() method of the HTML elements.

INDIVIDUAL PARAMETERS
The following individual parameters exist.
Access Methods
The following access methods exist for each individual parameter. Replace parameter with the respective name.
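As an illustration of the naming rule: the SYNOPSIS uses tag-level setters such as set_force_nl and set_nl_inside, and the element attribute _hpp_indent. The stand-alone sketch below only spells out that pattern. The helpers setter_name and element_attr_name are hypothetical (they are not part of HTML::PrettyPrinter), and it is an assumption that every individual parameter follows the same pattern.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of HTML::PrettyPrinter: they only
# derive the names that the naming pattern in the SYNOPSIS suggests.
sub setter_name       { my ($parameter) = @_; return "set_$parameter"; }
sub element_attr_name { my ($parameter) = @_; return "_hpp_$parameter"; }

# Parameter names taken from the setters that appear in this document.
for my $parameter (qw(indent force_nl nl_before nl_after nl_inside endtag)) {
    printf "%-10s tag setter: %-14s element attribute: %s\n",
        $parameter, setter_name($parameter), element_attr_name($parameter);
}
```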
COMMON PARAMETERS
Access Method
OTHER METHODS
TAG GROUPS
Tag groups are lists that contain the names of tags and of other tag groups, which are treated as subsets. This reflects the way allowed content is specified in HTML DTDs, where e.g. %flow consists of all %block and %inline elements, and %inline covers several subsets such as %phrase.

If you add a tag name to a group A, it will be seen in any group that contains group A. Thus, it is easy to maintain groups of tags with similar properties (and to configure the HTML pretty printer for these tags).

The names of tag groups are written in upper case letters with a leading '@' (e.g. '@BLOCK'). The names of simple tags are written in all lower case.

Functions
All the functions to handle and modify tag groups are included in the @EXPORT_OK list of HTML::PrettyPrinter.
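The subset relation between tag groups can be pictured with a small stand-alone sketch. It does not use the module's own functions: the %group table, its contents and the helper expand_group are assumptions made up for illustration, and the real predefined groups may differ.

```perl
use strict;
use warnings;

# Hypothetical group table; names starting with '@' are groups,
# everything else is a plain tag name.
my %group = (
    '@PHRASE' => [qw(em strong code)],
    '@INLINE' => ['@PHRASE', qw(a b i)],
    '@BLOCK'  => [qw(p table)],
    '@FLOW'   => ['@BLOCK', '@INLINE'],
);

# Expand a group into the sorted list of plain tag names it covers,
# following nested groups recursively. $seen guards against cycles.
sub expand_group {
    my ($name, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$name}++;
    my %tags;
    for my $member (@{ $group{$name} || [] }) {
        if ($member =~ /^\@/) {
            $tags{$_} = 1 for expand_group($member, $seen);
        }
        else {
            $tags{$member} = 1;
        }
    }
    return sort keys %tags;
}

print join(',', expand_group('@FLOW')), "\n";   # a,b,code,em,i,p,strong,table

# Adding a tag to the innermost group makes it visible in every
# group that contains that group.
push @{ $group{'@PHRASE'} }, 'kbd';
print join(',', expand_group('@FLOW')), "\n";   # a,b,code,em,i,kbd,p,strong,table
```

Because '@FLOW' only refers to '@INLINE', which in turn refers to '@PHRASE', the tag added to '@PHRASE' shows up in the second expansion of '@FLOW' without touching the larger groups.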
Predefined Tag Groups
There are a couple of predefined tag groups. Use

    foreach my $tg (list_groups()) {
        print "'$tg' => qw(".join(',',group_get($tg)).")\n";
    }

to get a list.

Examples for tag groups
EXAMPLE
Consider the following HTML tree

    <html> @0
      <head> @0.0
        <title> @0.0.0
          "Demonstrate HTML::PrettyPrinter"
      <body> @0.1
        <h1> @0.1.0
          "Headline"
        <p align="JUSTIFY"> @0.1.1
          "Some text in "
          <b> @0.1.1.1
            "bold"
          " and "
          <i> @0.1.1.3
            "italics"
          " and with 'ä' & 'ü'."
        <table align="LEFT" border=0> @0.1.2
          <tr> @0.1.2.0
            <td align="RIGHT"> @0.1.2.0.0
              "top right"
          <tr> @0.1.2.1
            <td align="LEFT"> @0.1.2.1.0
              "bottom left"
        <hr noshade="NOSHADE" size=5> @0.1.3
        <address> @0.1.4
          <a href="mailto:schotten@gmx.de"> @0.1.4.0
            "Claus Schotten"

and

    $hpp = HTML::PrettyPrinter->new('uppercase' => 1);
    print @{$hpp->format($tree)};

will print

    <HTML><HEAD><TITLE>Demonstrate HTML::PrettyPrinter</TITLE></HEAD><BODY><H1>Headline</H1><P ALIGN=JUSTIFY>Some text in <B>bold</B> and <I>italics</I> and with 'ä' & 'ü'.</P><TABLE ALIGN=LEFT BORDER=0><TR><TD ALIGN=RIGHT>top right</TD></TR><TR><TD ALIGN=LEFT>bottom left</TD></TR></TABLE><HR NOSHADE SIZE=5 ><ADDRESS><A HREF="mailto:schotten@gmx.de" >Claus Schotten</A></ADDRESS></BODY></HTML>

That doesn't look very nice. What went wrong?

By default HTML::PrettyPrinter takes a conservative approach to whitespace. It will enlarge existing whitespace, but it will not introduce new whitespace outside of tags, because that might change the way a browser renders the HTML document. However, the HTML tree was constructed with ignore_ignorable_whitespace turned on. Thus, there is no whitespace between block elements that the pretty printer could format. So the pretty printer does line wrapping and indentation only. For example, the title is in the third level of the tree. Thus, the second line is indented six characters. The table cells in the fifth level are indented by ten characters. Furthermore, you can see that whitespace is inserted after the last attribute of the <A> tag.

Let's set $hpp->allow_forced_nl(1). Now the force_nl parameters are enabled. By default, they are set for all non-inline tags.
That creates

    <HTML>
      <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
      </HEAD>
      <BODY>
        <H1>Headline</H1>
        <P ALIGN=JUSTIFY>Some text in <B>bold</B> and <I>italics</I> and with 'ä' & 'ü'.</P>
        <TABLE ALIGN=LEFT BORDER=0>
          <TR>
            <TD ALIGN=RIGHT>top right</TD>
          </TR>
          <TR>
            <TD ALIGN=LEFT>bottom left</TD>
          </TR>
        </TABLE>
        <HR NOSHADE SIZE=5>
        <ADDRESS><A HREF="mailto:schotten@gmx.de" >Claus Schotten</A></ADDRESS>
      </BODY>
    </HTML>

Much better, isn't it? Now let's improve the structuring.

    $hpp->set_nl_before(2,qw(body table));
    $hpp->set_nl_after(2,qw(table));

will require two new lines in front of <body> and <table> tags and after <table> tags.

    <HTML>
      <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
      </HEAD>

      <BODY>
        <H1>Headline</H1>
        <P ALIGN=JUSTIFY>Some text in <B>bold</B> and <I>italics</I> and with 'ä' & 'ü'.</P>

        <TABLE ALIGN=LEFT BORDER=0>
          <TR>
            <TD ALIGN=RIGHT>top right</TD>
          </TR>
          <TR>
            <TD ALIGN=LEFT>bottom left</TD>
          </TR>
        </TABLE>

        <HR NOSHADE SIZE=5>
        <ADDRESS><A HREF="mailto:schotten@gmx.de" >Claus Schotten</A></ADDRESS>
      </BODY>
    </HTML>

Currently the mail address is the only attribute value that is quoted. Here the quotes are required by the '@' character. For all other attribute values quotes are optional and thus omitted by default. $hpp->quote_attr(1); will turn the quotes on.

$hpp->set_endtag(0,'all!') turns all optional end tags off. This affects the </p> (and should affect </tr> and </td>, see below). Alternatively, we could use $hpp->set_endtag(0,'default!'). That would turn the default off, too, but it wouldn't delete settings for individual tags that supersede the default.

$hpp->set_nl_after(3,'head') requires three new lines after the <head> element. Because two new lines are already required by the start of <body>, only one additional line is added.

$hpp->set_force_nl(0,'td') will inhibit the introduction of whitespace around <td>. Thus, the table cells are now on the same line as the table rows.
    <HTML>
      <HEAD>
        <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
      </HEAD>


      <BODY>
        <H1>Headline</H1>
        <P ALIGN="JUSTIFY">Some text in <B>bold</B> and <I>italics</I> and with 'ä' & 'ü'.

        <TABLE ALIGN="LEFT" BORDER="0">
          <TR><TD ALIGN="RIGHT">top right</TD></TR>
          <TR><TD ALIGN="LEFT">bottom left</TD></TR>
        </TABLE>

        <HR NOSHADE SIZE="5">
        <ADDRESS><A HREF="mailto:schotten@gmx.de" >Claus Schotten</A></ADDRESS>
      </BODY>
    </HTML>

The end tags </td> and </tr> are printed because HTML::Tagset says they are mandatory.

    map {$HTML::Tagset::optionalEndTag{$_}=1} qw(td tr th);

will fix that.

The additional new line after </head> doesn't look nice. With $hpp->set_nl_after(undef,'head') we will reset the parameter for the <head> tag.

$hpp->entities($hpp->entities().'ä'); will enforce the entity encoding of 'ä'.

$hpp->min_bool_attr(0); will inhibit the minimization of the NOSHADE attribute of <hr>.

Let's fiddle with the indentation:

    $hpp->set_indent(8,'@TEXTBLOCK');
    $hpp->set_indent(0,'html');

New lines inside text blocks (here inside <h1>, <p> and <address>) will be indented by 8 characters instead of two, whereas the code directly under <html> will not be indented.

    <HTML>
    <HEAD>
      <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
    </HEAD>

    <BODY>
      <H1>Headline</H1>
      <P ALIGN="JUSTIFY">Some text in <B>bold</B> and <I>italics</I> and with 'ä' & 'ü'.

      <TABLE ALIGN="LEFT" BORDER="0">
        <TR><TD ALIGN="RIGHT">top right
        <TR><TD ALIGN="LEFT">bottom left
      </TABLE>

      <HR NOSHADE="NOSHADE" SIZE="5">
      <ADDRESS><A HREF="mailto:schotten@gmx.de" >Claus Schotten</A></ADDRESS>
    </BODY>
    </HTML>

$hpp->wrap_at_tagend(HTML::PrettyPrinter::NEVER); will disable the line wrap between the attribute and the '>' of the <a> tag. The resulting line exceeds the target line length by far, but there is no point left where the pretty printer could legally break this line.

$hpp->set_endtag(1,'tr') will override the default. Thus, the </tr> appears in the code whereas the other optional end tags are still omitted.
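The interplay of 'all!', 'default!' and per-tag settings used in these steps can be modelled in a few lines. This is a toy model written for this example, not the module's implementation: the functions set_endtag_model and endtag_model and the %endtag structure are hypothetical, and treating an undef value as "forget the per-tag setting" is an assumption based on the set_nl_after(undef,'head') step.

```perl
use strict;
use warnings;

# Toy model of one individual parameter (here "endtag"):
# one default value plus optional per-tag overrides.
my %endtag = (default => 1, per_tag => {});

# Accepts the three target forms used in the text:
# a tag name, 'default!' and 'all!'.
sub set_endtag_model {
    my ($value, @targets) = @_;
    for my $target (@targets) {
        if ($target eq 'all!') {
            $endtag{default} = $value;
            $endtag{per_tag} = {};              # forget every per-tag setting
        }
        elsif ($target eq 'default!') {
            $endtag{default} = $value;          # per-tag settings stay in force
        }
        elsif (defined $value) {
            $endtag{per_tag}{$target} = $value;
        }
        else {
            delete $endtag{per_tag}{$target};   # assumed meaning of undef: reset
        }
    }
    return;
}

# A per-tag setting, if present, supersedes the default.
sub endtag_model {
    my ($tag) = @_;
    return exists $endtag{per_tag}{$tag}
        ? $endtag{per_tag}{$tag}
        : $endtag{default};
}

set_endtag_model(0, 'all!');   # all optional end tags off
set_endtag_model(1, 'tr');     # but keep </tr>
print "$_: ", endtag_model($_), "\n" for qw(p td tr);
```

In this model the final state matches the text: the setting for tr is 1, while p and td fall back to the default of 0.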
Finally, we customize some individual elements:
    <HTML>
    <HEAD>
      <TITLE>Demonstrate HTML::PrettyPrinter</TITLE>
    </HEAD>

    <BODY>
      <H1>Headline</H1>

      <TABLE ALIGN="LEFT" BORDER="0">
        <TR><TD ALIGN="RIGHT">top right</TR>
        <TR>
          <TD ALIGN="LEFT">bottom left
        </TR>
      </TABLE>

      <HR NOSHADE="NOSHADE" SIZE="5">
      <ADDRESS><A HREF="mailto:schotten@gmx.de">Claus Schotten</A></ADDRESS>
    </BODY>
    </HTML>

KNOWN BUGS
SEE ALSO
HTML::TreeBuilder, HTML::Element, HTML::Tagset

COPYRIGHT
Copyright 2000 Claus Schotten schotten@gmx.de

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR
Claus Schotten <schotten@gmx.de>