NAME
    html_fmt - Reformat HTML, indented according to structure

SYNOPSIS
    html_fmt [uri|file]

EXAMPLE
    html_fmt http://perl.org

DESCRIPTION
    Given a URI or the name of a file, html_fmt writes it to "STDOUT", reformatted and indented according to the HTML structure. Missing start and end tags are supplied, and comments are added to indicate this. Text inside "<pre>" elements is not altered.

    html_fmt tries to parse everything that is actually out there on the Web. In fact, html_fmt assumes that any file fed to it was intended as HTML, and produces its best guess of the author's intent.

    html_fmt's parser is extremely liberal in what it accepts. When its liberal reading of the standards is not enough to turn a document into valid HTML, html_fmt picks characters to treat as noise, or "cruft". The parser ignores cruft when determining the structure of the document.

    When html_fmt adds a missing start tag, it precedes the new start tag with a comment. When html_fmt adds a missing end tag, it follows the new end tag with a comment. When html_fmt classifies characters as "cruft", it adds a comment to that effect before the "cruft".

    "pre" elements receive special treatment. The contents of "pre" elements are not reformatted. When missing tags or cruft occur inside a "pre" element, the comments to that effect are placed before the "<pre>" start tag.

    The argument to html_fmt can be either a URI or a file name. If it starts with alphanumerics followed by a colon, it is treated as a URI. Otherwise it is treated as a file name.
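    The URI-versus-file rule above can be illustrated with a short sketch. This is not html_fmt's own code (html_fmt is a Perl program); it is a Python illustration written for this page. The function name classify_argument and the exact regular expression are assumptions, based on reading "alphanumerics followed by a colon" as one or more ASCII letters or digits at the very start of the argument.

```python
import re

# Assumption: "alphanumerics followed by a colon" means one or more ASCII
# letters or digits at the very start of the argument, then ":".
# The pattern and the function name are illustrative, not taken from html_fmt.
_URI_PREFIX = re.compile(r"^[A-Za-z0-9]+:")


def classify_argument(arg: str) -> str:
    """Return "uri" if arg looks like a URI under the rule above, else "file"."""
    return "uri" if _URI_PREFIX.match(arg) else "file"


if __name__ == "__main__":
    for arg in ("http://perl.org", "index.html", "./notes:draft.html", "C:page.html"):
        print(f"{arg!r} -> {classify_argument(arg)}")
```

    Under this reading, a file name that itself begins with alphanumerics and a colon, such as "C:page.html", would be classified as a URI. Prefixing such a name with "./" would avoid that, since the argument then no longer starts with an alphanumeric.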
SAMPLE OUTPUT
    Given this input:

    <title>Test page<tr>x<head attr="I am cruft"><p>Final graf

    html_fmt returns

    <!-- Following start tag is replacement for a missing one -->
    <html>
      <!-- Following start tag is replacement for a missing one -->
      <head>
        <title>
          Test page
        </title>
        <!-- Preceding end tag is replacement for a missing one -->
      </head>
      <!-- Preceding end tag is replacement for a missing one -->
      <!-- Following start tag is replacement for a missing one -->
      <body>
        <!-- Following start tag is replacement for a missing one -->
        <table>
          <!-- Following start tag is replacement for a missing one -->
          <tbody>
            <tr>
              <!-- Following start tag is replacement for a missing one -->
              <td>
                x
                <!-- Next line is cruft -->
                <head attr="I am cruft">
                <p>
                  Final graf
                </p>
                <!-- Preceding end tag is replacement for a missing one -->
              </td>
              <!-- Preceding end tag is replacement for a missing one -->
            </tr>
            <!-- Preceding end tag is replacement for a missing one -->
          </tbody>
          <!-- Preceding end tag is replacement for a missing one -->
        </table>
        <!-- Preceding end tag is replacement for a missing one -->
      </body>
      <!-- Preceding end tag is replacement for a missing one -->
    </html>
    <!-- Preceding end tag is replacement for a missing one -->

PURPOSE
    This program is a demo of a demo. Its purpose is to show how easy it is to write applications which look at the structure of web pages using Marpa::HTML. And the purpose of Marpa::HTML is to demonstrate the power of its parse engine, Marpa. Marpa::HTML was written in a few days, and its logic is a straightforward, natural expression of the structure of HTML.

ACKNOWLEDGMENTS
    The starting template for this code was HTML::TokeParser, by Gisle Aas. See also the acknowledgments for Marpa as a whole.

LICENSE AND COPYRIGHT
    Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.