"html_score" - Show complexity metric and other stats for web page
html_score [--html] [uri|file]
html_score http://perl.org
html_score --html http://perl6.org
Given a URI or a file name, treats its referent as HTML and prints a complexity
metric, the maximum element depth, and per-element statistics. The per-element
statistics appear in rows, one per tag name. For each tag name, its row
contains:
- The maximum nesting depth of elements with that tag name. This is
per-tag-name nesting depth, and does not take into account nesting within
other elements with other tag names.
- A count of the elements with that tag name in the document.
- The total number of characters in elements with that tag name. Characters
in nested elements are counted multiple times. For example, if a page
contains a table within a table, characters in the inner table will be
counted twice.
- The average size of elements with this tag name, in characters.
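The per-tag bookkeeping described in the list above can be sketched as follows. This is a minimal illustration, not the code html_score itself uses: the tree layout (hashes with "tag", "own" and "kids" keys) and the name collect_stats are hypothetical, and truncating the average to an integer is an assumption.

```perl
use strict;
use warnings;

# Hypothetical element tree: each node is a hash with a tag name ("tag"),
# the number of characters directly inside it ("own"), and an optional
# list of child elements ("kids").
# Walks the tree, fills %$stats, and returns the size of $node in
# characters, including everything nested inside it.
sub collect_stats {
    my ( $node, $stats, $open ) = @_;
    my $tag = $node->{tag};

    # Per-tag-name nesting: only enclosing elements with the same
    # tag name are counted.
    $open->{$tag}++;
    my $row = $stats->{$tag} //= { depth => 0, count => 0, chars => 0 };
    $row->{depth} = $open->{$tag} if $open->{$tag} > $row->{depth};
    $row->{count}++;

    # Characters of nested elements are added to every enclosing element,
    # so an inner table is counted again as part of the outer table.
    my $size = $node->{own};
    $size += collect_stats( $_, $stats, $open ) for @{ $node->{kids} // [] };
    $row->{chars} += $size;

    $open->{$tag}--;
    return $size;
}

# A table within a table, separated by a td element.
my $tree = {
    tag  => 'table',
    own  => 0,
    kids => [
        {   tag  => 'td',
            own  => 10,
            kids => [ { tag => 'table', own => 5 } ],
        },
    ],
};

my %stats;
collect_stats( $tree, \%stats, {} );

for my $tag ( sort keys %stats ) {
    my $row = $stats{$tag};

    # Assumption: the average is truncated to an integer.
    my $average = int( $row->{chars} / $row->{count} );
    printf "%-8s %3d %5d %7d %7d\n",
        $tag, $row->{depth}, $row->{count}, $row->{chars}, $average;
}
```

In this example the 5 characters of the inner table are counted twice, once for each table, and the td element between the two tables does not affect the nesting depth recorded for "table".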
The argument to html_score can be either a URI or a file name. If
it starts with alphanumerics followed by a colon, it is treated as a URI.
Otherwise it is treated as a file name. If the "--html" option is specified,
the output is written as an HTML table.
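The rule for telling a URI from a file name can be sketched with a single regular expression. The helper name looks_like_uri and the exact pattern are assumptions made for illustration; the actual test in html_score may be written differently.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the argument starts with one or more
# alphanumeric characters followed by a colon, and 0 otherwise.
sub looks_like_uri {
    my ($arg) = @_;
    return $arg =~ /\A [[:alnum:]]+ : /x ? 1 : 0;
}

for my $arg ( 'http://perl.org', 'index.html', './saved:page.html' ) {
    printf "%-20s %s\n", $arg, looks_like_uri($arg) ? 'URI' : 'file';
}
```

Under this reading of the rule, a file whose name itself begins with alphanumerics and a colon would be taken for a URI. Prefixing such a name with "./" is one way to have it treated as a file.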
The complexity metric is the average depth (or nesting level), in
elements, of a character, divided by the logarithm of the length of the
HTML. Whitespace and comments are ignored in calculating the complexity
metric. The division by the logarithm of the HTML length is based on the
idea that, all else being equal, it is reasonable for the nesting to
increase logarithmically as a web page grows in length.
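The arithmetic behind the metric can be sketched as below. This is an illustration under stated assumptions, not the program's own code: the function name complexity_score and the run-based input are hypothetical, the natural logarithm is assumed because the text does not name a base, and the length is taken to be the number of counted (non-whitespace, non-comment) characters.

```perl
use strict;
use warnings;

# Hypothetical input: a list of [depth, character count] runs, each
# describing a stretch of counted characters at one nesting depth.
sub complexity_score {
    my (@runs) = @_;
    my ( $depth_sum, $length ) = ( 0, 0 );
    for my $run (@runs) {
        my ( $depth, $chars ) = @{$run};
        $depth_sum += $depth * $chars;
        $length    += $chars;
    }

    # log(1) is zero and log(0) is undefined, so very short input
    # is given a score of 0 in this sketch.
    return 0 if $length < 2;

    # Average depth of a character, divided by the (natural) logarithm
    # of the length.
    return ( $depth_sum / $length ) / log($length);
}

# 10 characters at depth 2 and 30 characters at depth 3:
# the average depth is 110 / 40 = 2.75.
my $score = complexity_score( [ 2, 10 ], [ 3, 30 ] );
printf "Complexity Score = %.3f\n", $score;    # prints 0.745
```

With a different logarithm base the scores would differ only by a constant factor, so comparisons between pages would be unaffected.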
Here is the first part of the output for "http://perl.org".
http://perl.org
Complexity Score = 0.873
Maximum Depth = 12
           Maximum   Number of   Size in      Average
Element    Nesting   Elements    Characters   Size
a          1         56          3533         63
body       1         1           7615         7615
div        5         30          24695        823
em         1         1           13           13
h1         1         1           37           37
h4         1         11          559          50
With caution, the complexity metric can be used as a self-assessment of website
quality. Well-designed websites often have low numbers, particularly if fast
loading is an important goal. But high values of the complexity metric do not
necessarily mean low quality. Everything depends on what the mission is, and
how well complexity is being used to serve the site's mission.
This program is a demo of a demo. Its purpose is to show how easy it is to write
applications which look at the structure of web pages using Marpa::HTML. And
the purpose of Marpa::HTML is to demonstrate the power of its parse engine,
Marpa. Marpa::HTML was written in a few days, and its logic is a
straightforward, natural expression of the structure of HTML.
The starting template for this code was HTML::TokeParser, by Gisle Aas. See also
the acknowledgments for Marpa as a whole.
Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software
under the Perl license. For details see the LICENSE file in the Marpa
distribution.