![]() |
![]()
| ![]() |
![]()
NAMEHTML::Element::traverse - discussion of HTML::Element's traverse methodVERSIONThis document describes version 5.07 of HTML::Element::traverse, released August 31, 2017 as part of HTML-Tree.SYNOPSIS# $element->traverse is unnecessary and obscure. # Don't use it in new code. DESCRIPTION"HTML::Element" provides a method "traverse" that traverses the tree and calls user-specified callbacks for each node, in pre- or post-order. However, use of the method is quite superfluous: if you want to recursively visit every node in the tree, it's almost always simpler to write a subroutine does just that, than it is to bundle up the pre- and/or post-order code in callbacks for the "traverse" method.EXAMPLESSuppose you want to traverse at/under a node $tree and give elements an 'id' attribute unless they already have one.You can use the "traverse" method: { my $counter = 'x0000'; $start_node->traverse( [ # Callbacks; # pre-order callback: sub { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); return HTML::Element::OK; # keep traversing }, # post-order callback: undef ], 1, # don't call the callbacks for text nodes ); } or you can just be simple and clear (and not have to understand the calling format for "traverse") by writing a sub that traverses the tree by just calling itself: { my $counter = 'x0000'; sub give_id { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { give_id($c) if ref $c; # ignore text nodes } }; give_id($start_node); } See, isn't that nice and clear? But, if you really need to know: THE TRAVERSE METHODThe "traverse()" method is a general object-method for traversing a tree or subtree and calling user-specified callbacks. It accepts the following syntaxes:
These all mean to traverse the element and all of its children. That is, this method starts at node $h, "pre-order visits" $h, traverses its children, and then will "post-order visit" $h. "Visiting" means that the callback routine is called, with these arguments: $_[0] : the node (element or text segment), $_[1] : a startflag, and $_[2] : the depth If the $ignore_text parameter is given and true, then the pre-order call will not be happen for text content. The startflag is 1 when we enter a node (i.e., in pre-order calls) and 0 when we leave the node (in post-order calls). Note, however, that post-order calls don't happen for nodes that are text segments or are elements that are prototypically empty (like "br", "hr", etc.). If we visit text nodes (i.e., unless $ignore_text is given and true), then when text nodes are visited, we will also pass two extra arguments to the callback: $_[3] : the element that's the parent of this text node $_[4] : the index of this text node in its parent's content list Note that you can specify that the pre-order routine can be a different routine from the post-order one: $h->traverse( [\&pre_callback,\&post_callback], ...); You can also specify that no post-order calls are to be made, by providing a false value as the post-order routine: $h->traverse([ \&pre_callback,0 ], ...); And similarly for suppressing pre-order callbacks: $h->traverse([ 0,\&post_callback ], ...); Note that these two syntaxes specify the same operation: $h->traverse([\&foo,\&foo], ...); $h->traverse( \&foo , ...); The return values from calls to your pre- or post-order routines are significant, and are used to control recursion into the tree. These are the values you can return, listed in descending order of my estimation of their usefulness:
Almost every task to do with extracting information from a tree can be expressed in terms of traverse operations (usually in only one pass, and usually paying attention to only pre-order, or to only post-order), or operations based on traversing. (In fact, many of the other methods in this class are basically calls to traverse() with particular arguments.) The source code for HTML::Element and HTML::TreeBuilder contain several examples of the use of the "traverse" method to gather information about the content of trees and subtrees. (Note: you should not change the structure of a tree while you are traversing it.) [End of documentation for the "traverse()" method] Traversing with Recursive Anonymous RoutinesNow, if you've been reading Structure and Interpretation of Computer Programs too much, maybe you even want a recursive lambda. Go ahead:{ my $counter = 'x0000'; my $give_id; $give_id = sub { my $x = $_[0]; $x->attr('id', $counter++) unless defined $x->attr('id'); foreach my $c ($x->content_list) { $give_id->($c) if ref $c; # ignore text nodes } }; $give_id->($start_node); undef $give_id; } It's a bit nutty, and it's still more concise than a call to the "traverse" method! It is left as an exercise to the reader to figure out how to do the same thing without using a $give_id symbol at all. It is also left as an exercise to the reader to figure out why I undefine $give_id, above; and why I could achieved the same effect with any of: $give_id = 'I like pie!'; # or... $give_id = []; # or even; $give_id = sub { print "Mmmm pie!\n" }; But not: $give_id = sub { print "I'm $give_id and I like pie!\n" }; # nor... $give_id = \$give_id; # nor... $give_id = { 'pie' => \$give_id, 'mode' => 'a la' }; Doing Recursive Things IterativelyNote that you may at times see an iterative implementation of pre-order traversal, like so:{ my @to_do = ($tree); # start-node while(@to_do) { my $this = shift @to_do; # "Visit" the node: $this->attr('id', $counter++) unless defined $this->attr('id'); unshift @to_do, grep ref $_, $this->content_list; # Put children on the stack -- they'll be visited next } } This can under certain circumstances be more efficient than just a normal recursive routine, but at the cost of being rather obscure. It gains efficiency by avoiding the overhead of function-calling, but since there are several method dispatches however you do it (to "attr" and "content_list"), the overhead for a simple function call is insignificant. Pruning and WhatnotThe "traverse" method does have the fairly neat features of the "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals. None of these can be implemented totally straightforwardly with recursive routines, but it is quite possible. "ABORT"-like behavior can be implemented either with using non-local returning with "eval"/"die":my $died_on; # if you need to know where... sub thing { ... visits $_[0]... ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ... ... else call thing($child) for each child... ...any post-order visiting $_[0]... } eval { thing($node) }; if($@) { if($@ =~ m<^ABORT_TRAV>) { ...it died (aborted) on $died_on... } else { die $@; # some REAL error happened } } or you can just do it with flags: my($abort_flag, $died_on); sub thing { ... visits $_[0]... ... maybe set $abort_flag = 1; $died_on = $_[0]; return; foreach my $c ($_[0]->content_list) { thing($c); return if $abort_flag; } ...any post-order visiting $_[0]... return; } $abort_flag = $died_on = undef; thing($node); ...if defined $abort_flag, it died on $died_on SEE ALSOHTML::ElementAUTHORCurrent maintainers:
Original HTML-Tree author:
Former maintainers:
You can follow or contribute to HTML-Tree's development at <https://github.com/kentfredric/HTML-Tree>. COPYRIGHTCopyright 2000,2001 Sean M. Burke
|