iterate

Usage

iterate xpath command-block

Description

iterate works very much like the XPath variant of foreach, except that iterate evaluates the command-block as soon as a new node matching the given xpath is found. As a limitation, the xpath expression used with iterate may consist of only one XPath step, i.e. it cannot contain the XPath step separator /.

What, then, are the benefits of iterate over a foreach loop? Under some circumstances it is efficiency; under others there are none. To clarify this, we have to dive a bit deeper into the details of the XPath implementation. By definition, the node-list resulting from the evaluation of an XPath expression has to be ordered in canonical document order. That means an XPath implementation must contain some kind of sorting algorithm. This by itself would not be much trouble if the relative document order of two nodes of a DOM tree could be determined in constant time. Unfortunately, the libxml2 library used behind XSH does not provide a mechanism that would guarantee this (which is, however, quite a natural and reasonable design decision if all the consequences are considered).

Thus, when comparing two nodes, libxml2 traverses the tree to find their nearest common ancestor and then determines the relative order of the two subtrees by searching for one of them in the list of right siblings of the other. This, of course, cannot be done in constant time. As a result, the sorting algorithm, which is reasonably efficient for small node-lists or when a comparison takes constant time (its running time is then polynomial of degree below 1.5), becomes rather unusable for huge node-lists when a comparison takes linear time (the running time is still polynomial, but of degree above 2).
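The comparison strategy described above can be sketched in Python. This is a simplified model under stated assumptions: the `Node` class and `compare_document_order` function are hypothetical illustrations, not libxml2's actual C code. It shows why each comparison costs time proportional to tree depth and sibling count rather than constant time.

```python
from functools import cmp_to_key


class Node:
    """Hypothetical minimal tree node with parent and ordered child links."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors_or_self(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def compare_document_order(a, b):
    """Return -1 if a precedes b, 1 if b precedes a, 0 if they are the same node."""
    if a is b:
        return 0
    path_a = ancestors_or_self(a)
    path_b = ancestors_or_self(b)
    on_path_b = {id(n) for n in path_b}
    # Walk up from a until we reach b's ancestor chain: the nearest common ancestor.
    for i, node in enumerate(path_a):
        if id(node) in on_path_b:
            common = node
            break
    else:
        raise ValueError("nodes belong to different trees")
    if common is a:  # a is an ancestor of b, so a comes first
        return -1
    if common is b:  # b is an ancestor of a
        return 1
    # Children of the common ancestor that lead towards a and b respectively.
    branch_a = path_a[i - 1]
    branch_b = path_b[path_b.index(common) - 1]
    # Search the right siblings of branch_a for branch_b (linear, not constant time).
    siblings = common.children
    start = siblings.index(branch_a)
    for sibling in siblings[start + 1:]:
        if sibling is branch_b:
            return -1
    return 1


# Usage: sort a shuffled node-list into document order using this comparison.
root = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b", [Node("b1")])])
a, b = root.children
a1, a2 = a.children
b1 = b.children[0]
shuffled = [b1, a2, root, a, a1, b]
ordered = sorted(shuffled, key=cmp_to_key(compare_document_order))
print([n.name for n in ordered])  # ['root', 'a', 'a1', 'a2', 'b', 'b1']
```

Every one of the sort's comparisons pays for an ancestor walk and a sibling scan, which is why the overall cost grows so quickly for large node-lists.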

The iterate command avoids sorting the resulting node-list by limiting the allowed XPath expression to one step (and thus one axis) at a time. On the other hand, since iterate is implemented in Perl, a proxy object gluing the C and Perl layers has to be created for every node the iterator passes. This, plus some extra subroutine calls, makes it about two to three times slower than the similar tree-traversal algorithm that libxml2 itself uses during XPath evaluation.
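The idea behind a single-step traversal can be illustrated with a small Python sketch. It assumes a hypothetical `Element` model and a hypothetical `iterate_child_axis` helper; it is not XSH's Perl implementation. Walking one axis (here the child axis) visits nodes in document order already, so each match can be handed to the "command-block" immediately, with no node-list to collect and sort.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical minimal element: a name, attributes, and ordered children."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def iterate_child_axis(context, predicate):
    """Yield matching children one by one, in document order, without sorting."""
    for child in context.children:
        if predicate(child):
            yield child


def productive(node):
    """Mirror of the predicate [@age>=18 and @age<60]; a missing age never matches."""
    age = int(node.attrs.get("age", -1))
    return 18 <= age < 60


# Usage with made-up sample data.
inhabitants = Element("inhabitants", children=[
    Element("person", {"age": "17"}),
    Element("person", {"age": "18"}),
    Element("person", {"age": "42"}),
    Element("person", {"age": "60"}),
    Element("person", {"age": "75"}),
])
count = 0
for node in iterate_child_axis(inhabitants, productive):
    count += 1  # plays the role of the command-block, run as each match is found
print(count)  # 2
```

A multi-step expression such as a/b/c would merge results from several branches, and those results would have to be re-sorted. This is why iterate is restricted to a single step.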

Our experience shows that iterate beats foreach in performance on large node-lists (1500 nodes or more, but your mileage may vary), while foreach wins on smaller node-lists.

The following two examples give equivalent results. However, the one using iterate may be faster, especially if the number of nodes being counted is very large.

Example 19. Count the inhabitants of the kingdom of Rohan who are of productive age

$productive=0;
cd rohan/inhabitants;
iterate child::*[@age>=18 and @age<60] { perl $productive++ };
echo "$productive inhabitants in productive age";

Example 20. Using XPath

$productive=count(rohan/inhabitants/*[@age>=18 and @age<60]);
echo "$productive inhabitants in productive age";

On a UNIX system, you can benchmark an XSH command by using pipe-line redirection, e.g. | time cut.

See Also

foreach, next, prev, last