stream

Usage

stream input [FILE|PIPE|STRING] expression output [FILE|PIPE|STRING] expression select xpath command-block [ select xpath command-block ... ]

Description

EXPERIMENTAL! This command provides a memory efficient (though slower) way to process selected parts of an XML document with XSH. A streaming XML parser (SAX parser) is used to parse the input. The parser has two states which will be refered to as A and B below. The initial state of the parser is A.

In the state A, only a limited vertical portion of the DOM tree is built. All XML data comming from the input stream other than start-tags are immediatelly copied to the output stream. If a new start-tag of an element arrives, a new node is created in the tree. All siblings of the newly created node are removed. Thus, in the state A, there is exactly one node on every level of the tree. After a node is added to the tree, all the xpath expressions following the select keyword are checked. If none matches, the parser remains in state A and copies the start-tag to the output stream. Otherwise, the first expression that matches is remembered and the parser changes its state to B.

In state B the parser builds a complete DOM subtree of the element that was last added to the tree before the parser changed its state from A to B. No data are sent to the output at this stage. When the subtree is complete (i.e. the corresponding end-tag for its topmost element is encountered), the command-block of instructions following the xpath expression that matched is invoked with the root element of the subtree as the current context node. The commands in command-block are allowed to transform the whole element subtree or even to replace it with a different DOM subtree or subtrees. They must, however, preserve the element's parent as well as all its ancestor nodes intact. Failing to do so can result in an error or unpredictable results.

After the subtree processing command-block returns, all subtrees that now appear in the DOM tree in the place of the original subtree are serialized to the output stream. After that, they are deleted and the parser returns to state A.

Note that this type of processing highly limits the amount of information the XPath expressions can use. First notable fact is that elements can not be selected by their content. The only information present in the tree at the time of the XPath evaluation is the element's name and attributes plus the same information for all its ancestors. There is nothing known yet about possible child nodes of the element as well as of the node's position within its siblings.