Interacting with Perl and Shell

Along with XPath, Perl is one of the two XSH2 expression languages, and it lends XSH2 much of its expressive power. Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It has built-in regular expressions and powerful yet easy-to-learn data structures (scalars, arrays, hash tables). It's also a good language for many system management tasks. XSH2 itself is written in Perl (except for the XML engine, which uses the libxml2 library written in C by Daniel Veillard).

Calling Perl

Perl expressions or blocks of code can be used as arguments to any XSH2 command. One such command is perl, which simply evaluates the given Perl block. Other commands, such as map, even require a Perl expression argument and make it possible to quickly change DOM node content. Perl expressions may also provide lists of strings to iterate over with a foreach loop, or serve as conditions for if, unless, and while statements.

To prevent conflicts between XSH2 internals and the evaluated Perl code, XSH2 runs such code in a special namespace, XML::XSH2::Map. As described in the section Variables, XSH2 string variables can be read and assigned from Perl code directly, since they actually are Perl variables defined in the XML::XSH2::Map namespace.
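The effect of evaluating code in a dedicated namespace can be sketched in plain Perl, without XSH2 itself. This is only an illustration of the idea, not XSH2's actual implementation: the helper eval_in_map is hypothetical, and the variable $name merely stands in for an XSH2 variable of the same name.

```perl
use strict;
use warnings;

$main::name            = "untouched";  # a variable living in package main
$XML::XSH2::Map::name  = "Frodo";      # stands in for the XSH2 variable $name

# Hypothetical helper: evaluate user code inside the XML::XSH2::Map package,
# so that unqualified variables resolve to that package.
sub eval_in_map {
    my ($code) = @_;
    my $result = eval "package XML::XSH2::Map; no strict; $code";
    die $@ if $@;
    return $result;
}

# The evaluated code refers to $name without any package qualifier.
eval_in_map(q{ $name = uc $name });

print "Map namespace: $XML::XSH2::Map::name\n";
print "main namespace: $main::name\n";
```

Only the variable in XML::XSH2::Map is changed by the evaluated code; the variable of the same name in package main is left alone.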

The interaction between XSH2 and Perl works the other way round as well, so you may call back into XSH2 from the evaluated Perl code. For this purpose, the Perl function xsh is defined in the XML::XSH2::Map namespace. All parameters passed to this function are interpreted as XSH2 commands.

Moreover, the following Perl helper functions are defined:

xsh(string,...) - evaluates given string(s) as XSH2 commands.

call(name) - calls a given XSH2 subroutine.

count(string) - evaluates given string as an XPath expression and returns either the literal value of the result (in case of boolean, string and float result types) or the number of nodes in a returned node-set.

literal(string|object) - if passed a string, evaluates it as an XSH2 expression and returns the literal value of the result; if passed an object, returns the literal value of the object. For example, literal('$doc/expression') returns the same value as count('string($doc/expression)').

serialize(string|object) - if passed a string, first evaluates it as an XSH2 expression to obtain a node-list object, then serializes the object into XML. The resulting string equals the output of the XSH2 command ls applied to the same expression or object, only without indentation and folding.

type(string|object) - if passed a string, first evaluates it as an XSH2 expression to obtain a node-list object. Returns a list of strings representing the types of nodes in the node-list (ordered in the canonical document order). The returned type strings are: element, attribute, text, cdata, pi, entity_reference, document, chunk, comment, namespace, unknown.

nodelist(string|object,...) - converts its arguments to objects if necessary and returns a node-list consisting of the objects.

xpath(string, node?) - evaluates a given string as an XPath expression in the context of a given node and returns the result.

echo(string,...) - prints given strings on XSH2 output. Note that in the interactive mode, XSH2 redirects all output to a specific terminal file handle stored in the variable $OUT. So if, for example, you mean to pipe the result to a shell command, you should avoid using the STDOUT filehandle directly. You may either use the usual print without a filehandle, use the echo function, or use $OUT as a filehandle.
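The difference between print without a filehandle and print to STDOUT can be sketched in plain Perl. The sketch assumes the redirection behaves like Perl's select(), which is one common way to change the default output handle; whether XSH2 does exactly this internally is an assumption. An in-memory buffer stands in for the terminal or pipe that $OUT would refer to.

```perl
use strict;
use warnings;

# Assumption: an in-memory buffer stands in for XSH2's terminal/pipe handle.
my $buffer = '';
open my $OUT, '>', \$buffer or die "Cannot open in-memory handle: $!";

# Make $OUT the default output handle, as an interactive shell might do.
my $previous = select($OUT);

print "via plain print\n";        # follows the selected handle
print {$OUT} "via \$OUT\n";       # explicit handle, same destination
print STDOUT "via STDOUT\n";      # bypasses the redirection entirely

select($previous);                # restore the original default handle
close $OUT or die "Cannot close handle: $!";

print "captured: $buffer";
```

Only the first two lines end up in the buffer; the line written to STDOUT goes straight to the real standard output, which is why it would miss a pipe set up by the surrounding shell.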

In the following examples we use Perl to populate Middle-Earth with Hobbits whose names are read from a text file called hobbits.txt, unless there are already some Hobbits in Middle-Earth.

Example 7. Use Perl to read text files

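unless (//creature[@race='hobbit']) {
  perl {
    # read one name per line and strip the trailing newlines
    open my $fh, "<", "hobbits.txt" or die "Cannot open hobbits.txt: $!";
    @hobbits = <$fh>;
    chomp @hobbits;
    close $fh;
  }
  foreach { @hobbits } {
    copy xsh:new-element("creature","name",.,"race","hobbit")
      into m:/middle-earth/creatures;
  }
}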

Example 8. The same code as a single Perl block

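perl {
  # count() takes the XPath expression as a string
  unless (count(q{//creature[@race='hobbit']})) {
    open my $file, "<", "hobbits.txt" or die "Cannot open hobbits.txt: $!";
    foreach (<$file>) {
      chomp;  # drop the trailing newline from the name
      xsh(qq{insert element "<creature name='$_' race='hobbit'>"
        into m:/middle-earth/creatures});
    }
    close $file;
  }
};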

Writing your own XPath extension functions in Perl

XSH2 allows users to extend the set of XPath functions by providing extension functions written in Perl. This can be achieved using the register-function command. The Perl code implementing an extension function behaves like an ordinary Perl subroutine, accepting its arguments in @_ and returning the result. The following conventions are used:

The arguments passed to the Perl implementation by the XPath engine are simple scalars for string, boolean and float argument types and XML::LibXML::NodeList objects for node-set argument types. The implementation is responsible for checking the number and types of its arguments. The implementation may use general Perl functions as well as XML::LibXML methods to process the arguments and return the result. Documentation for the XML::LibXML Perl module can be found for example at http://search.cpan.org/~pajas/XML-LibXML/.

Extension functions SHOULD NOT MODIFY the document DOM tree. Doing so could not only confuse the XPath engine but possibly even result in a critical error (such as a segmentation fault). Calling XSH2 commands from extension function implementations is also dangerous and is generally not recommended.

The extension function implementation must return a single value, which can be of one of the following types: simple scalar (a number or string), XML::LibXML::Boolean object reference (result is a boolean value), XML::LibXML::Literal object reference (result is a string), XML::LibXML::Number object reference (result is a float), XML::LibXML::Node (or derived) object reference (result is a node-set consisting of a single node), or XML::LibXML::NodeList (result is a node-set). For convenience, simple (non-blessed) array references consisting of XML::LibXML::Node objects can also be used for a node-set result instead of a XML::LibXML::NodeList.
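These conventions can be tried out in plain Perl, outside XSH2. The sketch below registers the function through XML::LibXML::XPathContext's registerFunction method rather than through the XSH2 register-function command. The function name join-names, the subroutine join_names and the sample document are made up for the illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical extension function: join-names(node-set, string).
# Node-sets arrive as XML::LibXML::NodeList objects, strings as plain scalars.
sub join_names {
    die "join-names: expected 2 arguments\n" unless @_ == 2;
    my ($nodes, $separator) = @_;
    die "join-names: first argument must be a node-set\n"
        unless ref $nodes and $nodes->isa('XML::LibXML::NodeList');
    # Read-only access to the nodes; the DOM tree is not modified.
    # A simple scalar is returned, so the XPath result is a string.
    return join $separator,
        map { $_->getAttribute('name') } $nodes->get_nodelist;
}

my $doc = XML::LibXML->load_xml(string => <<'XML');
<creatures>
  <creature name="Frodo" race="hobbit"/>
  <creature name="Gimli" race="dwarf"/>
  <creature name="Sam" race="hobbit"/>
</creatures>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerFunction('join-names', \&join_names);

my $names = $xpc->findvalue(q{join-names(//creature[@race='hobbit'], ', ')});
print "$names\n";
```

The subroutine checks the number and types of its arguments itself, as the conventions require, and only reads from the nodes it is given.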

Calling the System Shell

In the interactive mode, XSH2 interprets all lines starting with the exclamation mark (!) as shell commands and invokes the system shell to interpret the line (this is to mimic FTP and similar command-line interpreters).

xsh> !ls -l
-rw-rw-r--    1 pajas    pajas        6355 Mar 14 17:08 Artistic
drwxrwxr-x    2 pajas    users         128 Sep  1 10:09 CVS
-rw-r--r--    1 pajas    pajas       14859 Aug 26 15:19 ChangeLog
-rw-r--r--    1 pajas    pajas        2220 Mar 14 17:03 INSTALL
-rw-r--r--    1 pajas    pajas       18009 Jul 15 17:35 LICENSE
-rw-rw-r--    1 pajas    pajas         417 May  9 15:16 MANIFEST
-rw-rw-r--    1 pajas    pajas         126 May  9 15:16 MANIFEST.SKIP
-rw-r--r--    1 pajas    pajas       20424 Sep  1 11:04 Makefile
-rw-r--r--    1 pajas    pajas         914 Aug 26 14:32 Makefile.PL
-rw-r--r--    1 pajas    pajas        1910 Mar 14 17:17 README
-rw-r--r--    1 pajas    pajas         438 Aug 27 13:51 TODO
drwxrwxr-x    5 pajas    users         120 Jun 15 10:35 blib
drwxrwxr-x    3 pajas    users        1160 Sep  1 10:09 examples
drwxrwxr-x    4 pajas    users          96 Jun 15 10:35 lib
-rw-rw-r--    1 pajas    pajas           0 Sep  1 16:23 pm_to_blib
drwxrwxr-x    4 pajas    users         584 Sep  1 21:18 src
drwxrwxr-x    3 pajas    users         136 Sep  1 10:09 t
-rw-rw-r--    1 pajas    pajas          50 Jun 16 00:06 test
drwxrwxr-x    3 pajas    users         496 Sep  1 20:18 tools
-rwxr-xr-x    1 pajas    pajas        5104 Aug 30 17:08 xsh

To invoke a system shell command or program from the non-interactive mode or from a complex XSH2 construction, use the exec command.

Since UNIX shell commands are a very powerful tool for processing textual data, XSH2 supports direct redirection of the output of XSH2 commands to a system shell command. This is very similar to the redirection known from UNIX shells, except that here, of course, the first command in the pipeline is an XSH2 command. Since the semicolon (;) is used in XSH2 to separate commands, it has to be prefixed with a backslash if it should be used for other purposes.

Example 9. Use grep and less to display context of 'funny'

xsh> ls //chapter[5]/para | grep funny | less

Example 10. The same on Windows 2000/XP systems

xsh> ls //chapter[5]/para | find "funny" | more

Related topics

lcd - change system working directory

exec - execute a shell command

expression - expression argument type

hash - index selected nodes by some key value

map - transform node value/data using Perl or XPath expression

perl-code - in-line code in Perl programming language

perl - evaluate in-line Perl code

rename - quickly rename nodes with in-line Perl code