XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (683 page)

  • Finally, there are some transformations where generating the correct result tree is really difficult, or really slow. An example might be where the document structure uses interleaved markup. This is used where there are two parallel hierarchies running through the same document; for example, one for the chapter/section/paragraph structure and one for the paginated layout. An expert will know when it's time to give up and cheat—which in this case means producing markup in the result document by direct intervention at the serialization stage, rather than generating the correct result tree and having the markup produced automatically by the serializer. The problem, of course, is that beginners are inclined to give up and cheat far too soon, which leads to code that is difficult to extend and maintain.

Choosing Characters to Map

Applications for character maps probably fall into two categories: those where you want to choose a nonstandard string representation of a character that occurs naturally in the data, and those where you want to choose some otherwise unused character to trigger some special effect in the output.

An example in the first category is the character map shown earlier, which forces the nonbreaking space character to be output as an entity reference. If the document is to be edited, many people will find the entity reference easier to manipulate because it shows up as a visible character, whereas the nonbreaking space character itself appears on the screen just like an ordinary space.
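
A character map of this kind can be sketched as follows. The map name nbsp-as-entity and the surrounding stylesheet are assumptions made for illustration; only the mapping of the nonbreaking space (xA0) to an entity reference comes from the text.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Activate the map for the principal result document -->
  <xsl:output method="xml" use-character-maps="nbsp-as-entity"/>

  <!-- Map name is hypothetical; xA0 is the nonbreaking space -->
  <xsl:character-map name="nbsp-as-entity">
    <xsl:output-character character="&#xA0;" string="&amp;nbsp;"/>
  </xsl:character-map>

  <!-- Copy the input so that any xA0 characters reach the serializer -->
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Note that the serialized output is only well-formed if the nbsp entity is declared for the result document, for example in its DTD.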

An example in the second category is choosing two characters to represent the start and end of a comment. Suppose that the requirement is to transform an input document by “commenting out” any element that has the attribute delete="yes". By commenting out, I mean outputting the element, tags and all, between the comment delimiters <!-- and -->.


This is tricky, because the result cannot be modeled naturally as a result tree: comment nodes cannot have element nodes as children. So we'll choose instead to output the element to the result tree unchanged, but preceded and followed by special characters, which we will map during serialization to comment start and end delimiters.
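
To make the target concrete, here is a sketch of the kind of serialized output intended. The para element and its text are hypothetical placeholders, not taken from any particular source document.

```xml
<!--
<para delete="yes">This paragraph has been deleted</para>
-->
```

In the result tree the two comment delimiters are single private-use characters in text nodes; they become markup only when the serializer applies the character map.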

The best characters to choose for such purposes are the characters in the Unicode Private Use Area, for example the characters from xE000 to xF8FF. These characters have no defined meaning in Unicode, and are intended to be used for communications where there is a private agreement between the sender and the recipient as to what they mean. In this case, the sender is the stylesheet and the recipient is the serializer.

If you assign private use characters in information that is passed between applications, especially applications owned by different organizations, you should make sure that your use of the characters is well documented.

Here is a stylesheet that performs the required transformation:

Example: Using a Character Map to Comment-Out Elements

This example copies the input unchanged to the output, except that any element in the input that has the attribute delete="yes" is output within a comment.

Stylesheet

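The stylesheet is comment-out.xsl:

<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!-- The two private-use code points are assumed values for illustration -->
  <!ENTITY start-comment "&#xE501;">
  <!ENTITY end-comment "&#xE502;">
]>
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- The character map name is likewise an assumed label -->
<xsl:output use-character-maps="comment-delimiters"/>

<xsl:character-map name="comment-delimiters">
  <xsl:output-character character="&start-comment;" string="&lt;!--"/>
  <xsl:output-character character="&end-comment;" string="--&gt;"/>
</xsl:character-map>

<!-- Copy every element, with its attributes, unchanged -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<!-- Surround an element marked for deletion with the two marker characters -->
<xsl:template match="*[@delete='yes']">
  &start-comment;
  <xsl:copy-of select="."/>
  &end-comment;
</xsl:template>

</xsl:stylesheet>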

Source

One of the paragraphs in the source file resume.xml is:

Aidan is also in demand as a consort singer,
performing with groups including the Oxford Camerata and the Sarum
Consort, with whom he has made several acclaimed recordings on the
ASV label of motets by Bach and Peter Philips sung by solo voices.


Output

When the stylesheet is applied to the source file resume.xml, the above paragraph appears as:
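
A sketch of the expected result, assuming the paragraph is marked up in the source as a para element carrying delete="yes" (the element name is an assumption; the exact whitespace around the delimiters depends on the stylesheet layout):

```xml
<!--
<para delete="yes">Aidan is also in demand as a consort singer,
performing with groups including the Oxford Camerata and the Sarum
Consort, with whom he has made several acclaimed recordings on the
ASV label of motets by Bach and Peter Philips sung by solo voices.</para>
-->
```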


Limitations of Character Maps

A character map applies to a whole result document; you cannot switch character mapping on and off at will.

The character map must be fixed at compile time. You cannot compute the output string at runtime, and there is no way the process can be parameterized. (You can, however, substitute a different character map by having different definitions of the same character map in different stylesheet modules, and deciding which one to import using

.)
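
One way such a compile-time choice could be made in XSLT 2.0 is with the use-when attribute on the import declarations. The sketch below is only illustrative: the module names, the map name special-chars, and the vendor test are all hypothetical, and each imported module is assumed to define a character map with that same name.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical modules; each defines a character map named "special-chars" -->
  <xsl:import href="maps-for-editing.xsl"
       use-when="system-property('xsl:vendor') = 'Example Vendor'"/>
  <xsl:import href="maps-for-publishing.xsl"
       use-when="not(system-property('xsl:vendor') = 'Example Vendor')"/>

  <!-- Whichever module survived the use-when test supplies the map -->
  <xsl:output use-character-maps="special-chars"/>

</xsl:stylesheet>
```

The use-when expressions are evaluated statically, so the selection is still fixed before the transformation runs; nothing about the map itself is computed at runtime.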

Character mapping may impose a performance penalty, especially if a large number of characters are mapped.

Character mapping has no effect unless the result of the transformation is actually serialized. If the result tree is passed straight to another application that doesn't understand the special characters, it is unlikely to have the desired effect.

Character mapping only affects the content of text and attribute nodes. It doesn't affect characters in element and attribute names, or markup characters such as the quotes around an attribute value.

The character to be mapped, and all the characters in the replacement string, must be valid XML characters. This is because there is no way of representing invalid characters in the <xsl:output-character> element in the stylesheet. This means that character maps cannot be used to generate text files containing characters not allowed in XML, such as the NUL character (x00).

Disable Output Escaping

XSLT 1.0 provided an alternative way of getting fine-grained control over the serializer, namely the disable-output-escaping attribute of the <xsl:text> and <xsl:value-of> instructions. This has been deprecated in XSLT 2.0, but it is still likely to be supported in many processors because it is so widely used (and abused) in XSLT 1.0 stylesheets.
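
For orientation, the attribute is written as in the following minimal sketch. The out element and the strings are hypothetical, and a processor is allowed to ignore the request.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out>
      <!-- Ask the serializer to write the characters without escaping them -->
      <xsl:text disable-output-escaping="yes">&lt;br/&gt;</xsl:text>
      <xsl:value-of select="'&amp;nbsp;'" disable-output-escaping="yes"/>
    </out>
  </xsl:template>
</xsl:stylesheet>
```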

Reasons to Disable Output Escaping

Normally, when you try to output a special character such as < or & in a text node, the special character will be escaped in the output file using the normal XML escaping mechanisms. The escaping is done by the serializer: the text node written in the result tree contains a < or & character, and the serializer translates this into &lt; or &amp;. The serializer is free to represent the special characters any way it wants; for example, it can write < as &lt; or as &#60;.
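
The default behavior can be seen with a small sketch such as the following, in which the test element and its text are hypothetical. The stylesheet source has to escape the two characters because it is itself XML, but the text node in the result tree holds the plain characters, and the serializer escapes them again on output.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <!-- The text node in the result tree holds the literal characters: a < b & c -->
    <test><xsl:text>a &lt; b &amp; c</xsl:text></test>
  </xsl:template>
</xsl:stylesheet>
```

A serializer would typically write the text as a &lt; b &amp; c, though any equivalent XML representation is permitted.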
