XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (677 page)

Authors: Michael Kay

Processing instructions are terminated with > rather than ?>. Processing instructions are not often used in HTML, but the HTML 4.0 standard recommends that any vendor extensions should be implemented this way, rather than by adding element tags to the language. So it is possible they will be seen more frequently in the future.

Attributes that are conventionally written with a keyword only, and no value, will be recognized and output in this form. Common examples are READONLY and SELECTED. This is shorthand, permitted in SGML but not in XML, for an attribute that has only one permitted value, which is the same as the attribute name. In XML, these attributes must be written as READONLY="READONLY" and SELECTED="SELECTED". The HTML output method will normally use the abbreviated form, as this is the only form that older HTML browsers will recognize.
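The minimization rule can be illustrated with a short Python sketch. This is not how any particular XSLT processor is implemented: the function name html_start_tag is hypothetical, and the test "value equals attribute name" is a simplification, since a real serializer would apply the rule only to attributes that HTML defines as having a single permitted value.

```python
from html import escape


def html_start_tag(name, attrs):
    """Render an HTML start tag, writing an attribute in keyword-only
    form when its value is the same as its name (simplified rule)."""
    parts = [name]
    for key, value in attrs.items():
        if value.lower() == key.lower():
            # Abbreviated SGML form: SELECTED, not SELECTED="SELECTED".
            parts.append(key)
        else:
            parts.append(f'{key}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


print(html_start_tag("OPTION", {"SELECTED": "SELECTED", "VALUE": "es"}))
```

The XML form SELECTED="SELECTED" goes in, and the abbreviated form that older browsers expect comes out.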

The special use of the ampersand character in dynamic HTML attributes is recognized. For example, the tag

is correct HTML, though it would not be correct in XML, because of the ampersand character. To produce this output from a literal result element, the tag in the stylesheet would need to be written as

: note the double curly braces, to prevent them being interpreted with their special meaning in attribute value templates.

A common source of anxiety with HTML output is the use of ampersands in URLs. For example, suppose you want to generate the output:

Spanish Widgets

However you try to produce this using standard XSLT, the ampersand will always come out as &amp;. The reason for this is simple: a bare &, although commonly used and widely accepted, is not actually correct HTML, and according to the standard it must be escaped as &amp;. All respectable browsers accept the correct escaped form, so the answer is: don't worry about it.
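To see why the escaped form is harmless, the following Python sketch escapes a URL for use in an attribute and then decodes it again, as an HTML parser would. The URL product.asp?id=17&lang=es is a made-up example, not taken from the text, and Python's html module stands in here for the XSLT serializer and the browser.

```python
from html import escape, unescape

# Hypothetical URL containing a bare ampersand between query parameters.
url = "product.asp?id=17&lang=es"

# What a conforming serializer writes into the attribute value.
href = escape(url, quote=True)
link = f'<a href="{href}">Spanish Widgets</a>'
print(link)

# What an HTML parser recovers when it reads the attribute back.
recovered = unescape(href)
print(recovered == url)
```

The escaping exists only in the serialized markup. Once the attribute is parsed, the URL that is requested is the original one with a plain ampersand.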

Although the serializer won't generally check that the result tree is valid HTML, there is one exception: it must not use characters that are allowed in XML but not in HTML, notably Unicode characters in the range x80 to x9F. If these characters appear in your XML, the chances are that they got there by accident. Microsoft's cp1252 character set (sometimes called ANSI) is generally similar to iso-8859-1 but uses codes in this range to refer to special characters such as the Euro currency symbol, dagger, em-dash, bullet, and the trademark sign. If a document that uses these characters is correctly labeled, then these characters will be translated into their Unicode equivalents (for example the Euro sign will become x20AC), and all will be well. If, however, the document is wrongly labeled with encoding="iso-8859-1", then these characters will be represented in the XML with codes in the range x80 to x9F, which will cause an error when you try to serialize as HTML, because HTML does not allow characters in that range. The remedy is to change the XML declaration of the source document from encoding="iso-8859-1" to encoding="cp1252".
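The effect of the wrong label can be reproduced with a few lines of Python, used here only as a convenient way to decode bytes. The sample byte string and the helper name c1_controls are illustrative assumptions.

```python
# Byte 0x80 is the Euro sign in cp1252; the sample text is made up.
raw = b"Price: \x80 100"

as_cp1252 = raw.decode("cp1252")      # correctly labeled
as_latin1 = raw.decode("iso-8859-1")  # wrongly labeled


def c1_controls(text):
    """Return the code points in the range x80 to x9F found in text."""
    return [hex(ord(ch)) for ch in text if 0x80 <= ord(ch) <= 0x9F]


print(hex(ord(as_cp1252[7])))   # code point of the decoded Euro sign
print(c1_controls(as_cp1252))   # characters HTML would reject
print(c1_controls(as_latin1))
```

With the cp1252 label the byte decodes to x20AC and nothing falls in the forbidden range. With the iso-8859-1 label the same byte becomes the character x80, which is exactly the kind of character the HTML output method rejects.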
