XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (693 page)

Authors: Michael Kay

The obvious difference between a fill-in-the-blanks stylesheet and this navigational stylesheet is that the <xsl:stylesheet> and <xsl:template> elements are now explicit, which makes it possible to introduce other top-level elements, such as

and global <xsl:variable> elements. More subtly, the range of XSLT features used means that this stylesheet has crossed the boundary from being an HTML document with added control instructions, to being a real program. The boundary, though, is a rather fuzzy one, with no visa required to cross it, so many people who have learned to write simple fill-in-the-blanks stylesheets should be able, as they expand their knowledge, to progress to writing navigational stylesheets of this kind.

Although the use of flow-of-control instructions like

,

, and

gives such a stylesheet a procedural feel, it does not violate the original concept that XSLT should be a declarative language. This is because the instructions do not have to be executed in the order they are written; variables can't be updated, so the result of one instruction can't affect the next one. For example, it's easy to think of the <xsl:for-each> instruction in this example processing the selected nodes in document order and adding them one by one to the result tree, but it would be equally valid for an XSLT processor to process them in reverse order, or in parallel, as long as the nodes are added to the result tree in the right place. That's why I was careful to call this design pattern navigational rather than procedural. It's navigational in that you say exactly where to find the nodes in the source tree that you want to visit, but it's not procedural, because you don't define the order in which you will visit them.
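As a rough illustration of the navigational pattern, here is a minimal sketch. The source vocabulary (a `sales` document containing `product` elements with `name` and `total` attributes) and the variable name `heading` are assumptions made up for this example, not taken from the book's own listing.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declarations become possible once xsl:stylesheet is explicit -->
  <xsl:output method="html"/>
  <xsl:variable name="heading" select="'Sales by product'"/>

  <xsl:template match="/">
    <html>
      <head><title><xsl:value-of select="$heading"/></title></head>
      <body>
        <h1><xsl:value-of select="$heading"/></h1>
        <table>
          <!-- States where the nodes are found; the processor is free to
               evaluate the iterations in any order, provided the rows
               appear in the result in document order -->
          <xsl:for-each select="sales/product">
            <tr>
              <td><xsl:value-of select="@name"/></td>
              <td><xsl:value-of select="@total"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The structure of the stylesheet mirrors the structure of the output page, which is what distinguishes this pattern from the rule-based one.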

New features available in XSLT 2.0 and XPath 2.0 greatly increase the scope of what can be achieved with a navigational stylesheet. Many problems that in XSLT 1.0 required complex programming (using the computational design pattern described later in this chapter) can now be tackled within the navigational approach. Examples include grouping problems, and problems that require splitting up text fields using delimiters such as commas or newlines, as well as many arithmetic operations such as summing the total value of an invoice. The features that provide this capability include the following:

The availability of sequences in the data model, together with the for expression in XPath, to manipulate them.
Grouping constructs, including the <xsl:for-each-group> instruction in XSLT 2.0 and the distinct-values() function in XPath 2.0.
Text manipulation facilities, notably the <xsl:analyze-string> instruction in XSLT 2.0 and the replace() and tokenize() functions in XPath 2.0.
Aggregation functions such as avg(), min(), and max().
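Several of the features listed above can be seen together in a small sketch. The input vocabulary is hypothetical: an `invoice` element containing `item` elements with `category`, `price`, `qty`, and comma-separated `tags` attributes. None of these names come from the book.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/invoice">
    <summary>
      <!-- A "for" expression builds a sequence of line totals; sum() aggregates it -->
      <total>
        <xsl:value-of select="sum(for $i in item return $i/@price * $i/@qty)"/>
      </total>

      <!-- Grouping: one output element per distinct category -->
      <xsl:for-each-group select="item" group-by="@category">
        <category name="{current-grouping-key()}"
                  items="{count(current-group())}"
                  max-price="{max(current-group()/@price)}"/>
      </xsl:for-each-group>

      <!-- Text splitting: tokenize() breaks each tags attribute at commas,
           and distinct-values() removes duplicates across all items -->
      <xsl:for-each select="distinct-values(item/tokenize(@tags, ','))">
        <tag><xsl:value-of select="."/></tag>
      </xsl:for-each>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 1.0 each of these three steps would typically have needed recursive templates or key-based workarounds; here they stay within a single navigational template.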

The ability to write chunks of reusable code in the form of stylesheet functions that can be invoked from XPath expressions, rather than only as templates to be called using XSLT instructions, also helps to make navigational stylesheets much easier to write.
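A stylesheet function might be sketched as follows. The function name `f:line-total`, its namespace URI, and the `invoice`/`item` vocabulary are all assumptions invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Reusable logic, callable from any XPath expression in the stylesheet -->
  <xsl:function name="f:line-total" as="xs:decimal">
    <xsl:param name="item" as="element(item)"/>
    <xsl:sequence select="xs:decimal($item/@price) * xs:integer($item/@qty)"/>
  </xsl:function>

  <xsl:template match="/invoice">
    <total>
      <!-- The function is invoked inside the expression, not via xsl:call-template -->
      <xsl:value-of select="sum(for $i in item return f:line-total($i))"/>
    </total>
  </xsl:template>

</xsl:stylesheet>
```

Because the call sits inside an XPath expression, its result can be fed straight into sum(), a predicate, or a sort key, which a named template cannot do as directly.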

You will often see this kind of code criticized by experts because it doesn't take advantage of XSLT's most powerful feature, the ability to write template rules. The criticism is justified when you use this design pattern inappropriately, in a situation where a rule-based stylesheet would be better. But there are many simple problems where, in my view, this pattern works perfectly well. It produces code that is readable and efficient.

Rule-Based Stylesheets

A rule-based stylesheet consists primarily of rules describing how different features of the source document should be processed, such as “if you find a

element, display it in italic.”

Some would say that this rule-based approach is the essence of the XSLT language, the principal way that it is intended to be used. I would say that it's one way of writing stylesheets, often the best way, but not the only way, and not necessarily the best answer in every situation. It's often strongly recommended in books for beginners, but I think that the main reason for this is that for many beginners the navigational pattern is what comes naturally, because it has a very similar feel to programs written in procedural languages. It's important that every XSLT programmer be comfortable with writing rule-based stylesheets, so it makes sense to teach this approach early on.

Unlike navigational stylesheets, a rule-based stylesheet is not structured according to the desired output layout. In fact, it makes minimal assumptions about the structure of either the source document or the result document. Rather, the structure reads like an inventory of components that might be encountered in the source document, arranged in arbitrary order.

Rule-based stylesheets are therefore most useful when processing source documents whose structure is flexible or unpredictable, or which may change a lot in the future. They are also very useful when the same repertoire of elements can appear in many different document structures, so a rule like “display dates in the format 23 March 2008” can be reused in many different contexts.

Rule-based stylesheets are a natural evolution of CSS. In CSS, you can define rules of the form “for this class of elements, use this display rendition.” In XSLT, the rules become much more flexible, in two directions: the pattern language for defining which elements you are talking about is much richer, and the actions you can define when the rule is fired are vastly more wide-ranging.

A simple rule-based stylesheet consists of one rule for each element name. The typical rule matches a particular element name, outputs an HTML tag to define the rendition of that element, and calls <xsl:apply-templates> to process the child nodes of the element. This causes text nodes within the element to be copied to the output and nested child elements to be processed each according to its own template rule. In its simplest form, a rule-based stylesheet often contains many rules of the form:

This simple rule does a direct replacement of

tags by

tags. Most real stylesheets do something a bit more elaborate with some of the tags, but they may still contain many rules that are as simple as this one.
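Rules of this simple kind could be sketched as below. The source element names `para` and `emph` are assumptions chosen purely for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Replace each para element by an HTML p element, then process its children -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Same shape of rule for an inline element -->
  <xsl:template match="emph">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>
```

Text nodes need no rule of their own: the built-in template rule for text copies them to the output, which is why the bare `<xsl:apply-templates/>` is enough.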

XSLT 2.0 introduces the ability to define rules that match elements and attributes by their type, as defined in a schema, rather than simply by their name or context. This makes the technique even more powerful when handling document structures that are highly complex or extensible. For example, you can match all the elements in a particular substitution group with a single template rule, which means that the stylesheet doesn't need to change when new elements are added to the substitution group later. Similarly, you can define a template rule that formats all elements containing part numbers or dates, irrespective of the element or attribute name.
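A sketch of type-based matching follows. It assumes a schema-aware XSLT 2.0 processor and an input document that has been validated against the imported schema. The schema location `events.xsd` and the element declaration `event` (imagined as the head of a substitution group) are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Makes the schema's element declarations and types known to the stylesheet -->
  <xsl:import-schema schema-location="events.xsd"/>

  <!-- Matches event and every element in its substitution group,
       including members added to the schema later -->
  <xsl:template match="schema-element(event)">
    <div class="event"><xsl:apply-templates/></div>
  </xsl:template>

  <!-- Matches any element or attribute whose type is xs:date,
       whatever its name -->
  <xsl:template match="element(*, xs:date) | attribute(*, xs:date)">
    <xsl:value-of select="format-date(., '[D] [MNn] [Y]')"/>
  </xsl:template>

</xsl:stylesheet>
```

The picture string `[D] [MNn] [Y]` asks for the day, the month name, and the year, in the style of 23 March 2008.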

Example: A Rule-Based Stylesheet

Rule-based stylesheets are often used to process narrative documents, where most of the processing consists in replacing XML tags by HTML tags. This example illustrates this by showing how a Shakespeare play can be rendered in HTML.

Input

The input scene2.xml is a scene from a play: Act I, Scene 2 of Shakespeare's Othello. It starts like this:

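<SCENE><TITLE>SCENE II. Another street.</TITLE>
<STAGEDIR>Enter OTHELLO, IAGO, and Attendants with
torches</STAGEDIR>
<SPEECH>
<SPEAKER>IAGO</SPEAKER>
<LINE>Though in the trade of war I have slain men,</LINE>
<LINE>Yet do I hold it very stuff o’ the conscience</LINE>
<LINE>To do no contrived murder: I lack iniquity</LINE>
<LINE>Sometimes to do me service: nine or ten times</LINE>
<LINE>I had thought to have yerk'd him here under the ribs.</LINE>
</SPEECH>
<SPEECH>
<SPEAKER>OTHELLO</SPEAKER>
<LINE>’Tis better as it is.</LINE>
</SPEECH>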

There are some complications that aren't shown in this sample, but which the stylesheet needs to take account of.

The top-level element is not always a <SCENE>; it might also be a <PROLOGUE> or <EPILOGUE>. The <STAGEDIR> element (representing a stage direction) can appear at any level of nesting; for example, a stage direction can appear between two speeches, between two lines of a speech, or in the middle of a line.

Several people can speak at the same time. In this case a single <SPEECH> element has more than one <SPEAKER>. In general, a <SPEECH> consists of one or more <SPEAKER> elements followed by any number of <LINE> and <STAGEDIR> elements in any order.

Stylesheet

The stylesheet scene.xsl consists of a number of template rules. It starts by declaring a global variable (used simply as a constant) and a rule for the document element.

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:value-of select="TITLE"/>
h1 {text-align:center}
h2 {text-align:center; font-size:120%; margin-top:12; margin-bottom:12}
body {background-color: }
div.speech {float:left; width:100%; padding:0; margin-top:6}
div.speaker {float:left; width:160;}
div.text {float:left}

Note the use of CSS to achieve the detailed styling. There's nothing inconsistent about using XSLT and CSS in combination in this way. You can also generate a reference to an external CSS stylesheet, but the advantage of generating it inline is that the content can be parameterized.
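The point about parameterizing inline CSS can be illustrated with a short sketch. The parameter name `background` and its default value are assumptions. It is declared with <xsl:param> rather than as a constant variable so that the caller can override it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical parameter; may be overridden when the transformation is invoked -->
  <xsl:param name="background" select="'#FFFFCC'"/>

  <!-- Rule for the document element, whatever its name -->
  <xsl:template match="/*">
    <html>
      <head>
        <title><xsl:value-of select="TITLE"/></title>
        <style type="text/css">
          h1 {text-align:center}
          body {background-color: <xsl:value-of select="$background"/>}
        </style>
      </head>
      <body>
        <h1><xsl:value-of select="TITLE"/></h1>
        <!-- Everything except the title is handed to the template rules -->
        <xsl:apply-templates select="* except TITLE"/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Curly braces in element content are ordinary text (they are only special inside attribute value templates), so the CSS rules pass through unchanged.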

The appearance of <xsl:value-of> is a rare departure from the purely rule-based pattern, just to prove that none of the patterns has to be used to the exclusion of the others.

The template rule for the <SPEECH> element outputs the names of the speakers on the left, and the lines of the speech, plus any stage directions, on the right. Rather than using HTML tables, which is often discouraged for accessibility reasons, we use CSS classes for the actual positioning as follows:
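A rule along these lines might look like the following sketch. It assumes the element names SPEECH, SPEAKER, and LINE for the play markup, and reuses the CSS classes `speech`, `speaker`, and `text` defined in the inline stylesheet. The exact HTML emitted for speakers and lines is a guess, not the book's listing.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="SPEECH">
    <div class="speech">
      <!-- Left column: one or more speaker names -->
      <div class="speaker">
        <xsl:apply-templates select="SPEAKER"/>
      </div>
      <!-- Right column: lines and stage directions, in document order -->
      <div class="text">
        <xsl:apply-templates select="* except SPEAKER"/>
      </div>
    </div>
  </xsl:template>

  <xsl:template match="SPEAKER">
    <b><xsl:apply-templates/></b><br/>
  </xsl:template>

  <xsl:template match="LINE">
    <xsl:apply-templates/><br/>
  </xsl:template>

</xsl:stylesheet>
```

Selecting `* except SPEAKER` rather than naming the children keeps lines and stage directions interleaved exactly as they occur in the source.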

The remaining template rules are straightforward. Each of them simply outputs the text of the element, using an appropriate HTML rendition. The only complication, which doesn't actually occur in this particular scene, is that for some elements (

and

) the HTML rendition is different, depending on the element's context, and so there is more than one rule defined for these elements.
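Context-dependent rules of this kind can be sketched using a stage direction as the example. The element names STAGEDIR, SPEECH, and LINE and the chosen HTML renditions are assumptions for illustration only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- General case, e.g. between two speeches: a centered paragraph -->
  <xsl:template match="STAGEDIR">
    <p style="text-align:center"><i><xsl:apply-templates/></i></p>
  </xsl:template>

  <!-- Between two lines of a speech: italic text on its own line -->
  <xsl:template match="SPEECH/STAGEDIR">
    <i><xsl:apply-templates/></i><br/>
  </xsl:template>

  <!-- In the middle of a line: inline, in brackets -->
  <xsl:template match="LINE/STAGEDIR">
    <xsl:text> [</xsl:text>
    <i><xsl:apply-templates/></i>
    <xsl:text>] </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

No explicit priority attributes are needed here: a pattern with a parent step such as `SPEECH/STAGEDIR` has a higher default priority (0.5) than the bare name `STAGEDIR` (0), so the more specific rule wins whenever both match.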