XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (67 page)


Despite this clarification of the rules, I wouldn't normally recommend using the xml:space attribute in a stylesheet, but if there are large chunks of existing XML that you want to copy into the stylesheet verbatim, the technique can be useful.

Solving Whitespace Problems

There are two typical problems with whitespace in the output: too much of it, or too little.

If you are generating HTML, a bit of extra whitespace usually doesn't matter, though there are some places where it can slightly distort the layout of your page. With some text formats, however (a classic example is comma-separated values) you need to be very careful to output whitespace in exactly the right places.

Too Much Whitespace

If you are getting too much whitespace, there are three possible places it can be coming from:

  • The source document
  • The stylesheet
  • Output indentation

First ensure that you set indent="no" on the <xsl:output> element, to eliminate the last of these possibilities.
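As a minimal sketch, a stylesheet that switches off output indentation might look like this. The choice of the html output method and the skeleton template are assumptions made for illustration; the setting matters most for HTML output, where indentation is on by default.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="no" stops the serializer from adding whitespace of its own -->
  <xsl:output method="html" indent="no"/>

  <!-- Illustrative skeleton only: wraps whatever the other rules produce -->
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

</xsl:stylesheet>
```

If the unwanted whitespace is still there with this setting in place, it must be coming from the source document or from the stylesheet itself.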

If the output whitespace is adjacent to text, then it probably comes from the same place as that text.

  • If this text comes from the stylesheet, use <xsl:text> to control more precisely what is output. For example, the following code outputs a comma between the items in a list, but it also outputs a newline after the comma, because the newline is part of the same text node as the comma:

    <xsl:for-each select="item">
       <xsl:value-of select="."/>,
    </xsl:for-each>

    If you want the comma but not the newline, change this so that the newline is in a text node of its own, and is therefore stripped:

    <xsl:for-each select="item">
       <xsl:value-of select="."/>,<xsl:text/>
    </xsl:for-each>

  • If the text comes from the source document, use normalize-space() to trim leading and trailing spaces from the text before outputting it.
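The source-document case can be sketched as follows. The element names list and item are hypothetical, chosen only to illustrate the call. Note that normalize-space() also collapses each internal run of whitespace to a single space, which may or may not be what you want.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed input: <list><item>  red  </item><item> green </item></list> -->
  <xsl:template match="list">
    <xsl:for-each select="item">
      <!-- Trim the whitespace that surrounds the text in the source -->
      <xsl:value-of select="normalize-space(.)"/>
      <!-- The comma is the only character in its text node, so no newline follows it -->
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the assumed input, this writes red,green with no stray spaces or newlines.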

If the offending whitespace is between tags in the output, then it probably comes from whitespace nodes in the source tree that have not been stripped, and the remedy is to add an <xsl:strip-space> element to the stylesheet.
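A minimal sketch of this remedy follows. Stripping every element with elements="*" and exempting a hypothetical pre element are assumptions for illustration; in practice you would list the elements that suit your own vocabulary.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Remove whitespace-only text nodes from all elements in the source tree -->
  <xsl:strip-space elements="*"/>

  <!-- Keep them inside the hypothetical pre element, where layout matters -->
  <xsl:preserve-space elements="pre"/>

  <!-- Identity rule: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Only text nodes consisting entirely of whitespace are affected; whitespace adjacent to other text is left alone.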

Too Little Whitespace

If you want whitespace in the output and aren't getting it, use <xsl:text> to generate it at the appropriate point. For example, the following code will output the lines of a poem in HTML, with each line of the poem being shown on a new line:

<xsl:for-each select="line">
   <xsl:value-of select="."/><br/>
</xsl:for-each>

This will display perfectly correctly in the browser, but if you want to view the HTML in a text editor, it will be difficult because everything goes on a single line. It would be useful to start a newline after each <br/> element; you can do this as follows:

<xsl:for-each select="line">
   <xsl:value-of select="."/><br/><xsl:text>&#xa;</xsl:text>
</xsl:for-each>

Another trick I have used to achieve this is to exploit the fact that the non-breaking-space character (&#xA0;), although invisible, is not classified as whitespace. So you can achieve the required effect by writing:

<xsl:for-each select="line">
   <xsl:value-of select="."/><br/>&#xa0;
</xsl:for-each>

This works because the newline after the &#xa0; is now part of a non-whitespace node.

Summary

The purpose of this chapter was to study the overall structure of a stylesheet, before going into the detailed specification of each element in Chapter 6. We've now covered the following:

  • How a stylesheet program can be made up of one or more stylesheet modules, linked together with <xsl:include> and <xsl:import> declarations. I described how the concept of import precedence allows one stylesheet to override definitions in those it imports.
  • The <xsl:stylesheet> (or <xsl:transform>) element, which is the outermost element of most stylesheet modules.
  • The <?xml-stylesheet?> processing instruction, which can be used to link from a source document to its associated stylesheets, and which allows a stylesheet to be embedded directly in the source document whose style it defines.
  • The declarations found in the stylesheet, that is, the immediate children of the <xsl:stylesheet> or <xsl:transform> element, including the ability to have user-defined or vendor-defined elements here.
  • How the <xsl:stylesheet> and <xsl:template match="/"> elements can be omitted to make an XSLT stylesheet look more like the simple template languages that some users may be familiar with.
  • The idea of a sequence constructor, a structure that occurs throughout a stylesheet, which is a sequence containing text nodes and literal result elements to be copied to the result tree, and instructions and extension elements to be executed. This led naturally to a discussion of literal result elements and of attribute value templates, which are used to define variable attributes not only of literal result elements but of certain XSLT elements as well.
  • How the W3C standards committee has tried to ensure that the specification can be extended, both by vendors and by W3C itself, without adversely affecting the portability of stylesheets. You saw how to make a stylesheet work even if it uses proprietary extension functions and extension elements that may not be available in all implementations.
  • How XSLT stylesheets handle whitespace in the source document, in the stylesheet itself and in the result tree.

The next chapter describes how to use XSLT stylesheets together with an XML Schema for the source and/or result documents. If you are not interested in using schemas, you can probably skip that chapter and move straight to Chapter 5, which gives detailed information about the data types available in the XDM model and the ways in which you can use them.

Chapter 4

Stylesheets and Schemas

One of the most important innovations in XSLT 2.0 is that stylesheets can take advantage of the schemas you have defined for your input and output documents. This chapter explores how this works.

This feature is an optional part of XSLT 2.0, in two significant ways:

  • Firstly, an XSLT 2.0 processor isn't required to implement this part of the standard. A processor that offers schema support is called a schema-aware processor; one that does not is referred to as a basic processor.
  • Secondly, even if the XSLT 2.0 processor you are using is a schema-aware processor, you can still process input documents, and produce output documents, for which there is no schema available.

There is no space in this book for a complete description of XML Schema. If you want to start writing schemas, I would recommend you read XML Schema by Eric van der Vlist (O'Reilly & Associates, 2002) or Definitive XML Schema by Priscilla Walmsley (Prentice Hall, 2002). XML Schema is a large and complicated specification, certainly as large as XSLT itself. However, it's possible that you are not writing your own schemas, but writing stylesheets designed to work with a schema that someone else has already written. If this is the case, I hope you will find the short overview of XML Schema in this chapter a useful introduction.

XML Schema: An Overview

The primary purpose of an XML Schema is to enable documents to be validated: it defines a set of rules that XML documents must conform to, and enables documents to be checked against these rules. This means that organizations using XML to exchange invoices and purchase orders can agree on a schema defining the rules for these messages, and both parties can validate the messages against the schema to ensure that they are right. So the schema, in effect, defines a type of document, and this is why schemas are central to the type system of XSLT.

In fact, the designers of XML Schema were more ambitious than this. They realized that rather than simply giving a “yes” or “no” answer, processing a document against a schema could make the application's life easier by attaching labels to the validated document indicating, for each element and attribute in the document, which schema definitions it was validated against. In the language of XML Schema, this document with validation labels is called a Post Schema Validation Infoset, or PSVI. The XDM data model used by XSLT and XPath is based on the PSVI, but it only retains a subset of the information in the PSVI; most importantly, the type annotations attached to element and attribute nodes.

We begin by looking at the kinds of types that can be defined in XML Schema, starting with simple types and moving on to progressively more complex types.

Simple Type Definitions

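Let's suppose that many of your messages refer to part numbers, and that part numbers have a particular format such as ABC12345. You can start by defining this as a type in the schema:

<xs:simpleType name="part-number">
  <xs:restriction base="xs:token">
    <xs:pattern value="[A-Z]{3}[0-9]{5}"/>
  </xs:restriction>
</xs:simpleType>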
Part number is a simple type because it doesn't have any internal node structure (that is, it doesn't contain any elements or attributes). I have defined it by restriction from xs:token, which is one of the built-in types that come for free with XML Schema. I could have chosen to base the type on xs:string, but xs:token is probably better because with xs:string, leading and trailing whitespace is considered significant, whereas with xs:token, it gets stripped automatically before the validation takes place. The particular restriction in this case is that the value must match the regular expression given in the <xs:pattern> element. This regular expression says that the value must consist of exactly three letters in the range A to Z, followed by exactly five digits.
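To make the rule concrete, here are a few instance fragments checked against it. They assume that an element named part has been declared with this type; that declaration is an assumption for illustration. These are separate fragments, not a single well-formed document.

```xml
<!-- Valid: three uppercase letters followed by five digits -->
<part>ABC12345</part>

<!-- Valid: the type is based on xs:token, so the surrounding
     whitespace is removed before the pattern is checked -->
<part>  WZH94623  </part>

<!-- Invalid: only two letters, and six digits -->
<part>AB123456</part>

<!-- Invalid: lowercase letters fall outside the range A to Z -->
<part>abc12345</part>
```

Had the type been based on xs:string instead, the second fragment would also be rejected, because the spaces would be treated as part of the value.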

Having defined this type, you can now refer to it in definitions of elements and attributes. For example, you can define the element:

<xs:element name="part" type="part-number"/>
This allows documents to contain <part> elements whose content conforms to the rules for the type called part-number. Of course, you can also define other elements that have the same type, for example:

<xs:element name="subpart" type="part-number"/>
Note the distinction between the name of an element and its type. Many element declarations in a schema (declarations that define elements with different names) can refer to the same type definition, if the rules for validating their content are the same. It's also permitted, though I won't go into the detail just yet, to use the same element name at different places within a document with different type definitions.

You can also use the same type definition in an attribute, for example:

<xs:attribute name="part-nr" type="part-number"/>
You can declare variables and parameters in a stylesheet whose values must be elements or attributes of a particular type. Once a document has been validated using this schema, elements that have been validated against the declarations of part and subpart given above, and attributes that have been validated against the declaration named part-nr, will carry the type annotation part-number, and they can be assigned to variables such as:
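A sketch of such declarations follows. It assumes a schema-aware processor, a schema with no target namespace stored at the hypothetical location parts.xsd, and a validated input document containing at least one part element and one part-nr attribute. The name part2 and both select expressions are illustrative choices, not taken from the text.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Make the schema's type names available to the stylesheet
       (parts.xsd is a hypothetical location) -->
  <xsl:import-schema schema-location="parts.xsd"/>

  <xsl:template match="/">
    <!-- Exactly one element of any name, annotated as part-number
         or as a type derived from it -->
    <xsl:variable name="part1" as="element(*, part-number)"
                  select="(//part)[1]"/>

    <!-- Exactly one attribute of any name, with the same annotation -->
    <xsl:variable name="part2" as="attribute(*, part-number)"
                  select="(//@part-nr)[1]"/>

    <xsl:value-of select="$part1, $part2"/>
  </xsl:template>

</xsl:stylesheet>
```

If the selected node does not carry the required type annotation, or if nothing is selected at all, the as attribute causes a type error rather than letting the wrong value through silently.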



The variable part1 is allowed to contain any element node that has the type annotation part-number. If further types have been defined as restricted subtypes of part-number (for example, Boeing-part-number), these can be assigned to the variable too. The * indicates that we are not concerned with the name of the element or attribute, but only with its type.

There are actually three varieties of simple types that you can define in XML Schema: atomic types, list types, and union types. Atomic types are treated specially in the XPath/XSLT type system, because values of an atomic type (called, naturally enough, atomic values) can be manipulated as freestanding items, independently of any node. Like integers, booleans, and strings, part numbers as defined above are atomic values, and you can hold a part number or a sequence of part numbers directly in a variable, without creating any node to contain it. For example, the following declaration defines a variable whose value is a sequence of three part numbers:

<xsl:variable name="parts" as="part-number*"
              select="for $p in ('WZH94623', 'BYF67253', 'PRG83692')
                      return $p cast as part-number"/>

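Simple types in XML Schema are not the same thing as atomic types in the XPath data model. This is because a simple type can also allow a sequence of values. For example, it is possible to define the following simple type:

<xs:simpleType name="colors">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:NCName">
        <xs:enumeration value="red"/>
        <xs:enumeration value="orange"/>
        <xs:enumeration value="yellow"/>
        <xs:enumeration value="green"/>
        <xs:enumeration value="blue"/>
        <xs:enumeration value="indigo"/>
        <xs:enumeration value="violet"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>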
There are actually two type definitions here. The inner type is anonymous, because the <xs:simpleType> element has no name attribute. It defines an atomic value, which must be an xs:NCName, and more specifically, must be one of the values red, orange, yellow, green, blue, indigo, or violet. The outer type is a named type (which means it can be referenced from elsewhere in the schema), and it defines a list type whose individual items must conform to the inner type.
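To see what a list type means for instance documents, consider the following fragments. They assume that the outer type is named colors and that a hypothetical element palette is declared with it; both names are assumptions for illustration. These are separate fragments, not a single document.

```xml
<!-- Hypothetical element declaration added to the schema -->
<xs:element name="palette" type="colors"/>

<!-- Valid: three whitespace-separated items, each one of the
     values permitted by the inner type -->
<palette>red green violet</palette>

<!-- Invalid: "pink" is not among the permitted values -->
<palette>red pink</palette>
```

After validation by a schema-aware processor, the typed value of the first palette element is a sequence of three atomic values rather than a single string, so an expression such as count(data(palette)) would return 3.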
