XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

Authors: Michael Kay

In general, the XML parser can't convert these values from lexical QNames to expanded QNames because it doesn't know that they are special. XML Schema has tried to address the problem by defining a data type xs:QName that declares the content of an element or attribute to be a QName, but this doesn't solve the whole problem, for a number of reasons:

There can be namespace-sensitive content other than simple QNames; for example, an attribute might contain an XPath expression, which is also namespace sensitive, but there is no schema-defined type for it.
There are documents that have no schema.
Although knowing the data type means that a schema processor can convert the lexical QName used in the string value of these attributes to the expanded QName used as the typed value, this only works if the schema processor knows the mapping of prefixes to namespace URIs. So if you want to be able to construct a tree and then pass it to a schema processor for validation, you need some way of representing the namespace information on the tree before this can work.
The definition of the xs:QName data type says that an unprefixed QName is assumed (like an unprefixed element name) to be in the default namespace. Unfortunately, at least one heavy user of QName-valued attributes, namely the XSLT specification, had already decided that an unprefixed QName (like an unprefixed attribute name) should be in the null namespace. This means that if the attribute were defined as an xs:QName, a schema processor would allocate the wrong namespace URI. So you will find that in the schema for XSLT 2.0 (the schema that can be used to validate XSLT stylesheets), the xs:QName data type isn't actually used.

So, namespace nodes exist primarily so that namespace prefixes appearing in namespace-sensitive content can be handled. Although this might seem a minor requirement, namespace nodes cause significant complications.

The way namespace nodes are represented in the data model hasn't changed significantly between XPath 1.0 and XPath 2.0. What has changed, though, is that namespace nodes are now semi-hidden from the application. To be precise, the only way that you could actually get your hands on a namespace node in XPath 1.0 was by using the namespace axis; and in XPath 2.0, the namespace axis has been deprecated, which means that some implementations may continue to support it for backward-compatibility reasons, but they aren't required to. Instead, two functions have been provided, in-scope-prefixes() and namespace-uri-for-prefix(), that provide access to information about the namespaces that are in scope for any element. These functions are described in Chapter 13. The significance of this change is that it gives implementations the freedom to maintain namespace information internally in a form that is much more efficient than the formal description of namespace nodes in XDM would imply: remember that the data model is just a model, not a description of a real implementation.
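
As a rough illustration of how these two functions fit together, the following stylesheet is a minimal sketch that lists every prefix in scope for the outermost element of the source document, together with the namespace URI it is bound to. The variable name $e and the output layout are assumptions made for this example only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <!-- Remember the element: inside xsl:for-each the context item
         is a prefix (a string), no longer the element itself -->
    <xsl:variable name="e" select="."/>
    <xsl:for-each select="in-scope-prefixes($e)">
      <!-- "." is one in-scope prefix; the zero-length string stands
           for the default namespace, if one is declared -->
      <xsl:value-of select="concat('[', ., '] = ',
                                   namespace-uri-for-prefix(., $e),
                                   '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The list normally includes the xml prefix, which is in scope for every element, so expect at least one line of output even for a document that declares no namespaces of its own.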

As far as XSLT and XPath are concerned, don't worry too much about namespace nodes—all you need to know is that there are functions you can call to resolve namespace prefixes found in element or attribute content. When you construct new trees, however, understanding what namespace nodes are present on the new tree sometimes becomes more important.

IDs and IDREFs

An ID is a string value that identifies an element node uniquely within a document. If an element has an ID, it becomes easy and (one hopes) efficient to access that element if the ID value is known. Before XML Schema came along, the ID always appeared as the value of an attribute declared in the DTD as being of type ID. XML Schema has retained this capability, but also allows the content of an element to be used as an ID value. This is done by declaring its type as xs:ID, which is a type derived by restriction from xs:string.

In XDM, every element has at most one ID value and (if the document is valid, which is not necessarily the case) every ID value identifies at most one element.

For example, in an XML dataset containing details of employees, each <employee> element might have a unique ssn attribute giving the employee's Social Security number. For example:

John Doe

…

Jane Stagg

…

As the ssn attribute is unique, it can be declared in the DTD as an ID attribute using the following declaration:

Alternatively, an ID attribute can be declared in a schema:

…

More recently, a third way of defining ID attributes has been defined. Simply name the attribute xml:id, and it will automatically be recognized as an ID attribute. (Note, however, that if you validate your documents against a DTD or schema, it is still necessary to declare this as a permitted attribute name.)

An ID value is constrained to take the form of an XML NCName. This means, for example, that it must start with a letter or an underscore, and that it must not contain characters such as /, :, or space.
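
The ways of marking an attribute as an ID can be sketched as follows. This is an illustration under stated assumptions, not a listing from the original text: the staff and name elements, the content models, and the ssn values are all hypothetical. The values are given a leading letter because an ID must be an NCName, so a value that begins with a digit would not be valid.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!-- Hypothetical content models, for illustration only -->
  <!ELEMENT staff (employee*)>
  <!ELEMENT employee (name)>
  <!ELEMENT name (#PCDATA)>
  <!-- DTD route: declare ssn as an attribute of type ID -->
  <!ATTLIST employee ssn ID #REQUIRED>
]>
<!-- Schema route (instead of the DTD): within the complex type for
     employee, declare <xs:attribute name="ssn" type="xs:ID"/> -->
<!-- xml:id route: name the attribute xml:id rather than ssn and it
     is recognized as an ID without any declaration -->
<staff>
  <employee ssn="S123-45-6789">
    <name>John Doe</name>
  </employee>
  <employee ssn="S987-65-4321">
    <name>Jane Stagg</name>
  </employee>
</staff>
```

Whether #REQUIRED or #IMPLIED is appropriate depends on whether every employee is expected to carry the attribute; #REQUIRED is chosen here only to keep the example strict.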

Attributes can also be defined as being of type IDREF or IDREFS if they contain ID values used to point to other elements in the document (an IDREF attribute contains one ID value, an IDREFS attribute contains a whitespace-separated list of ID values). XPath provides a function, id() (see page 802), which can be used to locate an element given its ID value. This function is designed so that an IDREF or IDREFS attribute can be used as input to the function, but equally, so can any other string that happens to contain an ID. However, IDREF and IDREFS attributes are treated specially by the idref() function (see page 804), which follows IDREF links in the opposite direction: given an ID value, it finds all the nodes of type IDREF or IDREFS that refer to it.
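
The two directions of navigation can be sketched in a small stylesheet. It assumes a hypothetical source document in which each employee element has an ssn attribute of type ID, an optional manager attribute of type IDREF, and a name child element; none of these details are fixed by the text above, and the functions only behave as shown if the parser or schema processor has actually reported those attribute types.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//employee"/>
  </xsl:template>

  <xsl:template match="employee">
    <xsl:value-of select="name"/>
    <xsl:text> reports to: </xsl:text>
    <!-- Forward: id() takes the ID held in @manager and returns the
         element that has that ID (nothing if @manager is absent) -->
    <xsl:value-of select="id(@manager)/name"/>
    <xsl:text>; referenced by </xsl:text>
    <!-- Backward: idref() returns the IDREF/IDREFS nodes whose value
         refers to this employee's own ID -->
    <xsl:value-of select="count(idref(@ssn))"/>
    <xsl:text> node(s)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because idref() returns the referring nodes themselves (here, manager attributes), a step such as idref(@ssn)/.. would be needed to reach the employee elements that own them.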

There is a slight complication with the use of ID values, in that XPath is not constrained to process only valid XML documents. If an XML document is well formed (or merely well balanced) but not valid, then values that are supposed to be IDs may be duplicated, and they might not obey the syntactic rules for an XML NCName. Similarly, attributes might be marked as IDREF attributes, but actually contain broken links (values that don't match the ID of any element in the document). The XDM specification says that if an ID value appears more than once, all occurrences except the first are ignored. If the ID value contains invalid characters such as spaces, the id() function will fail to find the element but will otherwise appear to work correctly. If you use ID values, it's probably a good idea to use a validating XML parser to prevent this situation occurring.

XSLT offers another more flexible approach to finding elements (or other nodes) by content, namely keys. With keys you can do anything that IDs achieve, other than enforcing uniqueness. Keys are declared in the stylesheet using the <xsl:key> element, and they can be used to find a node by means of the key() function.
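
A minimal sketch of the same lookup done with a key rather than an ID is shown below. The key name emp-by-ssn, the employee and name elements, and the sample value are assumptions made for illustration. Unlike id(), this works whether or not the source document has a DTD or schema, because the key is defined entirely in the stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every employee element by the value of its ssn attribute -->
  <xsl:key name="emp-by-ssn" match="employee" use="@ssn"/>

  <xsl:template match="/">
    <!-- Look up the employee(s) whose ssn equals the given value.
         Nothing enforces uniqueness, so more than one may be returned -->
    <xsl:value-of select="key('emp-by-ssn', 'S123-45-6789')/name"/>
  </xsl:template>
</xsl:stylesheet>
```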
