XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (22 page)

Authors: Michael Kay

If you have processed the document using a schema, things get more interesting. The situation where the typed value is most useful is where the schema defines a simple type for the element or attribute (or in the case of elements, a complex type with simple content—which means that the element can have attributes, but it cannot have child elements). As we will see in Chapter 4, simple types in XML Schema allow atomic values or lists of atomic values, but they don't allow child elements.

The simple type may be an atomic type, such as xs:integer or xs:date, in which case the typed value will be the result of converting the string value to an xs:integer or xs:date value according to the rules defined by XML Schema. The value must be a valid xs:integer or xs:date, or it wouldn't have passed schema validation.
The schema may also define the type as being a list; for example, a list of xs:integer or xs:date values. In this case the typed value is a sequence of zero or more atomic values, again following the rules defined in XML Schema.
Another possibility is that the schema defines a union type; for example, it may allow either an xs:integer or an xs:date. The schema validator tries to interpret the value as an xs:integer (if that is the first possibility listed), and if that fails, it tries to validate it as an xs:date. The typed value returned by the data() function may then be either an xs:integer or an xs:date value.
Lists of a union type are also allowed, so you can get back a sequence containing (say) a mixture of integers and dates.
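
The rules for atomic, list, and union types can be sketched in a few lines of Python. This is only an illustration of the idea, not how a schema processor works: the function names are invented for the example, and the lexical checks are deliberately simplified (no time zones on xs:date, no negative years, no full whitespace normalization).

```python
import re
from datetime import date

# Simplified lexical checks; real XML Schema rules also cover whitespace
# normalization, time zones on xs:date, and negative years.
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def as_integer(text):
    """Convert text to an int, approximating xs:integer validation."""
    if not _INTEGER.fullmatch(text):
        raise ValueError(f"not a valid xs:integer: {text!r}")
    return int(text)


def as_date(text):
    """Convert text to a date, approximating xs:date without a time zone."""
    if not _DATE.fullmatch(text):
        raise ValueError(f"not a valid xs:date: {text!r}")
    return date.fromisoformat(text)


def typed_value_atomic(text, convert):
    """Atomic type: the typed value is a sequence holding one value."""
    return [convert(text.strip())]


def typed_value_list(text, convert):
    """List type: split on whitespace and convert each token."""
    return [convert(token) for token in text.split()]


def as_union(*members):
    """Union type: try each member type in the order it is listed."""
    def convert(text):
        for member in members:
            try:
                return member(text)
            except ValueError:
                continue
        raise ValueError(f"no member type accepts {text!r}")
    return convert


# A union of xs:integer and xs:date, with xs:integer listed first.
integer_or_date = as_union(as_integer, as_date)
```

With these definitions, typed_value_list("7 2008-03-01", integer_or_date) yields a two-item sequence holding an integer followed by a date, which is the "list of a union type" case.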

For attributes, all types are simple types, so the above rules cover all the possibilities. For elements, however, there are additional rules to cover non-simple types:

If the schema defines the element as having mixed content, then the typed value is the same as the string value, labeled as xs:untypedAtomic. Note that the deciding factor is that the schema allows mixed content (a mixture of element and text node children), not that the element in question actually has mixed content: in reality it might have element children, or text children, or both or neither. This is identical to the rule for processing without a schema, which means that in many cases, narrative or document-oriented XML (as opposed to data-oriented XML) will be processed in exactly the same way whether there is a schema or not. Narrative XML is characterized by heavy use of mixed content models.
If the schema defines the element as having empty content (that is, the element is not allowed to have either element nodes or text nodes as children, though it can have attributes), then the typed value is an empty sequence.
If the schema defines the element as having an element-only content model (that is, it can contain element nodes as children but not text nodes), then there is no typed value defined, and attempting to retrieve the typed value causes an error. This error is classified as a type error, which means it may be detected and reported either at compile time or at evaluation time. The reason that this is an error is that the typed value must always be a sequence of atomic values, and there is really no way of doing justice to the content of a structured element by representing it as such a sequence. The content is not atomic, because it only makes sense when considered in conjunction with the names of the child elements. Element-only content models tend to feature strongly in “data-oriented” XML applications.
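
The three rules for elements with non-simple types amount to a small decision table. The sketch below expresses it in Python; the content_kind labels and the function name are hypothetical, and the typed value is modeled as a list of (type label, value) pairs purely for illustration.

```python
def element_typed_value(content_kind, string_value):
    """Return the typed value of an element whose type is not simple.

    content_kind is a hypothetical label for the content model that the
    schema declares: "mixed", "empty", or "element-only".
    """
    if content_kind == "mixed":
        # Same as the string value, labeled xs:untypedAtomic.
        return [("xs:untypedAtomic", string_value)]
    if content_kind == "empty":
        # No children are allowed, so the typed value is an empty sequence.
        return []
    if content_kind == "element-only":
        # No typed value is defined; asking for one is a type error.
        raise TypeError("element-only content has no typed value")
    raise ValueError(f"unknown content kind: {content_kind!r}")
```

Note that the decision depends only on what the schema declares, never on the children the element actually has.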

The Type Annotation of a Node

As well as having a typed value, a node also has a type annotation. This is a reference to the type definition that was used to validate the node during schema processing. It is not available directly to applications, but it affects the outcome of a number of type-sensitive operations. For example, when you select all attributes of type xs:date by writing the path expression //attribute(*, xs:date) (this is described in Chapter 11), the system looks at the type annotations of the attributes to see which nodes qualify.

In the XDM specification, the type annotation is modeled as an xs:QName holding the name of the type in the case where the type is a globally declared schema type, or an invented name in the case where it is locally declared (not all types defined in a schema need to be named). It's reasonable to treat this as polite fiction, designed to tie up loose ends in the specification in an area where the practical details will inevitably vary from one implementation to another. Any real schema-aware XPath processor will need to have some kind of access to schema information both at compile time and at runtime, but the W3C specifications have not tried to model exactly what this should look like. In practice, the type annotation on a node is likely to be implemented as some kind of pointer into the metadata representing the cached schema information. But for defining the semantics of constructs like //attribute(*, xs:date), it's enough to assume that the node contains just the type name.

The type annotation defines the type of the content of the node, not the type of the node itself. This is an important distinction, and we'll have more to say about it when we discuss the XPath type system in Chapter 5.

You might imagine that the type annotation is redundant, because the typed value is itself an atomic value, and the atomic value itself has a label identifying its type. Very often, the type annotation of the node will be the same as the label on its typed value. However, this only works for nodes whose typed value is a single atomic value. In cases where the schema type is a list type, or a union type, the type annotation on the node is the name of the list or union type, which is not the same as the type of the individual atomic values making up the typed value. For example, if the schema type of an attribute is xs:IDREFS (which is defined as a list of xs:IDREF values), then the type annotation on the attribute node will be xs:IDREFS, but the items in the typed value will be labeled xs:IDREF. If the typed value is an empty sequence, there will be no items to carry a label, but the containing node can still be annotated as being of type xs:IDREFS. Similarly, if the schema type of an element allows attributes as well as integer content, the typed value will be labeled as xs:integer, while the element node itself will have a type annotation that refers to the name of a complex type in the schema.

There is, however, a strong relationship between the string value, the typed value, and the type annotation. In fact, with knowledge of the schema and access to a schema validator, the typed value can always be reconstructed from the string value and the type annotation.
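
To make the distinction between the node's annotation and the labels on its items concrete, here is a rough Python model. Everything in it is an assumption made for the example: the AttributeNode class, the LIST_ITEM_TYPE table standing in for schema information, and the representation of each item as a (type label, lexical value) pair.

```python
from dataclasses import dataclass

# Hypothetical stand-in for compiled schema information: maps the name of
# a list type to the type of its items. Other types are treated as atomic.
LIST_ITEM_TYPE = {"xs:IDREFS": "xs:IDREF", "xs:NMTOKENS": "xs:NMTOKEN"}


@dataclass
class AttributeNode:
    name: str
    string_value: str
    type_annotation: str  # name of the schema type used for validation


def typed_value(node):
    """Rebuild the typed value as (type label, lexical value) pairs."""
    item_type = LIST_ITEM_TYPE.get(node.type_annotation)
    if item_type is None:
        # Atomic type: a single item labeled with the annotation itself.
        return [(node.type_annotation, node.string_value)]
    # List type: one item per whitespace-separated token, each labeled
    # with the item type rather than with the list type.
    return [(item_type, token) for token in node.string_value.split()]
```

An attribute annotated as xs:IDREFS with the string value "a1 b2" produces two items labeled xs:IDREF, while an empty string value produces an empty sequence even though the node keeps its xs:IDREFS annotation.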

If an element or attribute node has not been validated using a schema processor, then the type annotation will be xs:untypedAtomic in the case of an attribute node, or xs:untyped in the case of an element node.

For document, comment, processing-instruction, and namespace nodes, there is no type annotation (the value of the type annotation is an empty sequence). For text nodes, the type annotation is xs:untypedAtomic (but there is nothing in the language that makes use of this fact).

The Base URI of a Node

A node has a base URI. This should not be confused with its namespace URI. The base URI of a node depends on the URI of the source XML document it was loaded from, or more accurately, the URI of the external entity it was loaded from, since different parts of the same document might come from different XML entities. The base URI is used when evaluating a relative URI reference that occurs as part of the value of this node, for example an href attribute: this is always interpreted relative to the base URI of the node it came from.

It is possible to override this by specifying an explicit base URI using the xml:base attribute. For example, if an element has the attribute xml:base="../index.xml", then the base URI for this element, and for all its descendants provided they are in the same XML external entity, is the index.xml file in the parent directory of the file that would otherwise have provided the base URI.
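
The effect of xml:base can be reproduced with ordinary relative URI resolution. The sketch below uses urljoin from Python's standard library; the entity URI and the href value are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical URI of the external entity the element was loaded from.
entity_uri = "http://example.com/docs/chapter/page.xml"

# An xml:base attribute is itself resolved against the base URI that the
# element would otherwise have had.
base_uri = urljoin(entity_uri, "../index.xml")

# A relative reference in the element's content, such as an href
# attribute, is then resolved against the element's base URI.
href = urljoin(base_uri, "images/logo.png")
```

Here base_uri becomes http://example.com/docs/index.xml, and the relative href resolves to http://example.com/docs/images/logo.png rather than to a location under the chapter directory.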
