XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Author: Michael Kay
If you have processed the document using a schema, things get more interesting. The situation where the typed value is most useful is where the schema defines a simple type for the element or attribute (or in the case of elements, a complex type with simple content—which means that the element can have attributes, but it cannot have child elements). As we will see in Chapter 4, simple types in XML Schema allow atomic values or lists of atomic values, but they don't allow child elements.
For attributes, all types are simple types, so the above rules cover all the possibilities. For elements, however, there are additional rules to cover non-simple types:
The Type Annotation of a Node
As well as having a typed value, a node also has a type annotation. This is a reference to the type definition that was used to validate the node during schema processing. It is not available directly to applications, but it affects the outcome of a number of type-sensitive operations. For example, when you select all attributes of type xs:date by writing the path expression //attribute(*, xs:date) (this is described in Chapter 11), the system looks at the type annotations of the attributes to see which nodes qualify.
In the XDM specification, the type annotation is modeled as an xs:QName holding the name of the type in the case where the type is a globally declared schema type, or an invented name in the case where it is locally declared (not all types defined in a schema need to be named). It's reasonable to treat this as a polite fiction, designed to tie up loose ends in the specification in an area where the practical details will inevitably vary from one implementation to another. Any real schema-aware XPath processor will need to have some kind of access to schema information both at compile time and at runtime, but the W3C specifications have not tried to model exactly what this should look like. In practice, the type annotation on a node is likely to be implemented as some kind of pointer into the metadata representing the cached schema information. But for defining the semantics of constructs like //attribute(*, xs:date), it's enough to assume that the node contains just the type name.
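To make the "just the type name" model concrete, here is a minimal Python sketch of how a test such as attribute(*, xs:date) could be evaluated against type annotations held as plain names. It is an illustration only, not how any real processor works: the Attribute class, the BASE_TYPE table, and the user-defined type my:birthDate are all invented for this example. The sketch also assumes the usual XPath rule that an attribute qualifies when its annotation is the named type or a type derived from it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, hand-written fragment of a type hierarchy: each type name
# maps to its base type. "my:birthDate" is an invented user-defined type.
BASE_TYPE = {
    "my:birthDate": "xs:date",
    "xs:date": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:untypedAtomic": "xs:anyAtomicType",
    "xs:anyAtomicType": "xs:anySimpleType",
}


@dataclass
class Attribute:
    name: str
    string_value: str
    type_annotation: str  # modeled as nothing more than the type name


def is_of_type(annotation: str, required: str) -> bool:
    """Return True if `annotation` is `required` or is derived from it."""
    current: Optional[str] = annotation
    while current is not None:
        if current == required:
            return True
        current = BASE_TYPE.get(current)  # walk up to the base type
    return False


def select_attributes(attributes: List[Attribute], required: str) -> List[Attribute]:
    """Rough analogue of //attribute(*, required) over a flat list of attributes."""
    return [a for a in attributes if is_of_type(a.type_annotation, required)]


attrs = [
    Attribute("issued", "2008-03-01", "xs:date"),
    Attribute("born", "1951-10-11", "my:birthDate"),
    Attribute("count", "42", "xs:integer"),
    Attribute("note", "2008-03-01", "xs:untypedAtomic"),
]
dates = select_attributes(attrs, "xs:date")
print([a.name for a in dates])  # ['issued', 'born']
```

Note that the attribute named note is not selected even though its string value looks like a date: the test looks only at the annotation, never at the lexical form.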
The type annotation defines the type of the content of the node, not the type of the node itself. This is an important distinction, and we'll have more to say about it when we discuss the XPath type system in Chapter 5.
You might imagine that the type annotation is redundant, because the typed value is itself an atomic value, and the atomic value itself has a label identifying its type. Very often, the type annotation of the node will be the same as the label on its typed value. However, this only works for nodes whose typed value is a single atomic value. In cases where the schema type is a list type, or a union type, the type annotation on the node is the name of the list or union type, which is not the same as the type of the individual atomic values making up the typed value. For example, if the schema type of an attribute is xs:IDREFS (which is defined as a list of xs:IDREF values), then the type annotation on the attribute node will be xs:IDREFS, but the items in the typed value will be labeled xs:IDREF. If the typed value is an empty sequence, there will be no items to carry a label, but the containing node can still be annotated as being of type xs:IDREFS. Similarly, if the schema type of an element allows attributes as well as integer content, the typed value will be labeled as xs:integer, while the element node itself will have a type annotation that refers to the name of a complex type in the schema.
There is, however, a strong relationship between the string value, the typed value, and the type annotation. In fact, with knowledge of the schema and access to a schema validator, the typed value can always be reconstructed from the string value and the type annotation.
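The relationship can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a schema validator: the SCHEMA table stands in for real schema knowledge, the AtomicValue class and the typed_value function are invented for illustration, and my:quantityWithUnit is a hypothetical complex type with simple integer content. Atomic values keep their lexical form here, where a real processor would convert them to actual values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AtomicValue:
    lexical: str     # lexical form, kept as a string in this sketch
    type_label: str  # the label carried by the atomic value itself


# Hypothetical stand-in for schema information:
# type annotation -> (variety, type label of each item in the typed value)
SCHEMA = {
    "xs:IDREF": ("atomic", "xs:IDREF"),
    "xs:IDREFS": ("list", "xs:IDREF"),
    "xs:integer": ("atomic", "xs:integer"),
    # Invented complex type with simple content: integer plus attributes.
    "my:quantityWithUnit": ("atomic", "xs:integer"),
}


def typed_value(string_value: str, type_annotation: str) -> List[AtomicValue]:
    """Rebuild the typed value from the string value and the type annotation."""
    variety, item_type = SCHEMA[type_annotation]
    if variety == "list":
        # A list type yields one item per whitespace-separated token,
        # and no items at all when the string value is empty.
        return [AtomicValue(token, item_type) for token in string_value.split()]
    return [AtomicValue(string_value.strip(), item_type)]


refs = typed_value("a1 b2 c3", "xs:IDREFS")
empty = typed_value("", "xs:IDREFS")
quantity = typed_value("12", "my:quantityWithUnit")

print([v.type_label for v in refs])  # ['xs:IDREF', 'xs:IDREF', 'xs:IDREF']
print(empty)                         # []
print(quantity[0].type_label)        # xs:integer
```

In each case the annotation on the node (xs:IDREFS, my:quantityWithUnit) differs from the label on the items, and in the empty case the annotation is the only place the type information survives.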
If an element or attribute node has not been validated using a schema processor, then the type annotation will be xs:untypedAtomic in the case of an attribute node, or xs:untyped in the case of an element node.
For document, comment, processing-instruction, and namespace nodes, there is no type annotation (the value of the type annotation is an empty sequence). For text nodes, the type annotation is xs:untypedAtomic (but there is nothing in the language that makes use of this fact).
The Base URI of a Node
A node has a base URI. This should not be confused with its namespace URI. The base URI of a node depends on the URI of the source XML document it was loaded from, or more accurately, the URI of the external entity it was loaded from, since different parts of the same document might come from different XML entities. The base URI is used when evaluating a relative URI reference that occurs as part of the value of this node, for example an href attribute: this is always interpreted relative to the base URI of the node it came from.
It is possible to override this by specifying an explicit base URI using the xml:base attribute. For example, if an element has the attribute xml:base="../index.xml", then the base URI for this element, and for all its descendants provided they are in the same XML external entity, is the index.xml file in the parent directory of the file that would otherwise have provided the base URI.
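The effect of xml:base on relative references can be illustrated with a short Python sketch using only the standard library. It is a simplified model rather than an XDM implementation: the base_uris helper, the sample document, and the URI http://example.com/docs/chapters/ch01.xml are all invented for the example, and external entities are ignored because ElementTree does not expose them.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree reports the xml:base attribute under the XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def base_uris(element, inherited_base):
    """Yield (element, base URI) for an element and all its descendants."""
    own = element.get(XML_BASE)
    # An xml:base value is itself resolved against the inherited base URI.
    base = urljoin(inherited_base, own) if own is not None else inherited_base
    yield element, base
    for child in element:
        yield from base_uris(child, base)


# Invented URI of the file the document is assumed to have been loaded from.
DOC_URI = "http://example.com/docs/chapters/ch01.xml"

doc = ET.fromstring(
    "<chapter>"
    '<section xml:base="../index.xml"><link href="toc.html"/></section>'
    '<link href="fig1.png"/>'
    "</chapter>"
)

resolved = {}
for element, base in base_uris(doc, DOC_URI):
    href = element.get("href")
    if href is not None:
        # A relative href is interpreted against the base URI of its own node.
        resolved[href] = urljoin(base, href)

print(resolved["toc.html"])  # http://example.com/docs/toc.html
print(resolved["fig1.png"])  # http://example.com/docs/chapters/fig1.png
```

The link inside the section resolves against index.xml in the parent directory, while the link outside it still resolves against the original file.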