XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

Authors: Michael Kay

Collations

The static context for XPath expressions includes a set of collations, one of which is marked as the default collation. A collation is essentially a set of rules for comparing and sorting strings. One collation might decide that “pass” and “Paß” are equal; another, that they are distinct.
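To make the idea concrete, here is a minimal Java sketch in which two comparators behave as two different collations over the same pair of strings. It substitutes a case difference (“pass” versus “PASS”) for the “Paß” example, because whether “ß” matches “ss” depends on the rules a particular platform ships with. The class name TwoCollations and its constants are illustrative only.

```java
import java.util.Comparator;

public class TwoCollations {
    // Collation A: compares UTF-16 character values, so case differences count.
    static final Comparator<String> EXACT = String::compareTo;

    // Collation B: Java's built-in ordering that ignores case differences.
    static final Comparator<String> IGNORE_CASE = String.CASE_INSENSITIVE_ORDER;

    // Two strings are "equal" under a collation when it compares them as 0.
    static boolean equalUnder(Comparator<String> collation, String a, String b) {
        return collation.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println("EXACT:       " + equalUnder(EXACT, "pass", "PASS"));       // false
        System.out.println("IGNORE_CASE: " + equalUnder(IGNORE_CASE, "pass", "PASS")); // true
    }
}
```

The strings are the same in both calls; only the collation changes, and with it the answer to "are these equal?".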

As far as XPath is concerned, collations are defined outside the system, and a collation is treated as a black box. The XPath processor knows which collations exist (because they are listed in the static context), but it doesn't know anything about their characteristics, beyond the fact that it can use the collation to compare two strings.

Collations are identified by URIs. These are like namespace URIs, in that they don't necessarily identify real resources on the Web: they are just globally unique names, ensuring that collations defined by one vendor can't be confused with those defined by a different vendor. There is only one collation whose name has been standardized, namely:

http://www.w3.org/2005/xpath-functions/collation/codepoint

This collation, called the Unicode Codepoint Collation, compares strings character by character, using the numeric values assigned to each character in the Unicode standard. So, for example, “Z” < “a” is true when using this collation, because the numeric code for Z is 90 and the code for a is 97.
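The codepoint comparison can be checked directly in Java. This is a minimal sketch (the class name CodepointCollation is hypothetical). It walks the strings code point by code point rather than calling String.compareTo, because Java strings are sequences of UTF-16 code units, and comparing code units gives a different order from comparing code points once characters outside the Basic Multilingual Plane are involved.

```java
public class CodepointCollation {
    // Compare two strings code point by code point, in the spirit of the
    // Unicode Codepoint Collation. Using codePointAt keeps surrogate pairs
    // together instead of comparing UTF-16 code units one at a time.
    static int compare(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(i);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            // Equal code points occupy the same number of chars, so one index suffices.
            i += Character.charCount(cpA);
        }
        // All shared code points match: the shorter string sorts first (0 if same length).
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println((int) 'Z');             // 90
        System.out.println((int) 'a');             // 97
        System.out.println(compare("Z", "a") < 0); // true

        String fullwidthA = "\uFF21";                             // U+FF21
        String grinning = new String(Character.toChars(0x1F600)); // U+1F600
        System.out.println(compare(fullwidthA, grinning) < 0);    // true
        System.out.println(fullwidthA.compareTo(grinning) < 0);   // false
    }
}
```

The last two lines show why the distinction matters: U+FF21 precedes U+1F600 in code point order, but String.compareTo sees the high surrogate 0xD83D as the first unit of U+1F600 and orders the pair the other way round.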

As with other aspects of the static context, it's up to the host language to say what collations are available and how they are defined. In this area, however, XSLT as a host language has nothing to say: it leaves it entirely up to the implementation. Many implementations are likely to devise a scheme whereby URIs identify collations provided by the programming language environment, by a database system, or by the operating system.
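One way an implementation might realize such a scheme is a registry that maps collation URIs to comparators. The sketch below is hypothetical: the class CollationRegistry and the URI http://example.com/collation/english-primary are invented for illustration and do not reflect any particular product's naming scheme; only the codepoint URI is standardized. Because the processor receives nothing but a Comparator, each collation remains the black box described earlier.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CollationRegistry {
    // The one standardized collation URI.
    static final String CODEPOINT =
        "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    // Hypothetical implementation-defined URI, invented for this sketch.
    static final String ENGLISH_PRIMARY = "http://example.com/collation/english-primary";

    private final Map<String, Comparator<String>> collations = new HashMap<>();

    CollationRegistry() {
        // Code point order: lexicographic comparison of the code point arrays.
        collations.put(CODEPOINT,
            (a, b) -> Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray()));

        // Backed by the Java platform: English rules, ignoring accents and case.
        Collator english = Collator.getInstance(Locale.ENGLISH);
        english.setStrength(Collator.PRIMARY);
        collations.put(ENGLISH_PRIMARY, (a, b) -> english.compare(a, b));
    }

    // The processor only ever receives a Comparator, so the collation is a black box.
    Comparator<String> resolve(String uri) {
        Comparator<String> collation = collations.get(uri);
        if (collation == null) {
            throw new IllegalArgumentException("Unknown collation URI: " + uri);
        }
        return collation;
    }

    public static void main(String[] args) {
        CollationRegistry registry = new CollationRegistry();
        Comparator<String> codepoint = registry.resolve(CODEPOINT);
        Comparator<String> english = registry.resolve(ENGLISH_PRIMARY);
        System.out.println(codepoint.compare("Z", "a") < 0); // true: 90 < 97
        System.out.println(english.compare("Z", "a") > 0);   // true: z sorts after a
    }
}
```

The two registered collations disagree about “Z” and “a”: code point order puts “Z” first, while the English rules put “a” first.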

In Java, for example, a collator is represented by an object of class java.text.Collator. You can obtain a collator for a particular Locale, which will give you the basic rules for a language (for example, ä collates after z in Swedish, but not in German). You can then parameterize the collator. For example, you can set its strength, which determines whether or not it ignores accents and case, and you can control whether it applies Unicode normalization to the characters before comparison. Normalization recognizes that there are alternative ways of coding the same character in Unicode: either as a combined character (one codepoint representing lower-case-c-with-cedilla) or as separate characters (separate codepoints for the c and the cedilla). Saxon allows you to specify a collation URI that specifies these parameters explicitly, for example the URI:
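The Collator settings just described can be tried out with a short Java sketch. It uses only the standard java.text.Collator API (getInstance, setStrength, setDecomposition); the class name CollatorSettings and the sample strings are illustrative. The Swedish result depends on the locale data bundled with your JDK, so it is printed rather than assumed.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorSettings {
    // English rules with canonical decomposition and the requested strength.
    static Collator english(int strength) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator;
    }

    public static void main(String[] args) {
        // 1. Locale: where does a-umlaut (U+00E4) sort relative to "z"?
        //    The Swedish answer depends on the locale data installed with the JDK.
        Collator swedish = Collator.getInstance(Locale.forLanguageTag("sv"));
        Collator german = Collator.getInstance(Locale.GERMAN);
        System.out.println("sv: " + Integer.signum(swedish.compare("\u00E4", "z")));
        System.out.println("de: " + Integer.signum(german.compare("\u00E4", "z")));

        // 2. Strength: which kinds of difference are significant.
        String plain = "resume";
        String accented = "R\u00E9sum\u00E9"; // capital R, both e's with acute accents
        System.out.println("PRIMARY   (ignore accents and case): "
            + (english(Collator.PRIMARY).compare(plain, accented) == 0));     // true
        System.out.println("SECONDARY (accents count):           "
            + (english(Collator.SECONDARY).compare(plain, accented) == 0));   // false
        System.out.println("TERTIARY  (case counts too):         "
            + (english(Collator.TERTIARY).compare("resume", "Resume") == 0)); // false

        // 3. Normalization: precomposed c-cedilla vs "c" followed by a combining cedilla.
        String precomposed = "\u00E7";
        String decomposed = "c\u0327";
        System.out.println("String.equals: " + precomposed.equals(decomposed)); // false
        System.out.println("Collator:      "
            + (english(Collator.TERTIARY).compare(precomposed, decomposed) == 0)); // true
    }
}
```

Canonical decomposition is switched on explicitly for the English collator, so accented and cedilla characters are reduced to a base letter plus combining mark before the strength rules apply; that is why the two encodings of c-with-cedilla compare as equal even at tertiary strength.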