XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
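While the <xsl:matching-substring> element is being evaluated, the substrings captured by parenthesized groups in the regex can be retrieved using the regex-group() function. This takes an integer argument, which is the number of the captured group that is required. If there is no corresponding subexpression in the regular expression, or if that subexpression didn't match anything, the result is a zero-length string.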
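The zero-length-string behavior of regex-group() can be seen in a small sketch. The stylesheet below is illustrative only: the named template main, the sample strings, and the size element with its attributes are assumptions made for this example, not part of the original text. It could be run with a named entry template, for example with Saxon's -it:main option.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: the template name "main" is an assumption. -->
  <xsl:template name="main">
    <sizes>
      <!-- Sample input strings, chosen for illustration only. -->
      <xsl:for-each select="('12px', '7')">
        <xsl:analyze-string select="." regex="^([0-9]+)(px)?$">
          <xsl:matching-substring>
            <!-- regex-group(2) is '' when the optional group matched nothing;
                 regex-group(3) is '' because the regex has no third group. -->
            <size number="{regex-group(1)}"
                  unit="{regex-group(2)}"
                  missing="{regex-group(3)}"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </sizes>
  </xsl:template>

</xsl:stylesheet>
```

For the input 7, both the unit and missing attributes should come out empty; for 12px, unit holds px and missing is still empty.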
Usage and Examples
Many tasks that require regex processing can be accomplished using the three functions in the core function library (see Chapter 13) that use regular expressions: matches(), replace(), and tokenize(). These are used as follows:
Function | Purpose |
matches() | Tests whether a string matches a given regular expression |
replace() | Replaces the parts of a string that match a given regular expression with a different string |
tokenize() | Splits a string into a sequence of substrings, by finding occurrences of a separator that matches a given regular expression |
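As a minimal sketch of the three functions side by side, the stylesheet below applies each of them to one sample string. The variable name nr, the sample value, and the named template main are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point: the template name "main" is an assumption. -->
  <xsl:template name="main">
    <!-- Sample value, chosen for illustration only. -->
    <xsl:variable name="nr" select="'123-ABCD-45'"/>

    <!-- matches(): tests the string; the anchors force a whole-string match. -->
    <xsl:value-of select="matches($nr, '^[0-9]{3}-[A-Z]{4}-[0-9]{2}$')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- replace(): substitutes every digit with a hash sign. -->
    <xsl:value-of select="replace($nr, '[0-9]', '#')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- tokenize(): splits on the hyphen separator, giving three strings. -->
    <xsl:value-of select="tokenize($nr, '-')" separator=" | "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The three output lines should be true, ###-ABCD-##, and 123 | ABCD | 45.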
There are many ways to use these functions in an XSLT stylesheet. For example, you might write a template rule that matches customers with a customer number in the form 999-AAAA-99 (this might be the only way, for example, that you can recognize customers acquired as a result of a corporate takeover). Write this as:
<xsl:template match="customer[matches(cust-nr, '^[0-9]{3}-[A-Z]{4}-[0-9]{2}$')]">
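There is no need to double the curly braces in this example. The match attribute of <xsl:template> is not an attribute value template, so curly braces have no special meaning in it.
There are two main ways of using <xsl:analyze-string>: single-match and multiple-match.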
A Single-Match Example
In the single-match use of <xsl:analyze-string>, the regular expression is written to match the entire input string, and the captured groups are then used to pick out its parts.
For example, suppose you want to display a date as 13th March 2008, with the "th" set as a superscript. To achieve this, you need to generate the output 13<sup>th</sup> March 2008 (or rather, text nodes and element nodes corresponding to this serial XML representation). You can achieve the basic date formatting using the format-date() function described in Chapter 13, but to add the markup you need to post-process the output of this function.
Here is the code (for the full stylesheet see single-match.xsl in the download archive):
<xsl:analyze-string
    select="format-date(current-date(), '[D1o]#[MNn]#[Y]')"
    regex="^([0-9]+)([a-z]+)#([A-Z][a-z]+)#(.*)$">
Note that the regex is anchored (it starts with ^ and ends with $) to force it to match the whole input string. Unlike regular expressions used in the pattern facet in XML Schema, a regex used in <xsl:analyze-string> is not implicitly anchored.
In this example I chose, in the <xsl:non-matching-substring> element, to output the date exactly as returned by format-date(), without any markup. This error might occur, for example, because the stylesheet is being run in a locale that uses an unexpected representation of ordinal numbers. The alternative would be to report the failure as an error.
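The single-match technique described in this section might be assembled along the following lines. This is a sketch, not the contents of single-match.xsl: the use of <sup> for the ordinal suffix, the wrapping p element, the variable name date, and the named template main are all assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical entry point: the template name "main" is an assumption. -->
  <xsl:template name="main">
    <p>
      <!-- '#' is used as a separator that cannot occur in the date parts. -->
      <xsl:variable name="date"
          select="format-date(current-date(), '[D1o]#[MNn]#[Y]')"/>

      <xsl:analyze-string select="$date"
          regex="^([0-9]+)([a-z]+)#([A-Z][a-z]+)#(.*)$">
        <xsl:matching-substring>
          <!-- Groups: 1 = day number, 2 = ordinal suffix, 3 = month, 4 = year. -->
          <xsl:value-of select="regex-group(1)"/>
          <sup><xsl:value-of select="regex-group(2)"/></sup>
          <xsl:text> </xsl:text>
          <xsl:value-of select="regex-group(3)"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="regex-group(4)"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Fallback: the anchored regex did not match, so the whole
               string arrives here and is output without added markup. -->
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex is anchored at both ends, exactly one of the two branches runs: either the whole string matches and the groups are emitted with markup, or the whole string is treated as non-matching and passed through unchanged, separators included.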
A Multiple-Match Example
In a multiple-match application, you supply a regular expression that will match the input string repeatedly, breaking it into a sequence of substrings. There are two main ways you can design this:
1. Match the parts of the string that you are interested in. For example, the regex [0-9]+ will match any sequence of consecutive digits, and pass it to the <xsl:matching-substring> element.
There is a variant of this approach that is useful where there are no separators as such. For example, you might be dealing with a format such as the one used for ISO 8601 durations, which look like this: PT12H30M10S, with the requirement to split out the components 12H, 30M, and 10S. The regex [0-9]+[A-Z] will achieve this, passing each component to the <xsl:matching-substring> element.
2. Match the separators between the parts of the string that you are interested in. For example, if the string uses a comma as a separator, the regex ,\s* will match any comma followed optionally by whitespace. The fields that appear between the commas will be passed, one at a time, to the <xsl:non-matching-substring> element.
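The two multiple-match designs can be sketched in one stylesheet. The sample strings, the result, component, and field element names, and the named template main are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: the template name "main" is an assumption. -->
  <xsl:template name="main">
    <result>
      <!-- Design 1: the regex matches the parts of interest.
           Text that does not match (here the leading "PT") is discarded
           because no xsl:non-matching-substring is supplied. -->
      <xsl:analyze-string select="'PT12H30M10S'" regex="[0-9]+[A-Z]">
        <xsl:matching-substring>
          <component><xsl:value-of select="."/></component>
        </xsl:matching-substring>
      </xsl:analyze-string>

      <!-- Design 2: the regex matches the separators.
           The fields between them arrive as non-matching substrings. -->
      <xsl:analyze-string select="'red, green,blue'" regex=",\s*">
        <xsl:non-matching-substring>
          <field><xsl:value-of select="."/></field>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The first instruction should yield component elements containing 12H, 30M, and 10S; the second should yield field elements containing red, green, and blue. When only the separators matter and no per-field markup is needed, the tokenize() function is often a simpler way to get the same split.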