XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition


Another situation that can cause two different substrings to match at the same position is where the regex contains two alternatives that both match. For example, when the regex #|## is applied to a string that contains two consecutive # characters, both branches will match. The rule here is that the first (leftmost) alternative wins. In this case, this is almost certainly not what was intended: rewrite the expression as ##|# or as ##? instead.
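As a minimal sketch of the difference, the following XPath 2.0 expressions apply each form of the regex to the sample string a##b (an illustrative input, not one taken from the text). Each expression is meant to be evaluated on its own; the comments show the result that follows from the leftmost-alternative rule.

```xpath
(: First alternative wins: each # is matched on its own, so a
   zero-length token appears between the two separators :)
tokenize("a##b", "#|##")    (: ("a", "", "b") :)

(: Longer alternative listed first: ## is taken as a single separator :)
tokenize("a##b", "##|#")    (: ("a", "b") :)

(: Same effect using an optional second # :)
tokenize("a##b", "##?")     (: ("a", "b") :)
```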

If the input string starts with a separator, then the output sequence will start with a zero-length string representing what was found before the first separator. If the input string ends with a separator, there will similarly be a zero-length string at the end of the sequence. If there are two adjacent separators in the middle of the string, you will get a zero-length string in the middle of the result sequence. In all cases the number of items in the result sequence is the number of separators in the input string plus one.
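The following sketch shows all three cases at once, using a made-up input string with a leading separator, two adjacent separators, and a trailing separator. The expressions are meant to be evaluated separately.

```xpath
(: Four commas in the input, so five items in the result:
   zero-length strings at the start, in the middle, and at the end :)
tokenize(",a,,b,", ",")           (: ("", "a", "", "b", "") :)

(: Number of items = number of separators + 1 :)
count(tokenize(",a,,b,", ","))    (: 5 :)
```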

If the regex does not match the input string, the tokenize() function will return the input string unchanged, as a singleton sequence. If this is not the effect you are looking for, use the matches() function first to see if there is a match.
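A short sketch of this behavior and of one possible guard follows. Returning an empty sequence when there is no separator is only an assumption about what a caller might want; substitute whatever fallback suits the application.

```xpath
(: No comma in the input: the whole string comes back as one item :)
tokenize("abc", ",")    (: ("abc") :)

(: Guarded form: tokenize only when the separator actually occurs,
   otherwise return the empty sequence (an illustrative choice) :)
if (matches("abc", ",")) then tokenize("abc", ",") else ()    (: () :)
```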

If the regex is one that matches a zero-length string, that is, if matches("", $regex) is true, the system reports an error. An example of such a regex is \s*. Although various interpretations of such a construct are possible, the Working Group decided that the results were too confusing and chose not to allow it.
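The sketch below illustrates the check and a common way around the error, which is to require at least one character in each separator. The input string is invented for illustration, and each expression is meant to be evaluated separately.

```xpath
(: The regex \s* can match nothing at all :)
matches("", "\s*")         (: true() :)

(: So using it as a separator is an error
   (err:FORX0003 in the XPath 2.0 functions specification) :)
tokenize("a  b", "\s*")    (: raises an error :)

(: Requiring one or more whitespace characters avoids the problem :)
tokenize("a  b", "\s+")    (: ("a", "b") :)
```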

Examples

Expression                                 Result
tokenize("Go home, Jack!", "\W+")          ("Go", "home", "Jack", "")
tokenize("abc[NL]def[XY]", "\[.*?\]")      ("abc", "def", "")

Usage

A limitation of this function is that it is not possible to do anything with the separator substrings. This means, for example, that you can't treat a number differently depending on whether it was separated from the next number by a comma or a semicolon. One solution to this problem is to process the string in two passes: first, do a replace() call in which the separators , and ; are replaced by (say) ,# and ;# respectively; then use tokenize() to split the string at the # characters, and the original , or ; will be retained as the last character of each token.
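A minimal sketch of the two-pass approach, using the made-up input 12,34;56 and assuming that the marker character # never occurs in the input itself. The labels produced in the final step are purely illustrative; the expressions are meant to be evaluated separately.

```xpath
(: Pass 1: append a # marker after every , or ; separator :)
replace("12,34;56", "([,;])", "$1#")
(: "12,#34;#56" :)

(: Pass 2: split at the markers; each token keeps its own separator :)
tokenize(replace("12,34;56", "([,;])", "$1#"), "#")
(: ("12,", "34;", "56") :)

(: The retained last character can then drive different handling :)
for $t in tokenize(replace("12,34;56", "([,;])", "$1#"), "#")
return
  if (ends-with($t, ","))
    then concat("comma: ", substring-before($t, ","))
  else if (ends-with($t, ";"))
    then concat("semicolon: ", substring-before($t, ";"))
  else concat("last: ", $t)
(: ("comma: 12", "semicolon: 34", "last: 56") :)
```

If # can appear in the data, another marker that is known to be absent from the input would need to be chosen instead.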
