XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (629 page)


Authors: Michael Kay


Another situation that can cause two different substrings to match at the same position is where the regex contains two alternatives that both match. For example, when the regex "#|##" is applied to a string that contains two consecutive "#" characters, both branches will match. The rule here is that the first (leftmost) alternative wins. In this case, this is almost certainly not what was intended: rewrite the expression as "##|#", or as "##?".
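The effect of the leftmost-alternative rule on tokenize() can be sketched with a pair of XPath 2.0 expressions. The input string "a##b" is an illustrative assumption, not an example from the original text.

```xpath
(: The first alternative "#" wins at each position, so "##" is treated
   as two separate separators with a zero-length token between them. :)
tokenize("a##b", "#|##")    (: returns ("a", "", "b") :)

(: Listing the longer alternative first lets "##" match as one separator. :)
tokenize("a##b", "##|#")    (: returns ("a", "b") :)
```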

If the input string starts with a separator, then the output sequence will start with a zero-length string representing what was found before the first separator. If the input string ends with a separator, there will similarly be a zero-length string at the end of the sequence. If there are two adjacent separators in the middle of the string, you will get a zero-length string in the middle of the result sequence. In all cases the number of items in the result sequence is the number of separators in the input string plus one.
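As a small illustration of these rules, the following expression uses a made-up input that starts with a separator, ends with one, and has two adjacent separators in the middle.

```xpath
(: Four commas in the input, so five items in the result:
   zero-length strings appear at the start, in the middle, and at the end. :)
tokenize(",a,,b,", ",")                   (: returns ("", "a", "", "b", "") :)

(: One way to discard the zero-length tokens, if they are not wanted. :)
tokenize(",a,,b,", ",")[. ne ""]          (: returns ("a", "b") :)
```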

If the regex does not match the input string, the tokenize() function will return the input string unchanged, as a singleton sequence. If this is not the effect you are looking for, use the matches() function first to see if there is a match.
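A minimal sketch of the no-match case and of a matches() guard follows. The input string and the choice to return an empty sequence when nothing matches are assumptions made for illustration.

```xpath
(: No comma in the input: the whole string comes back as a single item. :)
tokenize("no-commas-here", ",")           (: returns ("no-commas-here") :)

(: Guarding with matches(): return an empty sequence when there is no separator. :)
for $s in "no-commas-here"
return
  if (matches($s, ","))
  then tokenize($s, ",")
  else ()                                 (: returns () :)
```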

If the regex is one that matches a zero-length string, that is, if matches("", $regex) is true, the system reports an error. An example of such a regex is "\s*". Although various interpretations of such a construct are possible, the Working Group judged the results too confusing and chose not to allow it.
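The following sketch contrasts a regex that can match a zero-length string with one that cannot. The input string is an illustrative assumption. The error code shown is the one the XPath functions specification assigns to this condition.

```xpath
(: "\s*" matches the zero-length string, so this call fails with a
   dynamic error (err:FORX0003) rather than returning a result. :)
tokenize("a b  c", "\s*")

(: "\s+" requires at least one whitespace character, so it is allowed. :)
tokenize("a b  c", "\s+")                 (: returns ("a", "b", "c") :)
```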

Examples

Expression	Result
tokenize("Go home, Jack!", "\W+")	("Go", "home", "Jack", "")
tokenize("abc[NL]def[XY]", "\[.*?\]")	("abc", "def", "")

Usage

A limitation of this function is that it is not possible to do anything with the separator substrings. This means, for example, that you can't treat a number differently depending on whether it was separated from the next number by a comma or a semicolon. One solution to this problem is to process the string in two passes: first, do a replace() call in which the separators "," and ";" are replaced by (say) ",#" and ";#"; then use tokenize() to split the string at the "#" characters, and the original "," or ";" remains at the end of each token that was followed by a separator.
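The two-pass technique can be sketched as a single nested expression. The input "1,2;3" is an assumed example, and the sketch relies on "#" not occurring in the data itself.

```xpath
(: Pass 1: replace() appends "#" after every "," or ";", giving "1,#2;#3".
   Pass 2: tokenize() splits at "#", so each token keeps its own separator. :)
tokenize(replace("1,2;3", "([,;])", "$1#"), "#")
                                          (: returns ("1,", "2;", "3") :)
```

Each token can then be inspected with ends-with() to find out which separator followed it. The last token has no trailing separator.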
