XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition


Signature

Argument          Type         Meaning
input             xs:string?   The input string. If an empty sequence or zero-length string is supplied, the function returns an empty sequence.
regex             xs:string    The regular expression used to match separators, written according to the rules given in Chapter 14.
flags (optional)  xs:string    One or more letters indicating options on how the matching is to be performed. If this argument is omitted, the effect is the same as supplying a zero-length string, which defaults all the option settings.
Result            xs:string*   A sequence whose items are substrings of the input string.

Effect

The rules for the syntax of regular expressions and the flags argument are given in Chapter 14.

The input string is processed from left to right, looking for substrings that match the regular expression supplied in the regex argument. A consecutive sequence of characters that doesn't participate in a match is copied as a string to form one item in the output sequence. A sequence of characters that does match the regex is deemed to be a separator and is discarded. The search then resumes at the character position following the matched substring.
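
The left-to-right scan described above can be sketched in Python. This is only an illustrative analogue: the helper name tokenize mirrors the function's name but is a hypothetical Python function, Python's re dialect is not the regex dialect defined in Chapter 14, and the sketch does not model the flags argument or a regex that can match a zero-length string.

```python
import re

def tokenize(input_str, regex):
    """Hypothetical Python analogue of the scan described in the text."""
    # An empty sequence (modelled here as None) or a zero-length string
    # produces an empty result.
    if not input_str:
        return []
    tokens = []
    pos = 0  # start of the current run of non-separator characters
    for match in re.finditer(regex, input_str):
        # Characters before the match did not participate in it: keep them.
        tokens.append(input_str[pos:match.start()])
        # The matched separator is discarded; resume after it.
        pos = match.end()
    # Whatever follows the last separator forms the final item.
    tokens.append(input_str[pos:])
    return tokens

print(tokenize("red, green,blue", r",\s*"))  # ['red', 'green', 'blue']
print(tokenize("", r",\s*"))                 # []
```

Each separator is dropped and each run of characters between separators becomes one item, which is the behaviour the paragraph above describes.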

It can happen that two substrings starting at the same position both match the regex. There are two ways this situation can arise.

Firstly, it happens when part of the regex is looking for repeated occurrences of a substring. For example, suppose the regex is \n+, indicating that any sequence of one or more consecutive newlines acts as a separator. Then clearly, if two adjacent newline characters are found, the regex could match on the first one alone, or on the pair. The rule here is that + is a greedy quantifier: it matches as long a substring as it can, in this case, both newline characters. In this example, this is what you want to happen. But if you were trying to remove comments in square brackets by using a regex such as \[.*\], this would have the wrong effect: given the input "Doolittle [1] and Dalley [2]", the first separator identified would be "[1] and Dalley [2]". If you want to match the shortest possible substring, add a ? after the quantifier to make it non-greedy, thus: \[.*?\]
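
The difference between the greedy and non-greedy forms can be tried out with a short Python sketch. It uses Python's re.split as a stand-in for the function described here, on the assumption that greedy and non-greedy quantifiers behave the same way for these particular patterns; the variable names are illustrative only.

```python
import re

text = "Doolittle [1] and Dalley [2]"

# Greedy: .* runs on to the last closing bracket, so the single separator
# found is "[1] and Dalley [2]".
greedy = re.split(r"\[.*\]", text)

# Non-greedy: .*? stops at the first closing bracket, so "[1]" and "[2]"
# are each treated as a separator.
non_greedy = re.split(r"\[.*?\]", text)

# One or more consecutive newlines act as a single separator.
lines = re.split(r"\n+", "first\n\nsecond")

print(greedy)      # ['Doolittle ', '']
print(non_greedy)  # ['Doolittle ', ' and Dalley ', '']
print(lines)       # ['first', 'second']
```

With the greedy pattern the text " and Dalley " is swallowed as part of the separator; with the non-greedy pattern it survives as an item of its own.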
