com.sun.org.apache.xerces.internal.impl.xpath.regex

public Class RegularExpression

extends Object
implements Serializable
Class Inheritance
All Implemented Interfaces
java.io.Serializable
Imports
java.text.CharacterIterator, java.util.Locale, java.util.Stack, com.sun.org.apache.xerces.internal.util.IntStack

A regular expression matching engine based on a non-deterministic finite automaton (NFA). This engine does not conform to POSIX regular expressions.

How to use

A. Standard way
 RegularExpression re = new RegularExpression(regex);
 if (re.matches(text)) { ... }
 
B. Capturing groups
 RegularExpression re = new RegularExpression(regex);
 Match match = new Match();
 if (re.matches(text, match)) {
     ... // You can refer to the captured text with methods of the Match class.
 }
 

Case-insensitive matching

 RegularExpression re = new RegularExpression(regex, "i");
 if (re.matches(text)) { ... }
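
The three usage patterns above can be tried out with the public java.util.regex API. This is an analog and not the RegularExpression class itself, which lives in an internal package and may not be accessible from application code on recent JDKs. The class and method names below (UsageSketch, contains, firstGroup, containsIgnoreCase) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsageSketch {
    // A. Standard way: does the text contain a match for the pattern?
    static boolean contains(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // B. Capturing groups: return group 1 of the first match, or null if none.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Case-insensitive matching, a rough counterpart of the "i" option.
    static boolean containsIgnoreCase(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("b+", "aabbcc"));                   // true
        System.out.println(firstGroup("(\\d+)-(\\d+)", "range 10-20")); // 10
        System.out.println(containsIgnoreCase("hello", "Say HELLO"));   // true
    }
}
```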
 

Options

You can pass options to RegularExpression(regex, options) or setPattern(regex, options). The options parameter is a string consisting of the following characters.

"i"
This option indicates case-insensitive matching.
"m"
^ and $ consider the EOL characters within the text.
"s"
. matches any one character.
"u"
Redefines \d \D \w \W \s \S \b \B \< \> to conform to Unicode.
"w"
With this option, \b \B \< \> are processed according to 'Unicode Regular Expression Guidelines' Revision 4. When "w" and "u" are specified together, \b \B \< \> follow the "w" option.
","
The parser treats a comma in a character class as a range separator. Without this option, [a,b] matches a, a comma, or b. With this option, [a,b] matches a or b.
"X"
With this option, the engine conforms to XML Schema: Regular Expression. The matches() method then performs entire-string matching rather than substring matching.
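
To make the effect of a few of these options concrete, here is a minimal sketch using java.util.regex flags as rough counterparts of "i", "m", "s" and "u". The mapping is an assumption made for illustration, not a statement about how RegularExpression implements its options, and the helper names (OptionsSketch, toFlags, contains, matchesEntire) are hypothetical. The options "w", "," and "X" have no flag in this sketch and are rejected.

```java
import java.util.regex.Pattern;

class OptionsSketch {
    // Hypothetical helper: maps a subset of the option characters to
    // java.util.regex flags. Unsupported characters are rejected.
    static int toFlags(String options) {
        int flags = 0;
        for (char c : options.toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'u': flags |= Pattern.UNICODE_CHARACTER_CLASS; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + c);
            }
        }
        return flags;
    }

    // Substring matching: true if any part of the text matches.
    static boolean contains(String regex, String options, String text) {
        return Pattern.compile(regex, toFlags(options)).matcher(text).find();
    }

    // Entire-string matching, the behavior the "X" option is described as selecting.
    static boolean matchesEntire(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String text = "first\nsecond";
        System.out.println(contains("^second$", "", text));      // false
        System.out.println(contains("^second$", "m", text));     // true
        System.out.println(contains("first.second", "", text));  // false
        System.out.println(contains("first.second", "s", text)); // true
        System.out.println(contains("b+", "", "abbc"));          // true
        System.out.println(matchesEntire("b+", "abbc"));         // false
    }
}
```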

Syntax

Differences from the Perl 5 regular expression

  • There is a 6-digit hexadecimal character representation (\vHHHHHH).
  • Supports subtraction, union, and intersection operations for character classes.
  • Not supported: \ooo (octal character representations), \G, \C, \lc, \uc, \L, \U, \E, \Q, \N{name}, (?{code}), (??{code})
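
Two of these differences can be illustrated in isolation. The sketch below first decodes what a \vHHHHHH escape denotes (six hex digits naming one code point), then shows character-class subtraction in java.util.regex notation as an analog. Per the BNF in this document, the Xerces spelling of the same subtraction would be (?[a-z]-[aeiou]); that form is not exercised here. All names (SyntaxDifferencesSketch, decodeHexEscape, isLowercaseConsonant) are hypothetical.

```java
import java.util.regex.Pattern;

class SyntaxDifferencesSketch {
    // Hypothetical helper: turns the text of a 6-digit hex escape (backslash,
    // 'v', six hex digits) into the string holding that single code point.
    static String decodeHexEscape(String escape) {
        if (!escape.matches("\\\\v[0-9a-fA-F]{6}")) {
            throw new IllegalArgumentException("Not a \\vHHHHHH escape: " + escape);
        }
        int codePoint = Integer.parseInt(escape.substring(2), 16);
        // Character.toChars rejects values above 0x10FFFF.
        return new String(Character.toChars(codePoint));
    }

    // Subtraction analog: java.util.regex writes "[a-z] minus [aeiou]" as an
    // intersection with a negated class.
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    static boolean isLowercaseConsonant(String s) {
        return CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(decodeHexEscape("\\v000041")); // A
        System.out.println(isLowercaseConsonant("b"));    // true
        System.out.println(isLowercaseConsonant("e"));    // false
    }
}
```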

Meta characters are `. * + ? { [ ( ) | \ ^ $'.


BNF for the regular expression

 regex ::= ('(?' options ')')? term ('|' term)*
 term ::= factor+
 factor ::= anchors | atom (('*' | '+' | '?' | minmax ) '?'? )?
            | '(?#' [^)]* ')'
 minmax ::= '{' ([0-9]+ | [0-9]+ ',' | ',' [0-9]+ | [0-9]+ ',' [0-9]+) '}'
 atom ::= char | '.' | char-class | '(' regex ')' | '(?:' regex ')' | '\' [0-9]
          | '\w' | '\W' | '\d' | '\D' | '\s' | '\S' | category-block | '\X'
          | '(?>' regex ')' | '(?' options ':' regex ')'
          | '(?' ('(' [0-9] ')' | '(' anchors ')' | looks) term ('|' term)? ')'
 options ::= [imsw]* ('-' [imsw]+)?
 anchors ::= '^' | '$' | '\A' | '\Z' | '\z' | '\b' | '\B' | '\<' | '\>'
 looks ::= '(?=' regex ')'  | '(?!' regex ')'
           | '(?<=' regex ')' | '(?<!' regex ')'
 char ::= '\\' | '\' [efnrtv] | '\c' [@-_] | code-point | character-1
 category-block ::= '\' [pP] category-symbol-1
                    | ('\p{' | '\P{') (category-symbol | block-name
                                       | other-properties) '}'
 category-symbol-1 ::= 'L' | 'M' | 'N' | 'Z' | 'C' | 'P' | 'S'
 category-symbol ::= category-symbol-1 | 'Lu' | 'Ll' | 'Lt' | 'Lm' | 'Lo'
                     | 'Mn' | 'Me' | 'Mc' | 'Nd' | 'Nl' | 'No'
                     | 'Zs' | 'Zl' | 'Zp' | 'Cc' | 'Cf' | 'Cn' | 'Co' | 'Cs'
                     | 'Pd' | 'Ps' | 'Pe' | 'Pc' | 'Po'
                     | 'Sm' | 'Sc' | 'Sk' | 'So'
 block-name ::= (See above)
 other-properties ::= 'ALL' | 'ASSIGNED' | 'UNASSIGNED'
 character-1 ::= (any character except meta-characters)

 char-class ::= '[' ranges ']'
                | '(?[' ranges ']' ([-+&] '[' ranges ']')? ')'
 ranges ::= '^'? (range ','?)+
 range ::= '\d' | '\w' | '\s' | '\D' | '\W' | '\S' | category-block
           | range-char | range-char '-' range-char
 range-char ::= '\[' | '\]' | '\\' | '\' [,-efnrtv] | code-point | character-2
 code-point ::= '\x' hex-char hex-char
                | '\x{' hex-char+ '}'
                | '\v' hex-char hex-char hex-char hex-char hex-char hex-char
 hex-char ::= [0-9a-fA-F]
 character-2 ::= (any character except \[]-,)
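
As a worked reading of one production, the minmax rule above admits four shapes: {n}, {n,}, {,m} and {n,m}. The following is a minimal sketch of a parser for just that rule, not the engine's own parser. It assumes that a missing lower bound means 0 and represents a missing upper bound as -1; both choices, and the names MinMaxSketch, parse and parseDigits, are hypothetical.

```java
class MinMaxSketch {
    // Parses "{n}", "{n,}", "{,m}" or "{n,m}" and returns {min, max}.
    // max is -1 when no upper bound is given.
    // Assumption: "{,m}" means a minimum of 0.
    static int[] parse(String q) {
        if (q.length() < 3 || q.charAt(0) != '{' || q.charAt(q.length() - 1) != '}') {
            throw new IllegalArgumentException("Not a minmax quantifier: " + q);
        }
        String body = q.substring(1, q.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            int n = parseDigits(body);
            return new int[] {n, n};
        }
        String lo = body.substring(0, comma);
        String hi = body.substring(comma + 1);
        if (lo.isEmpty() && hi.isEmpty()) {
            // "{,}" is not derivable from the grammar.
            throw new IllegalArgumentException("Missing bounds: " + q);
        }
        int min = lo.isEmpty() ? 0 : parseDigits(lo);
        int max = hi.isEmpty() ? -1 : parseDigits(hi);
        return new int[] {min, max};
    }

    // Accepts only [0-9]+, as in the grammar.
    static int parseDigits(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("Expected digits");
        }
        for (char c : s.toCharArray()) {
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit: " + c);
            }
        }
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        int[] r = parse("{2,5}");
        System.out.println(r[0] + ".." + r[1]); // 2..5
    }
}
```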
 

TODO


Author
TAMURA Kent <kent@trl.ibm.co.jp>

Nested and Inner Type Summary

Modifier and Type | Class and Description
pack-priv static class
pack-priv static class
pack-priv static class
pack-priv static class
pack-priv abstract static class
pack-priv static class

Field Summary

Modifier and Type | Field and Description
pack-priv static final int
pack-priv transient RegularExpression.Context
pack-priv static final boolean
pack-priv static final int
pack-priv transient RangeToken
pack-priv transient String
pack-priv transient boolean
pack-priv transient int
pack-priv transient BMPattern
pack-priv boolean
pack-priv static final int
pack-priv static final int
pack-priv static final int
pack-priv transient int
pack-priv static final int
pack-priv int
nofparen

The number of parentheses in the regular expression.

pack-priv transient int
pack-priv transient Op
pack-priv int
pack-priv static final int
pack-priv static final int
pack-priv static final int
pack-priv String
regex

A regular expression.

private static final long
pack-priv static final int
pack-priv static final int
pack-priv Token
tokentree

Internal representation of the regular expression.

pack-priv static final int
pack-priv static final int
USE_UNICODE_CATEGORY

This option redefines \d \D \w \W \s \S.

private static final int
private static final int
private static final int
pack-priv static final int

Constructor Summary

Access | Constructor and Description
public
RegularExpression(String
A regular expression
regex
)

Creates a new RegularExpression instance.

public
RegularExpression(String
A regular expression
regex
,
String
A String consisting of "i" "m" "s" "u" "w" "," "X"
options
)

Creates a new RegularExpression instance with options.

public
RegularExpression(String
A regular expression
regex
,
String
A String consisting of "i" "m" "s" "u" "w" "," "X"
options
,
Locale locale)

Creates a new RegularExpression instance with options.

pack-priv
RegularExpression(String regex, Token tok, int parens, boolean hasBackReferences, int options)

Method Summary

Modifier and Type | Method and Description
private synchronized void
compile(Token tok)

Compiles a token tree into an operation flow.

private Op
compile(Token tok, Op next, boolean reverse)

Converts a token to an operation.

public boolean
equals(Object
the reference object with which to compare.
obj
)

Overrides java.lang.Object.equals.

Returns true if the patterns are the same and the options are equivalent.
pack-priv boolean
equals(String pattern, int options)
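
The equality rule described above (same pattern, equivalent options) pairs naturally with a matching hashCode, so that equal objects can serve as keys in hash-based collections. The sketch below shows that contract on a small stand-in value class. PatternKey is hypothetical and is not part of this package; it models options as a plain int, which is an assumption.

```java
import java.util.Objects;

final class PatternKey {
    private final String regex;
    private final int options;

    PatternKey(String regex, int options) {
        this.regex = Objects.requireNonNull(regex);
        this.options = options;
    }

    // Equal when both the pattern text and the option bits agree.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PatternKey)) {
            return false;
        }
        PatternKey other = (PatternKey) obj;
        return regex.equals(other.regex) && options == other.options;
    }

    // Built from the same fields as equals, so equal keys hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(regex, options);
    }

    public static void main(String[] args) {
        PatternKey a = new PatternKey("a+b", 2);
        PatternKey b = new PatternKey("a+b", 2);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```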

public int
getNumberOfGroups()

Returns the number of regular expression groups.

public String
getOptions()

Returns an option string.

public String
private static final int
getPreviousWordType(RegularExpression.ExpressionTarget target, int begin, int end, int offset, int opts)

private static final int
getWordType(RegularExpression.ExpressionTarget target, int begin, int end, int offset, int opts)

private static final int
getWordType0(char ch, int opts)

public int
hashCode()

Overrides java.lang.Object.hashCode.

Returns a hash code value for this object.
private static final boolean
isEOLChar(int ch)

private static final boolean
isSet(int options, int flag)

private static final boolean
isWordChar(int ch)

private int

Returns:

-1 if there is no match; otherwise the offset of the end of the matched string.
match
(RegularExpression.Context con, Op op, int offset, int dx, int opts)
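
The return convention described for this method (-1 for no match, otherwise the end offset of the match) is typical of backtracking matchers. The sketch below is not the engine's code and does not use Context or Op. It is a deliberately tiny matcher, with hypothetical names, that supports only literal characters, '.', and a postfix '*', and is meant only to show how such a return value is produced and propagated.

```java
class BacktrackSketch {
    // Tries to match pattern[p..] against text starting at offset.
    // Returns the end offset of the match, or -1 when there is no match.
    static int match(String pattern, int p, String text, int offset) {
        if (p == pattern.length()) {
            return offset; // pattern exhausted: report where the match ends
        }
        boolean star = p + 1 < pattern.length() && pattern.charAt(p + 1) == '*';
        if (star) {
            // Greedy: consume as many characters as possible...
            int end = offset;
            while (end < text.length() && matchesChar(pattern.charAt(p), text.charAt(end))) {
                end++;
            }
            // ...then give them back one at a time until the rest matches.
            for (int i = end; i >= offset; i--) {
                int result = match(pattern, p + 2, text, i);
                if (result >= 0) {
                    return result;
                }
            }
            return -1;
        }
        if (offset < text.length() && matchesChar(pattern.charAt(p), text.charAt(offset))) {
            return match(pattern, p + 1, text, offset + 1);
        }
        return -1;
    }

    // '.' matches any single character; anything else matches itself.
    static boolean matchesChar(char patternChar, char textChar) {
        return patternChar == '.' || patternChar == textChar;
    }

    public static void main(String[] args) {
        System.out.println(match("a*b", 0, "aaab", 0)); // 4
        System.out.println(match("a*b", 0, "aaac", 0)); // -1
        System.out.println(match("a*a", 0, "aaa", 0));  // 3
    }
}
```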

pack-priv boolean
private boolean
matchChar(int ch, int other, boolean ignoreCase)

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(char[] target)

Checks whether the target text contains this pattern or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(char[] target, int
Start offset of the range.
start
,
int
End offset +1 of the range.
end
)

Checks whether the target text contains this pattern in the specified range or not.
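
The range parameters are half-open: start is the first offset examined and end is one past the last. The same convention is used by Matcher.region in java.util.regex, which the sketch below uses as an analog. RangeSketch and containsInRange are hypothetical names, and this is not the RegularExpression implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeSketch {
    // Reports whether text[start, end) contains a match for regex.
    // end is exclusive, i.e. "end offset + 1" of the last character examined.
    static boolean containsInRange(String regex, String text, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(start, end); // throws IndexOutOfBoundsException for a bad range
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(containsInRange("c", "abcdef", 0, 2)); // false
        System.out.println(containsInRange("c", "abcdef", 0, 3)); // true
    }
}
```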

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(char[] target, Match
A Match instance for storing matching result.
match
)

Checks whether the target text contains this pattern or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(char[] target, int
Start offset of the range.
start
,
int
End offset +1 of the range.
end
,
Match
A Match instance for storing matching result.
match
)

Checks whether the target text contains this pattern in the specified range or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(String target)

Checks whether the target text contains this pattern or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(String target, int
Start offset of the range.
start
,
int
End offset +1 of the range.
end
)

Checks whether the target text contains this pattern in the specified range or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(String target, Match
A Match instance for storing matching result.
match
)

Checks whether the target text contains this pattern or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(String target, int
Start offset of the range.
start
,
int
End offset +1 of the range.
end
,
Match
A Match instance for storing matching result.
match
)

Checks whether the target text contains this pattern in the specified range or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(CharacterIterator target)

Checks whether the target text contains this pattern or not.

public boolean

Returns:

true if the target is matched to this regular expression.
matches
(CharacterIterator target, Match
A Match instance for storing matching result.
match
)

Checks whether the target text contains this pattern or not.
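
For readers unfamiliar with java.text.CharacterIterator, the sketch below shows the usual way to walk one from its begin index to its end, and then matches the collected text with java.util.regex as an analog. It is not how this class consumes the iterator. IteratorSketch, drain and contains are hypothetical names.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.regex.Pattern;

class IteratorSketch {
    // Collects the characters between the iterator's begin and end indexes.
    // Note: a literal U+FFFF in the text is indistinguishable from DONE here.
    static String drain(CharacterIterator it) {
        StringBuilder sb = new StringBuilder();
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Substring matching over the iterator's range.
    static boolean contains(String regex, CharacterIterator target) {
        return Pattern.compile(regex).matcher(drain(target)).find();
    }

    public static void main(String[] args) {
        // Iterate only over "world": begin = 6, end = 11, initial position = 6.
        CharacterIterator it = new StringCharacterIterator("hello world", 6, 11, 6);
        System.out.println(drain(it));             // world
        System.out.println(contains("hello", it)); // false
    }
}
```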

private static final boolean
matchIgnoreCase(int chardata, int ch)

pack-priv void
prepare()

Prepares for matching.

public void
setPattern(String newPattern)

public void
setPattern(String newPattern, Locale locale)

private void
setPattern(String newPattern, int options, Locale locale)

public void
setPattern(String newPattern, String options)

public void
setPattern(String newPattern, String options, Locale locale)

public String
toString()

Overrides java.lang.Object.toString.

Returns a String representation of this instance.
Inherited from java.lang.Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait