Regular expressions make many string operations and validations much easier than older, hand-written approaches. Getting comfortable with regular expressions is mostly a matter of practice :) so apply them whenever you get the chance. Below is a list of regular expression constructs taken from the
tutorial on Oracle. There are also some examples with really good explanations at
mkyong.
Just for quality coding...
Character Classes
Construct | Description
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z, or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)
Negation
To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.
Ranges
To specify a range, insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h].
Unions
You can also use unions to create a single character class made up of two or more separate character classes. To create a union, nest one class inside the other, such as [0-4[6-8]]. This particular union creates a single character class that matches the numbers 0, 1, 2, 3, 4, 6, 7, and 8.
Intersections
To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.
Subtraction
Finally, you can use subtraction to exclude one or more nested character classes, such as [0-9&&[^345]]. This example creates a single character class that matches every digit from 0 to 9 except 3, 4, and 5.
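The character class rules above can be tried out with a few one-character inputs. The following is a minimal sketch, not part of the original tutorial: the class name CharacterClassDemo and the helper matches are made up for illustration, and Pattern.matches requires the whole input to match the pattern.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {

    // Hypothetical helper: true when the whole input is matched by the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[^abc]", "d"));        // negation: true
        System.out.println(matches("[^abc]", "a"));        // false
        System.out.println(matches("[1-5]", "3"));         // range: true
        System.out.println(matches("[0-4[6-8]]", "7"));    // union: true
        System.out.println(matches("[0-4[6-8]]", "5"));    // false
        System.out.println(matches("[0-9&&[345]]", "4"));  // intersection: true
        System.out.println(matches("[0-9&&[345]]", "6"));  // false
        System.out.println(matches("[0-9&&[^345]]", "6")); // subtraction: true
        System.out.println(matches("[0-9&&[^345]]", "4")); // false
    }
}
```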
Predefined Character Classes
Construct | Description
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A non-digit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A non-word character: [^\w]
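One practical detail when using these predefined classes from Java source code is that the backslash has to be doubled inside a string literal, so \d is written "\\d". The sketch below shows a few typical uses; the class name PredefinedClassDemo and its helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {

    // Collapses every run of whitespace (\s+) into a single space.
    static String normalizeSpaces(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Removes every non-digit (\D), keeping only the digits.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // True when the input consists of one or more word characters (\w+).
    static boolean isWord(String input) {
        return Pattern.matches("\\w+", input);
    }

    public static void main(String[] args) {
        System.out.println(normalizeSpaces("a \t b\n\nc")); // a b c
        System.out.println(digitsOnly("+90 (212) 555-01"));  // 9021255501
        System.out.println(isWord("user_42"));               // true
        System.out.println(isWord("user-42"));               // false
    }
}
```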
Quantifiers
Quantifiers allow you to specify the number of occurrences to match against.
Greedy | Reluctant | Possessive | Meaning
X? | X?? | X?+ | X, once or not at all
X* | X*? | X*+ | X, zero or more times
X+ | X+? | X++ | X, one or more times
X{n} | X{n}? | X{n}+ | X, exactly n times
X{n,} | X{n,}? | X{n,}+ | X, at least n times
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times
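The three columns differ in how much input they consume and whether they give any of it back. A greedy quantifier takes as much as possible and then backtracks, a reluctant one takes as little as possible, and a possessive one takes as much as possible and never backtracks. The following sketch runs all three against the same input; the class name QuantifierDemo, the helper findAll, and the sample input are illustrative choices, not something the tables prescribe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Hypothetical helper: collects every substring the pattern finds, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* swallows everything, then backs off just enough to match the last "foo".
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? grows one character at a time and stops at the first "foo".
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ swallows everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```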
Boundary Matchers
Boundary Construct | Description
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A non-word boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input
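A few of these boundary matchers behave in ways that are easy to miss. In java.util.regex, ^ and $ anchor to the whole input by default and only match at every line when the Pattern.MULTILINE flag is set. The sketch below illustrates that, along with \b, \Z versus \z, and \G; the class name BoundaryDemo, the helper findAll, and the sample strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {

    // Hypothetical helper: collects every match of a compiled pattern, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \b: the "cat" inside "concat" is skipped, no word boundary precedes it.
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), "cat concat cat")); // [cat, cat]

        // ^ and $ match at every line only when MULTILINE is set.
        String lines = "12\nab\n34";
        System.out.println(findAll(Pattern.compile("^\\d+$"), lines));                    // []
        System.out.println(findAll(Pattern.compile("^\\d+$", Pattern.MULTILINE), lines)); // [12, 34]

        // \Z tolerates one final line terminator, \z does not.
        System.out.println(Pattern.compile("end\\Z").matcher("end\n").find()); // true
        System.out.println(Pattern.compile("end\\z").matcher("end\n").find()); // false

        // \G: each match must start exactly where the previous one ended.
        System.out.println(findAll(Pattern.compile("\\Gab"), "ababxab")); // [ab, ab]
    }
}
```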
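Usage of Pattern and Matcher:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// regex and searchString are Strings supplied by the caller
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(searchString);
// find() moves to the next match each time it returns true
while (matcher.find()) {
    // end() is the index just past the last matched character
    System.out.printf("I found the text \"%s\" starting at index %d and ending at index %d.%n",
            matcher.group(), matcher.start(), matcher.end());
}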