
Friday, December 28, 2012

Regular Expressions Reference Guide

Regular expressions make many string operations and validations much easier than older, hand-written approaches. Getting comfortable with regular expressions is mostly a matter of practice :) so apply them whenever you get the chance. Below is a list of regular expression constructs taken from the tutorial on Oracle. There are also some examples with really good explanations at mkyong.

 Just for quality coding...

 Character Classes

Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)

Negation

To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.

Ranges

To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h].
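Negation and ranges can be tried out with a few one-character inputs. The following is only a minimal sketch: the class name NegationAndRangeDemo and the helper fullyMatches are made up for illustration, while Pattern.matches is the standard java.util.regex call that tests a whole input against a pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original tutorial.
final class NegationAndRangeDemo {

    // Returns true only when the entire input matches the regex.
    static boolean fullyMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Negation: anything except a, b, or c
        System.out.println(fullyMatches("[^abc]", "d")); // true
        System.out.println(fullyMatches("[^abc]", "a")); // false

        // Ranges: 1 through 5, and a through h
        System.out.println(fullyMatches("[1-5]", "3"));  // true
        System.out.println(fullyMatches("[1-5]", "6"));  // false
        System.out.println(fullyMatches("[a-h]", "h"));  // true
    }
}
```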

Unions

You can also use unions to create a single character class comprised of two or more separate character classes. To create a union, simply nest one class inside the other, such as [0-4[6-8]]. This particular union creates a single character class that matches the numbers 0, 1, 2, 3, 4, 6, 7, and 8.

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Subtraction

Finally, you can use subtraction to negate one or more nested character classes, such as [0-9&&[^345]]. This example creates a single character class that matches everything from 0 to 9, except the numbers 3, 4, and 5.
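To see the union, intersection, and subtraction examples side by side, one option is to run each class over the digits 0 to 9 and collect the characters it accepts. This is just a sketch: ClassSetOperationsDemo and matchingChars are illustrative names, not anything from the Oracle tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the three set operations described above.
final class ClassSetOperationsDemo {

    // Collects, in order, every character of input that the character class matches.
    static String matchingChars(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        System.out.println(matchingChars("[0-4[6-8]]", digits));    // union: 01234678
        System.out.println(matchingChars("[0-9&&[345]]", digits));  // intersection: 345
        System.out.println(matchingChars("[0-9&&[^345]]", digits)); // subtraction: 0126789
    }
}
```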

Predefined Character Classes

Construct    Description
.            Any character (may or may not match line terminators)
\d           A digit: [0-9]
\D           A non-digit: [^0-9]
\s           A whitespace character: [ \t\n\x0B\f\r]
\S           A non-whitespace character: [^\s]
\w           A word character: [a-zA-Z_0-9]
\W           A non-word character: [^\w]
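The predefined classes show up most often inside ordinary String methods. Below is a small sketch with made-up helper names (PredefinedClassesDemo, splitOnWhitespace, digitsOnly, isWord); note that inside a Java string literal each backslash has to be doubled, so \d is written "\\d".

```java
import java.util.Arrays;

// Hypothetical helpers built on \s, \D and \w.
final class PredefinedClassesDemo {

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] splitOnWhitespace(String s) {
        return s.trim().split("\\s+");
    }

    // Removes every non-digit (\D), leaving only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True when the whole string consists of one or more word characters (\w).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnWhitespace("order  42\tshipped"))); // [order, 42, shipped]
        System.out.println(digitsOnly("tel: 555-0199")); // 5550199
        System.out.println(isWord("user_42"));           // true
        System.out.println(isWord("user-42"));           // false
    }
}
```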

 

Quantifiers

Quantifiers allow you to specify the number of occurrences to match against.

Greedy    Reluctant    Possessive    Meaning
X?        X??          X?+           X, once or not at all
X*        X*?          X*+           X, zero or more times
X+        X+?          X++           X, one or more times
X{n}      X{n}?        X{n}+         X, exactly n times
X{n,}     X{n,}?       X{n,}+        X, at least n times
X{n,m}    X{n,m}?      X{n,m}+       X, at least n but not more than m times
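The difference between the three columns is easiest to see on one input. The sketch below applies .*foo in its greedy, reluctant, and possessive forms to the string "xfooxxxxxxfoo"; QuantifierDemo and firstMatch are illustrative names only. Greedy takes as much as it can and backs off, reluctant takes as little as it can, and possessive takes everything and never gives any back, so it finds nothing here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo comparing greedy, reluctant and possessive quantifiers.
final class QuantifierDemo {

    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));   // greedy: xfooxxxxxxfoo
        System.out.println(firstMatch(".*?foo", input));  // reluctant: xfoo
        System.out.println(firstMatch(".*+foo", input));  // possessive: null
        System.out.println(firstMatch("a{2,3}", "aaaa")); // counted, greedy: aaa
    }
}
```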

Boundary Matchers

Boundary Construct    Description
^                     The beginning of a line
$                     The end of a line
\b                    A word boundary
\B                    A non-word boundary
\A                    The beginning of the input
\G                    The end of the previous match
\Z                    The end of the input but for the final terminator, if any
\z                    The end of the input
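A few of these boundary matchers are sketched below. BoundaryMatchersDemo and findAll are made-up names; Pattern.MULTILINE is the standard flag. One detail to keep in mind: in Java, ^ and $ match only at the start and end of the whole input by default, and match at every line only when MULTILINE is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo for \b, ^ (with and without MULTILINE), \Z and \z.
final class BoundaryMatchersDemo {

    // Returns the text of every match of regex in input, compiled with the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // \b: only the standalone word "cat" matches
        System.out.println(findAll("\\bcat\\b", 0, "cat concat category")); // [cat]

        String twoLines = "one two\nthree four";
        // Default mode: ^ matches only at the start of the input
        System.out.println(findAll("^\\w+", 0, twoLines));                  // [one]
        // MULTILINE: ^ also matches after each line terminator
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, twoLines));  // [one, three]

        // \Z tolerates a final line terminator, \z does not
        System.out.println(findAll("end\\Z", 0, "the end\n"));              // [end]
        System.out.println(findAll("end\\z", 0, "the end\n"));              // []
    }
}
```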

Usage of Pattern and Matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// regex and searchString are Strings supplied by the caller
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(searchString);

// Each call to find() advances to the next match, if there is one
while (matcher.find()) {
    System.out.printf("I found the text \"%s\" starting at "
            + "index %d and ending at index %d.%n",
            matcher.group(), matcher.start(), matcher.end());
}
