Ferhat KUL: Regular Expressions Reference Guide

Regular expressions make many string operations and validations much easier than hand-written string handling. Getting comfortable with them is mostly a matter of practice :) so use them whenever you get the chance. Below is a list of regular expression constructs, taken from the Oracle tutorial. There are also some examples with really good explanations at mkyong.

Just for quality coding...

Character Classes

Construct	Description
`[abc]`	a, b, or c (simple class)
`[^abc]`	Any character except a, b, or c (negation)
`[a-zA-Z]`	a through z, or A through Z, inclusive (range)
`[a-d[m-p]]`	a through d, or m through p: [a-dm-p] (union)
`[a-z&&[def]]`	d, e, or f (intersection)
`[a-z&&[^bc]]`	a through z, except for b and c: [ad-z] (subtraction)
`[a-z&&[^m-p]]`	a through z, and not m through p: [a-lq-z] (subtraction)

Negation

To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.

Ranges

To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h].
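
Here is a small sketch of negation and ranges in Java. The class name `NegationAndRangeDemo` and the helper `matches` are made up for illustration; only `Pattern.matches` comes from `java.util.regex`.

```java
import java.util.regex.Pattern;

final class NegationAndRangeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Negation: any single character except a, b, or c.
        System.out.println(matches("[^abc]", "d")); // true
        System.out.println(matches("[^abc]", "a")); // false

        // Range: any single digit from 1 to 5.
        System.out.println(matches("[1-5]", "3"));  // true
        System.out.println(matches("[1-5]", "6"));  // false
    }
}
```

Note that `Pattern.matches` tests the entire input, so each call above checks exactly one character.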

Unions

You can also use unions to create a single character class comprised of two or more separate character classes. To create a union, simply nest one class inside the other, such as [0-4[6-8]]. This particular union creates a single character class that matches the numbers 0, 1, 2, 3, 4, 6, 7, and 8.

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Subtraction

Finally, you can use subtraction to negate one or more nested character classes, such as [0-9&&[^345]]. This example creates a single character class that matches everything from 0 to 9, except the numbers 3, 4, and 5.
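
The union, intersection, and subtraction examples above can be tried with a sketch like the following. The helper `keep` is hypothetical: it filters the digits 0 to 9 through each character class so you can see which ones survive.

```java
import java.util.regex.Pattern;

final class CharClassOpsDemo {
    // Hypothetical helper: keeps only the characters of input that match the class.
    static String keep(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        System.out.println(keep("[0-4[6-8]]", digits));    // union: 01234678
        System.out.println(keep("[0-9&&[345]]", digits));  // intersection: 345
        System.out.println(keep("[0-9&&[^345]]", digits)); // subtraction: 0126789
    }
}
```

Nested classes and `&&` are Java syntax; many other regex flavors do not support them, so treat these patterns as Java-specific.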

Predefined Character Classes

Construct	Description
`.`	Any character (matches line terminators only when the `DOTALL` flag is set)
`\d`	A digit: `[0-9]`
`\D`	A non-digit: `[^0-9]`
`\s`	A whitespace character: `[ \t\n\x0B\f\r]`
`\S`	A non-whitespace character: `[^\s]`
`\w`	A word character: `[a-zA-Z_0-9]`
`\W`	A non-word character: `[^\w]`
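
A minimal sketch of the predefined classes in use. Inside a Java string literal each backslash has to be doubled, so `\d` is written as `"\\d"`. The class and method names below are invented for this example; `String.matches`, `String.split`, and `String.replaceAll` are standard methods that accept regular expressions.

```java
final class PredefinedClassDemo {
    // True when the string consists of one or more digits only.
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // Splits on runs of whitespace (spaces, tabs, line breaks).
    static String[] splitOnWhitespace(String s) {
        return s.trim().split("\\s+");
    }

    // Removes every character that is not a letter, digit, or underscore.
    static String stripNonWordChars(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2012"));                     // true
        System.out.println(isAllDigits("20a12"));                    // false
        System.out.println(splitOnWhitespace("a \t b\nc").length);   // 3
        System.out.println(stripNonWordChars("snake_case-42!"));     // snake_case42
    }
}
```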

Quantifiers

Quantifiers allow you to specify the number of occurrences to match against.

Greedy	Reluctant	Possessive	Meaning
`X?`	`X??`	`X?+`	`X`, once or not at all
`X*`	`X*?`	`X*+`	`X`, zero or more times
`X+`	`X+?`	`X++`	`X`, one or more times
`X{n}`	`X{n}?`	`X{n}+`	`X`, exactly `n` times
`X{n,}`	`X{n,}?`	`X{n,}+`	`X`, at least `n` times
`X{n,m}`	`X{n,m}?`	`X{n,m}+`	`X`, at least `n` but not more than `m` times
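
The difference between the three columns is easiest to see on one input. The sketch below runs greedy, reluctant, and possessive versions of the same pattern against the string `xfooxxxxxxfoo`; the helper `firstMatch` is hypothetical and simply returns the first match or `null`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* grabs everything, then backs off until "foo" fits.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? grabs as little as possible before "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ grabs everything and never gives any of it back.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```

The possessive version finds nothing because `.*+` consumes the whole input and does not backtrack, so no characters are left for `foo`.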

Boundary Matchers

Boundary Construct	Description
`^`	The beginning of a line
`$`	The end of a line
`\b`	A word boundary
`\B`	A non-word boundary
`\A`	The beginning of the input
`\G`	The end of the previous match
`\Z`	The end of the input but for the final terminator, if any
`\z`	The end of the input
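
A short sketch of a few boundary matchers. One thing worth knowing: in Java, `^` and `$` match at the start and end of the whole input by default, and only match at every line when the `Pattern.MULTILINE` flag is set. The helper `count` is hypothetical; it counts how many times a pattern is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundaryDemo {
    // Hypothetical helper: counts the matches of regex (compiled with flags) in input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \b: "cat" as a whole word only, so the "cat" inside "concat" is skipped.
        System.out.println(count("\\bcat\\b", 0, "cat concat cat"));          // 2

        // ^ without and with MULTILINE.
        System.out.println(count("^\\w+", 0, "one\ntwo"));                    // 1
        System.out.println(count("^\\w+", Pattern.MULTILINE, "one\ntwo"));    // 2

        // \Z tolerates a final line terminator, \z does not.
        System.out.println(count("end\\Z", 0, "end\n"));                      // 1
        System.out.println(count("end\\z", 0, "end\n"));                      // 0
    }
}
```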

Usage of Pattern and Matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// regex and searchString are assumed to be String variables defined elsewhere.
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(searchString);

// find() moves to the next match; group(), start(), and end() describe it.
while (matcher.find()) {
    System.out.printf("I found the text \"%s\" starting at "
            + "index %d and ending at index %d.%n",
            matcher.group(), matcher.start(), matcher.end());
}

Ferhat KUL

Friday, December 28, 2012