Tornado Class Library

Special Topic - Regular Expression

A regular expression is a sequence of characters that denotes a set of strings, written L(R). When used to constrain a lexical space, a regular expression asserts that only strings in that set are valid literals for values of the type.

A regular expression is composed of zero or more branches that are separated by the "or" (|) character.
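As a rough illustration of branches and of constraining a value to the set a pattern denotes, here is a minimal sketch. It uses Python's standard `re` module as a stand-in, not this library's Regex class, and the pattern and the `is_valid_literal` helper are hypothetical names chosen for the example.

```python
import re

# Hypothetical pattern with three branches separated by "|".
COLOR = re.compile(r"red|green|blue")

def is_valid_literal(text: str) -> bool:
    """Return True only when the whole string belongs to the set the pattern denotes."""
    return COLOR.fullmatch(text) is not None

print(is_valid_literal("green"))     # True
print(is_valid_literal("greenish"))  # False: extra characters fall outside the set
```

The whole-string match is what makes the pattern act as a constraint; a plain search would also accept any string that merely contains one of the branches.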

Pattern Explanation
[a-c] This matches any character in the range 'a' to 'c'
[a-cde] This matches any character in the range 'a' to 'c', or 'd', or 'e' (the same set as [a-e])
[\-x] This matches the '-' character or the 'x' character
[^ab] This matches any character except 'a' or 'b'
. This matches any character except \n. To match any character at all, you might try [\d\D].
{n1,n2} This matches between n1 and n2 instances of the previous Pattern, where n1 and n2 are integers.
{n1,} This matches at least n1 instances of the previous Pattern
* The same as {0,}
? The same as {0,1}
+ The same as {1,}
*? By default, {n1,n2}, {n1,}, *, ?, and + match the largest number of occurrences of the preceding Pattern. If any of these is followed by a ?, it instead attempts to match the fewest occurrences of the preceding Pattern.
(a|b) This matches the character a or b, and returns the matched character as a backreference. In other words, if you call Regex's search method with "a" you will find that an "a" is returned by stringMatched(1).
(a) This matches the character "a" and returns it as a backreference.
\b This matches a word boundary, either a beginning \w character, an ending \w character, or one of these two sequences: \w\W, \W\w.
(?: ... ) Like the parentheses above, but does not create a backreference.
(?= ... ) A zero-length lookahead. Thus the pattern "foo(?=bar)" will match "foo", but only if followed by "bar".
(?! ... ) A negative zero-length lookahead. It matches only if the pattern inside the parentheses is not matched. Thus "foo(?!bar)" matches "foo" only if it is not followed by "bar".
(?# ... ) A comment
\B A non-word Boundary. Essentially the same as (?!\b).
\d Essentially the same as [0-9].
\D Not a digit. Essentially the same as [^0-9].
\w A word character, essentially the same as [a-zA-Z_0-9]
\W Not a word character.
\s A white-space character, [ \t\b\n\r].
\S A non white-space character.
\1 Match the contents of the first backreference. Thus "([a-d]).*\1" matches the first 5 characters of "axyzabc".
(?i) Tell this pattern to ignore case during a match.
$ Matches the end of a String (the \n character is considered the end of the string by this pattern element).
\Z Matches the end of a String. The \n character does not count as the end.
^ Matches the beginning of a String. This is either the absolute beginning, or right after a \n character.
\A Matches the absolute beginning of a String.
\G Matches the place we left off in our last search of this String with this pattern or, failing that, the beginning of the String.
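Several of the entries above can be tried out directly. The sketch below again uses Python's `re` module as a stand-in, so it is an approximation rather than a demonstration of this library's Regex class: `group(1)` plays the role of stringMatched(1), \G is not available in Python, and other engines may treat anchors such as \Z differently.

```python
import re

# Greedy vs. lazy quantifiers: "+" takes as much as it can, "+?" as little.
greedy = re.search(r"<.+>", "<a><b>").group(0)    # "<a><b>"
lazy = re.search(r"<.+?>", "<a><b>").group(0)     # "<a>"

# A capturing group creates a backreference; (?: ... ) groups without one.
captured = re.search(r"(a|b)", "a").group(1)      # "a"
group_count = re.compile(r"(?:a|b)").groups       # 0

# Lookahead is zero-length: "bar" is checked but not consumed.
positive = re.search(r"foo(?=bar)", "foobar").group(0)   # "foo"
negative = re.search(r"foo(?!bar)", "foobar")            # None

# \1 must repeat whatever the first group matched.
backref = re.search(r"([a-d]).*\1", "axyzabc").group(0)  # "axyza"

# (?i) switches on case-insensitive matching.
ignore_case = re.search(r"(?i)tornado", "TORNADO") is not None  # True

# "$" accepts a trailing \n as the end of the string; \Z does not.
dollar_end = re.search(r"foo$", "foo\n") is not None     # True
strict_end = re.search(r"foo\Z", "foo\n") is not None    # False
```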


MagicCell Pre-Programmed Masks (Regular Expression)

Standard Mask Regular Expression
EMAIL "\\A\\w\.+\\@(\\w*+\\.?)+\\Z"
PHONE \\A((\\+\\d+[- ])?\\(?\\d\\d\\d\\)?[- ])?\\d\\d\\d[- ]?\\d\\d\\d\\d\\Z
USPHONE \\A(\\(?\\d\\d\\d\\)?[- ])?\\d\\d\\d[- ]?\\d\\d\\d\\d\\Z
SSN \\A\\d\\d\\d-\\d\\d-\\d\\d\\d\\d\\Z
ISNUMBER \\A(-\\d)?\\d*\\.?\\d*( ?[eE]{1}[\\+\\-]?\\d+)?\\Z
ISPOSITIVE \\A\\d*\\.?\\d*\\Z
NOTBLANK \\A.*\\S.*\\Z
ISUPPER \\A[^a-z]*\\Z
ISLOWER \\A[^A-Z]*\\Z
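To see how a few of these masks behave, the sketch below applies them with Python's `re` module. This is an assumption-laden illustration: it treats the doubled backslashes above as string-literal escaping and writes each one as a single backslash in a Python raw string, and the `MASKS` dictionary and `matches_mask` helper are hypothetical names, not part of MagicCell.

```python
import re

# Masks copied from the table, with "\\" written as "\" inside raw strings.
MASKS = {
    "USPHONE": r"\A(\(?\d\d\d\)?[- ])?\d\d\d[- ]?\d\d\d\d\Z",
    "SSN": r"\A\d\d\d-\d\d-\d\d\d\d\Z",
    "NOTBLANK": r"\A.*\S.*\Z",
    "ISUPPER": r"\A[^a-z]*\Z",
}

def matches_mask(mask_name: str, value: str) -> bool:
    """Return True when value satisfies the named mask (hypothetical helper)."""
    # \A and \Z anchor the pattern, so search() must cover the whole string.
    return re.search(MASKS[mask_name], value) is not None

print(matches_mask("SSN", "123-45-6789"))         # True
print(matches_mask("SSN", "123456789"))           # False: dashes are required
print(matches_mask("USPHONE", "(555) 123-4567"))  # True
print(matches_mask("USPHONE", "555-1234"))        # True: area code is optional
print(matches_mask("NOTBLANK", "   "))            # False: no non-white-space character
print(matches_mask("ISUPPER", "ABC 123"))         # True: no lowercase letters present
```

Note that ISUPPER only rejects lowercase letters, so digits, spaces, and the empty string all pass.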


Copyright 1998-2006 ASP-db