2010-04-12: Moved 'Regular Expressions' (regex) to its own page.
I have listed only a few things here, as this is not meant to be comprehensive. There are LOTS of online references; the simple 'cheat sheet' below, from http://regexlib.com/, is one example.
char(s) | description | example | finds |
---|---|---|---|
^ | start of target; in multi-line mode, also after each new-line | ^abc | abc, abcdefg, abc123, but not babc |
$ | end of target; in multi-line mode, also before each new-line | abc$ | abc, endabc, 123abc, but not abcd |
. | any character except new-line, unless in single-line (dotall) mode | a.c | abc, aac, acc, etc. |
\| | alternatives | bill\|ted | bill or ted |
{...} | explicit quantifier (count) notation | ab{2}c | abbc |
[...] | explicit class, set of characters to match | a[bB]c | abc and aBc |
(...) | logical grouping of part of an expression | (abc){2} | abcabc |
* | 0 or more of previous expression | ab*c | ac, abc, abbc, abbbc |
+ | 1 or more of previous expression | ab+c | abc, abbc, abbbc |
? | 0 or 1 of previous expression; placed after another quantifier, it makes that quantifier minimal (lazy) | ab?c | ac, abc |
\ | preceding one of the above makes it literal | a\*b | a*b |
Thus 'ordinary characters' are anything other than ^ $ . | { } [ ] ( ) * + ? \
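The rows of the table above can be checked mechanically. The sketch below uses Python's re module purely as a convenient, testable engine (an assumption: for these basic metacharacters it agrees with Perl and .NET, although option names differ). The hypothetical CASES list pairs each example pattern with strings it should and should not find; the last few lines show the single-line (DOTALL), multi-line and minimal-matching behaviour mentioned in the table.

```python
import re

# (pattern, strings it should find, strings it should not find),
# built from the examples in the table; CASES is an illustrative name.
CASES = [
    (r"^abc", ["abc", "abcdefg", "abc123"], ["babc"]),
    (r"abc$", ["abc", "endabc", "123abc"], ["abcd"]),
    (r"a.c", ["abc", "aac", "acc"], ["a\nc"]),
    (r"bill|ted", ["bill", "ted"], ["bob"]),
    (r"ab{2}c", ["abbc"], ["abc", "abbbc"]),
    (r"a[bB]c", ["abc", "aBc"], ["acc"]),
    (r"(abc){2}", ["abcabc"], ["abc"]),
    (r"ab*c", ["ac", "abc", "abbc", "abbbc"], ["adc"]),
    (r"ab+c", ["abc", "abbc", "abbbc"], ["ac"]),
    (r"ab?c", ["ac", "abc"], ["abbc"]),
    (r"a\*b", ["a*b"], ["ab", "aab"]),
]


def table_holds():
    """True if every pattern finds all its 'good' strings and none of its 'bad' ones."""
    for pattern, good, bad in CASES:
        if not all(re.search(pattern, s) for s in good):
            return False
        if any(re.search(pattern, s) for s in bad):
            return False
    return True


# '.' skips new-lines unless single-line (DOTALL) mode is switched on.
dot_plain = re.search(r"a.c", "a\nc")
dot_all = re.search(r"a.c", "a\nc", re.DOTALL)
# In multi-line mode '^' also matches after each new-line.
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
# '?' after a quantifier switches it from greedy to minimal (lazy) matching.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

if __name__ == "__main__":
    print(table_holds())           # True
    print(dot_plain, bool(dot_all))  # None True
    print(line_starts)             # ['one', 'two']
    print(greedy, lazy)            # <a><b> <a>
```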
The backslash '\' does more than turn the above metacharacters into literals: followed by one of the characters below, this 'escape' character takes on a special meaning, or stands for a character class:
char | description | char | description |
---|---|---|---|
\w | word character: alphanumeric, including _ | \W | non-word character |
\s | white space | \S | non-white space |
\d | numeric (digit) | \D | non-numeric |
\A | beginning of the string (not affected by multi-line mode) | \Z | end of the string (in Perl and .NET, also before a final new-line) |
\b | word boundary | \B | not a word boundary |
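A quick way to see these escape classes at work, again sketched in Python's re (an assumption; the sample text is made up). One difference worth knowing: Python's \Z is the absolute end of the string, like Perl's \z, whereas Perl and .NET let \Z also match before a final new-line.

```python
import re

# Made-up sample text, used only for illustration.
TEXT = "Order 66 shipped_to: R2-D2 at 09:45"

digits = re.findall(r"\d+", TEXT)    # runs of digits
words = re.findall(r"\w+", TEXT)     # runs of word characters (letters, digits, _)
chunks = re.findall(r"\S+", TEXT)    # runs of non-white space
spaces = len(re.findall(r"\s", TEXT))

# \b matches only at the edge of a word, \B only inside one.
whole_word = re.sub(r"\bcat\b", "[cat]", "cat concat location")
inside_word = re.sub(r"\Bcat\B", "[cat]", "cat concat location")

# \A ignores multi-line mode; ^ does not.
caret = re.findall(r"^abc", "abc\nabc", re.MULTILINE)
anchor_a = re.findall(r"\Aabc", "abc\nabc", re.MULTILINE)

if __name__ == "__main__":
    print(digits)           # ['66', '2', '2', '09', '45']
    print(words)            # ['Order', '66', 'shipped_to', 'R2', 'D2', 'at', '09', '45']
    print(chunks)           # ['Order', '66', 'shipped_to:', 'R2-D2', 'at', '09:45']
    print(spaces)           # 5
    print(whole_word)       # [cat] concat location
    print(inside_word)      # cat concat lo[cat]ion
    print(caret, anchor_a)  # ['abc', 'abc'] ['abc']
```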
\n, \r, \f, \t, etc. have their usual meanings, namely LF (0x0a), CR (0x0d), FF (0x0c), and TAB (0x09). Others, depending on the implementation, are \a (bell/alarm, 0x07), \b (backspace, 0x08, but only inside a character class such as [\b]; elsewhere \b is a word boundary), \v (vertical tab, 0x0b), \e (escape, 0x1b), \040 (ASCII character in octal), \x20 (ASCII character in hexadecimal, 2 digits), \cC (control-C), and \u0020 (Unicode character in hexadecimal, 4 digits). As stated, this varies with the implementation; see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterescapes.asp for more.
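Because these escapes vary by implementation, it is worth checking them in the engine you actually use. The sketch below does that for Python's re, chosen only as an example; the helper name accepts is hypothetical. It confirms that \n is LF and \r is CR, that \x20, \040 and \u0020 all denote a space, that \b means backspace inside a class, and that Python's re rejects two of the implementation-dependent escapes listed above, \e and \cC.

```python
import re

# \n is LF (0x0a) and \r is CR (0x0d).
lf_cr_ok = ord("\n") == 0x0A and ord("\r") == 0x0D

# Hex, octal and Unicode escapes for the same character: a space (0x20).
SPACE_ESCAPES = [r"\x20", r"\040", r"\u0020"]
all_space = all(re.fullmatch(p, " ") for p in SPACE_ESCAPES)

# Inside a character class \b is backspace (0x08); outside it is a word boundary.
backspace_in_class = re.fullmatch(r"[\b]", "\x08") is not None


def accepts(escape):
    """Hypothetical helper: True if this engine compiles the escape at all."""
    try:
        re.compile(escape)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(lf_cr_ok, all_space, backspace_in_class)        # True True True
    print(accepts(r"\t"), accepts(r"\a"), accepts(r"\v"))  # True True True
    print(accepts(r"\e"), accepts(r"\cC"))                 # False False
```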
class | description |
---|---|
[aeiou] | matches any single character included in the specified set of characters |
[^aeiou] | matches any single character not in the specified set of characters |
[0-9a-fA-F] | a hyphen (-) specifies a contiguous character range |
\p{name} | matches any character in the named character class specified by {name}; supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing |
\P{name} | matches any character not in the Unicode group or block range specified by {name} |
[a-zA-Z_0-9] | equivalent to \w shown above for ASCII text (in Unicode-aware engines such as .NET, \w also matches non-ASCII letters and digits) |
See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterclasses.asp for more.
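The character classes above, sketched with Python's re (again only an example engine). Two points to flag: Python's built-in re does not understand \p{name} or \P{name} at all (the third-party regex package does), and for str patterns Python's \w is Unicode-aware, so it matches more than [a-zA-Z_0-9] unless the re.ASCII flag is given. The sample words are made up.

```python
import re

vowels = re.findall(r"[aeiou]", "regular expression")
non_vowels = "".join(re.findall(r"[^aeiou]", "regular"))
is_hex = re.fullmatch(r"[0-9a-fA-F]+", "DeadBeef42") is not None
not_hex = re.fullmatch(r"[0-9a-fA-F]+", "0xG1") is not None  # 'x' and 'G' are outside the range

WORD = "fa\u00e7ade"  # 'facade' spelled with a non-ASCII c-cedilla
unicode_w = re.findall(r"\w+", WORD)              # Unicode-aware \w keeps the word whole
ascii_class = re.findall(r"[a-zA-Z_0-9]+", WORD)  # the explicit class splits at the cedilla
ascii_w = re.findall(r"\w+", WORD, re.ASCII)      # re.ASCII makes \w behave like the class

# \p{...} is not supported by the standard-library engine.
try:
    re.compile(r"\p{Ll}")
    has_unicode_props = True
except re.error:
    has_unicode_props = False

if __name__ == "__main__":
    print(vowels)                      # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
    print(non_vowels, is_hex, not_hex) # rglr True False
    print(ascii_class, ascii_w)        # ['fa', 'ade'] ['fa', 'ade']
    print(has_unicode_props)           # False
```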
Other regular expression (regex) references: some sites found using a 'regular expression' search in Yahoo!
Some examples of regex found -
from : http://www.regular-expressions.info/
email address (the [A-Z] ranges assume case-insensitive matching): \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
and : http://www.regular-expressions.info/email.html
or : ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$
or : ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$
and the massive: RFC 2822 : (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
practical implementation of RFC 2822 : [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
allow any two-letter country code top level domain : [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
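A minimal sketch of the first three email patterns in use, in Python. Since they list only [A-Z], they are compiled with re.IGNORECASE. For whole-string checks the ^ and $ are dropped in favour of re.fullmatch, which anchors both ends and, unlike Python's $, does not accept a trailing new-line. The helper names find_emails and is_email are hypothetical. Note how the {2,4} limit rejects a longer top-level domain such as .museum, which the pattern with an explicit list of domains accepts.

```python
import re

# First pattern: find address-like substrings inside running text.
FIND_EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)
# Second pattern without ^ and $; fullmatch anchors both ends instead.
SIMPLE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)
# Third pattern: two-letter country codes or an explicit list of generic domains.
LISTED = re.compile(
    r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\."
    r"(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)",
    re.IGNORECASE,
)


def find_emails(text):
    """Hypothetical helper: every address-like substring of text."""
    return FIND_EMAIL.findall(text)


def is_email(address, pattern=SIMPLE):
    """Hypothetical helper: True if the whole string matches the pattern."""
    return pattern.fullmatch(address) is not None


if __name__ == "__main__":
    print(find_emails("Contact bob.smith@example.com or ann+news@mail.example.org today."))
    # ['bob.smith@example.com', 'ann+news@mail.example.org']
    print(is_email("user@example.com"), is_email("user@example"))  # True False
    print(is_email("user@example.museum"), is_email("user@example.museum", LISTED))  # False True
```

None of these patterns is a complete validator; they trade precision for readability, as the RFC 2822 version above illustrates.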
and : http://www.regular-expressions.info/examples.html
match the opening and closing pair of a specific HTML tag (replace TAG with its name) : <TAG\b[^>]*>(.*?)</TAG>
match the opening and closing pair of any HTML tag : <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
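Both tag patterns can be tried in Python as below (a sketch; the sample HTML is made up). The \1 backreference is what forces the closing tag to repeat whatever name the first group captured, and re.IGNORECASE is used because the patterns are written with [A-Z]. The limits show up quickly: (.*?) does not cross new-lines unless re.DOTALL is set, and nested tags with the same name are paired wrongly, so this is no substitute for an HTML parser.

```python
import re

# Any tag: group 1 captures the name, \1 requires the same name in the closing tag.
ANY_PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE)
# A specific tag, with TAG replaced by B.
B_PAIR = re.compile(r"<B\b[^>]*>(.*?)</B>", re.IGNORECASE)

HTML = '<b>bold</b> and <a href="x">link</a>'

pairs = [(m.group(1), m.group(2)) for m in ANY_PAIR.finditer(HTML)]
bold_only = B_PAIR.findall(HTML)
# Nested <i> tags: the lazy (.*?) stops at the first </i>, not the matching one.
nested = ANY_PAIR.search("<i>one <i>two</i> three</i>").group(2)

if __name__ == "__main__":
    print(pairs)      # [('b', 'bold'), ('a', 'link')]
    print(bold_only)  # ['bold']
    print(nested)     # one <i>two
```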
IP Address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
To reject impossible addresses such as 999.999.999.999 : \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
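The two IP address patterns side by side, sketched in Python with made-up sample text. The strict version builds the 0-255 octet sub-pattern once and repeats it four times; its groups are made non-capturing (?:...) so that findall returns whole addresses rather than tuples of octets, which is a small change from the pattern shown above.

```python
import re

# Loose: any four dot-separated groups of 1 to 3 digits.
LOOSE_IP = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
# Strict: each octet limited to 0-255, the same sub-pattern four times.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
STRICT_IP = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")

TEXT = "hosts: 10.0.0.1, 256.1.1.1 and 192.168.1.254"

loose_found = LOOSE_IP.findall(TEXT)
strict_found = STRICT_IP.findall(TEXT)

if __name__ == "__main__":
    print(loose_found)   # ['10.0.0.1', '256.1.1.1', '192.168.1.254']
    print(strict_found)  # ['10.0.0.1', '192.168.1.254']
    print(bool(LOOSE_IP.fullmatch("999.999.999.999")),
          bool(STRICT_IP.fullmatch("999.999.999.999")))  # True False
```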
from : http://www.blazonry.com/perl/regexp_exs.php
Y2K date (4-digit year) : ($line1 =~ m/[0-9]{2}[\/-][0-9]{2}[\/-][0-9]{4}/)
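In Python the same date test might look like this (a sketch; the pattern names are hypothetical). One detail matters: inside a character class '|' is an ordinary character, not alternation, so a class written [\/|-] also accepts '|' as a separator, while [/-] accepts only '/' or '-'. The last pattern is a variation not taken from the source: it uses a backreference so both separators must be the same. None of these check that the month or day is in range.

```python
import re

WITH_PIPE = re.compile(r"[0-9]{2}[/|-][0-9]{2}[/|-][0-9]{4}")      # '|' is literal here
SLASH_OR_DASH = re.compile(r"[0-9]{2}[/-][0-9]{2}[/-][0-9]{4}")
SAME_SEPARATOR = re.compile(r"[0-9]{2}([/-])[0-9]{2}\1[0-9]{4}")   # \1 repeats the first separator


def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    print(matches(WITH_PIPE, "12|31|1999"), matches(SLASH_OR_DASH, "12|31|1999"))       # True False
    print(matches(SLASH_OR_DASH, "12/31-1999"), matches(SAME_SEPARATOR, "12/31-1999"))  # True False
    print(matches(SLASH_OR_DASH, "99/99/2010"))  # True: no range check on month or day
```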
from : http://www.wilsonmar.com/1regex.htm
Uniform Resource Identifier (URI) breakdown :
my $uri = "http://www.ics.uci.edu/pub/ietf/uri/#Related";
print "$1, $2, $3, $4, $5, $6, $7, $8, $9" if $uri =~
m{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};
capture | value | part |
---|---|---|
$1 | http: | |
$2 | http | the scheme |
$3 | //www.ics.uci.edu | |
$4 | www.ics.uci.edu | the authority |
$5 | /pub/ietf/uri/ | the path |
$6 | (empty) | |
$7 | (empty) | the query |
$8 | #Related | |
$9 | Related | the fragment |
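The same breakdown can be sketched in Python. The regular expression is the one used in the Perl snippet above, which comes from the URI specification (RFC 2396 Appendix B, repeated in RFC 3986 Appendix B); the split_uri name and the dictionary keys are illustrative. Because every part of the pattern is optional it matches any string, so the match is never None; groups that did not take part come back as None.

```python
import re

URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return the five main URI parts; parts that are absent are None."""
    m = URI_RE.match(uri)  # always succeeds: every part of the pattern is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
    # {'scheme': 'http', 'authority': 'www.ics.uci.edu', 'path': '/pub/ietf/uri/', 'query': None, 'fragment': 'Related'}
    print(split_uri("https://example.com/search?q=regex#top"))
    # {'scheme': 'https', 'authority': 'example.com', 'path': '/search', 'query': 'q=regex', 'fragment': 'top'}
```

The pattern only splits a URI into parts; it does not check that each part is well formed.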