Perl Regular Expressions

I have only listed some things here, as this is not meant to be comprehensive. There are LOTS of online references; the simple 'cheat sheet' from http://regexlib.com/ is one example.

**Meta-characters - These characters have special meanings**
char(s)	description	example	finds
^	start of string; in multi-line mode (/m), also after each new-line	^abc	abc, abcdefg, abc123, but not babc
$	end of string (or before a final new-line); in multi-line mode (/m), also before each new-line	abc$	abc, endabc, 123abc, but not abcd
.	any character except new-line, unless single-line mode (/s) is on	a.c	abc, aac, acc, etc
|	alternatives	bill|ted	bill or ted
{...}	explicit quantifier (count) notation	ab{2}c	abbc
[...]	explicit class, set of characters to match	a[bB]c	abc and aBc
(...)	logical grouping of part of an expression	(abc){2}	abcabc
*	0 or more of previous expression	ab*c	ac, abc, abbc, abbbc
+	1 or more of previous expression	ab+c	abc, abbc, abbbc
?	0 or 1 of previous expression; placed after another quantifier, makes it minimal (non-greedy)	ab?c	ac, abc
\	Preceding one of above makes it literal	a\*b	a*b

Thus 'ordinary characters' are anything other than ^ $ . | { } [ ] ( ) * + ? \
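
As a quick check of the table above, here is a minimal Perl sketch that runs several of the listed patterns against sample strings. The helper name `matches` and the choice of test rows are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text matches $pattern, else 0.
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# Each row: pattern, subject, expected result taken from the table.
# Single quotes keep the backslash in 'a\*b' intact for the regex engine.
my @cases = (
    [ '^abc',     'abcdefg', 1 ],
    [ '^abc',     'babc',    0 ],
    [ 'abc$',     '123abc',  1 ],
    [ 'abc$',     'abcd',    0 ],
    [ 'ab{2}c',   'abbc',    1 ],
    [ '(abc){2}', 'abcabc',  1 ],
    [ 'ab?c',     'ac',      1 ],
    [ 'a\*b',     'a*b',     1 ],
    [ 'a\*b',     'aab',     0 ],
);

for my $case (@cases) {
    my ($pattern, $text, $expected) = @$case;
    my $got = matches($pattern, $text);
    printf "%-10s %-8s %s\n", $pattern, $text,
        $got == $expected ? 'ok' : 'MISMATCH';
}
```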

The backslash '\' not only converts the above meta-characters to their literal meaning. Followed by one of the characters below, this 'escape' character also takes on a special meaning of its own, such as a character class or an anchor ...

**Some Character Classes**
char	description	char	description
\w	word character: alphanumeric, including _	\W	non-word character
\s	white space	\S	non-white space
\d	numeric (digit)	\D	non-numeric
\A	beginning of the string	\Z	end of string, or before a final new-line
\b	word boundaries	\B	non-boundaries
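
A small sketch of how \d, \s and \b might be combined in Perl. The helper names `digit_words` and `squeeze_space` are hypothetical and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every whole "word" made only of digits.
# \b on both sides rejects digits embedded in a longer word such as a1b2.
sub digit_words {
    my ($text) = @_;
    return $text =~ /\b\d+\b/g;
}

# Hypothetical helper: collapse each run of white space to a single space.
sub squeeze_space {
    my ($text) = @_;
    (my $copy = $text) =~ s/\s+/ /g;
    return $copy;
}

my @numbers = digit_words('room 42, floor 7, code a1b2');
print "@numbers\n";                        # prints: 42 7
print squeeze_space("a \t b\n\nc"), "\n";  # prints: a b c
```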

\n, \r, \f, \t etc. have their usual meaning, namely LF (0x0a), CR (0x0d), FF (0x0c), and TAB (0x09). Others, depending on the implementation, are \a (bell/alarm 0x07), \b (backspace 0x08, but only inside a character class [...]; elsewhere \b is a word boundary), \v (vertical tab 0x0b), \e (escape 0x1b), \040 (ASCII character as OCTAL), \x20 (ASCII character using hexadecimal notation - 2 digits), \cC (control-C), and \u0020 (Unicode character using hexadecimal notation) ... but as stated, this varies with implementation ... see - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterescapes.asp - for more ...
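
The mapping from escape to character code can be checked directly. The sketch below assumes Perl on an ASCII-based platform, where \n is LF (0x0a) and \r is CR (0x0d). The helper name `escape_matches` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: does the one character with code $code
# match the escape sequence given in $escape?
sub escape_matches {
    my ($escape, $code) = @_;
    return chr($code) =~ /\A(?:$escape)\z/ ? 1 : 0;
}

# Single quotes keep the backslash, so the regex engine sees the escape itself.
my @escapes = (
    [ '\n',   0x0a ],   # line feed
    [ '\r',   0x0d ],   # carriage return
    [ '\f',   0x0c ],   # form feed
    [ '\t',   0x09 ],   # tab
    [ '\a',   0x07 ],   # bell
    [ '\e',   0x1b ],   # escape
    [ '\040', 0x20 ],   # space, octal notation
    [ '\x20', 0x20 ],   # space, hexadecimal notation
    [ '\cC',  0x03 ],   # control-C
);

for my $pair (@escapes) {
    my ($escape, $code) = @$pair;
    printf "%-5s 0x%02x %s\n", $escape, $code,
        escape_matches($escape, $code) ? 'matches' : 'no match';
}
```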

**More on Character Classes**
class	description
[aeiou]	matches any single character included in the specified set of characters
[^aeiou]	matches any single character not in the specified set of characters
[0-9a-fA-F]	Use of a hyphen (-) allows specification of a contiguous character range
\p{name}	Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name}	Matches text not included in groups and block ranges specified in {name}.
[a-zA-Z_0-9]	is equivalent to \w shown above for ASCII text (on Unicode strings \w also matches other letters and digits)
See - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconcharacterclasses.asp for more.
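
A minimal Perl sketch of the class notations listed above. All helper names are hypothetical. The \p{Ll} check uses the Unicode "lowercase letter" category and is given the Greek small alpha by code point, so the file itself stays plain ASCII.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per notation.
sub count_vowels {            # [aeiou]
    my ($text) = @_;
    my $count = () = $text =~ /[aeiou]/g;
    return $count;
}

sub strip_vowels {            # keep only characters matching [^aeiou]
    my ($text) = @_;
    return join '', $text =~ /[^aeiou]/g;
}

sub is_hex {                  # [0-9a-fA-F], whole string
    my ($text) = @_;
    return $text =~ /\A[0-9a-fA-F]+\z/ ? 1 : 0;
}

sub is_lowercase_letter {     # \p{Ll}, exactly one character
    my ($char) = @_;
    return $char =~ /\A\p{Ll}\z/ ? 1 : 0;
}

print count_vowels('education'), "\n";           # prints: 5
print strip_vowels('education'), "\n";           # prints: dctn
print is_hex('DEADbeef'), "\n";                  # prints: 1
print is_lowercase_letter("\x{3b1}"), "\n";      # prints: 1
```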

Other References

Other Regular Expression (regex) references found - some sites found using 'regular expression' search in Yahoo!

Examples

Some examples of regex found -

from : http://www.regular-expressions.info/
email address (the [A-Z] ranges assume case-insensitive matching, e.g. the /i modifier): \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
and : http://www.regular-expressions.info/email.html
or : ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$
or : ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$
and the massive: RFC 2822 : (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
practical implementation of RFC 2822 : [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
allow any two-letter country code top level domain : [a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
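
To show how one of these might be used, here is a minimal sketch built on the anchored `^...$` form. The sub name `looks_like_email` is hypothetical. The /i modifier is added because the pattern lists only upper-case ranges, and the @ is written as \@ so that Perl cannot mistake it for an array.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the whole string looks like an email address
# according to the simple anchored pattern, else 0.
sub looks_like_email {
    my ($address) = @_;
    return $address =~ /^[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}$/i ? 1 : 0;
}

for my $candidate ('john@example.com', 'John.Smith@Example.CO', 'not an address',
                   'user@host.museum') {
    printf "%-24s %s\n", $candidate,
        looks_like_email($candidate) ? 'accepted' : 'rejected';
}
```

Note that `user@host.museum` is rejected: the {2,4} limit on the final label is the reason the longer variants above list such top level domains explicitly.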
and : http://www.regular-expressions.info/examples.html
match the opening and closing pair of a specific HTML tag (replace TAG with the tag name) : <TAG\b[^>]*>(.*?)</TAG>
match the opening and closing pair of any HTML tag : <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
IP Address: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
To reject out-of-range values such as 999.999.999.999 : \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
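
The tag and IP address patterns above can be exercised with a short Perl sketch. The names `first_element`, `is_ipv4` and `$octet` are hypothetical. The IP check is anchored with \A and \z to test a whole string, rather than using \b to search inside longer text, and the tag pattern is given /i because it lists only upper-case ranges.

```perl
use strict;
use warnings;

# Hypothetical helper: return (tag name, inner text) of the first element
# whose closing tag matches its opening tag via the \1 backreference.
sub first_element {
    my ($html) = @_;
    if ($html =~ m{<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>}i) {
        return ($1, $2);
    }
    return;
}

# One octet, 0 to 255, as in the range-checked pattern (non-capturing here).
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/;

# Hypothetical helper: 1 if the whole string is a dotted quad in range.
sub is_ipv4 {
    my ($text) = @_;
    return $text =~ /\A$octet\.$octet\.$octet\.$octet\z/ ? 1 : 0;
}

my ($tag, $inner) = first_element('<b class="x">bold</b> text');
print "$tag: $inner\n";                    # prints: b: bold
print is_ipv4('192.168.0.1'), "\n";        # prints: 1
print is_ipv4('999.999.999.999'), "\n";    # prints: 0
```

As with any regex-based approach, the tag pattern is only suitable for simple, well-formed fragments; elements of the same name nested inside each other will not be paired correctly.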

from : http://www.blazonry.com/perl/regexp_exs.php
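Y2K Date (4-digit year) : ($line1 =~ m/[0-9]{2}[\/-][0-9]{2}[\/-][0-9]{4}/) (the source page writes the separator class as [\/|-], but inside a character class | is a literal character, so that form would also accept | as a separator)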
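
A minimal sketch of the date check in use. The sub name `find_date` is hypothetical, and so are the capturing groups and \b anchors, which are additions not found in the quoted pattern. The separator class here is `[/-]`, which accepts only a slash or a hyphen, and the `m{...}` delimiters avoid having to escape the slash.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three numeric fields of the first
# dd/dd/dddd or dd-dd-dddd date in $line, or an empty list if none.
sub find_date {
    my ($line) = @_;
    return $line =~ m{\b([0-9]{2})[/-]([0-9]{2})[/-]([0-9]{4})\b}
        ? ($1, $2, $3)
        : ();
}

my @fields = find_date('due 12/31/1999 sharp');
print "@fields\n";    # prints: 12 31 1999
```

The pattern checks only the shape of the date, not whether the day and month values are in range.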

from : http://www.wilsonmar.com/1regex.htm
Uniform Resource Identifier (URI) breakdown :
my $uri = "http://www.ics.uci.edu/pub/ietf/uri/#Related";
print "$1, $2, $3, $4, $5, $6, $7, $8, $9"
    if $uri =~ m{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

$1 = http:
$2 = http (the scheme)
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu (the authority)
$5 = /pub/ietf/uri/ (the path)
$6 = (empty, as this URI has no query part)
$7 = (the query, empty here)
$8 = #Related
$9 = Related (the fragment)
