BGT-EDITORIAL-BLOG-DPI: REGULAR EXPRESSION

Tuesday, November 2, 2010

REGULAR EXPRESSION – Cont.

POSIX Character Class:

POSIX defines a set of character classes that denote certain common ranges. They are written inside a bracket expression, e.g. [[:digit:]].

* [:digit:] - Only the digits 0 to 9.

* [:alnum:] - Any alphanumeric character 0 to 9 / A to Z / a to z.

* [:alpha:] - Any alpha character A to Z / a to z.

* [:space:] - Whitespace characters: space, tab, newline, carriage return, form feed and vertical tab. (Space and tab only is [:blank:].)

* [:xdigit:] - Hexadecimal notation. 0-9, A-F, a-f.

* [:punct:] - Punctuation symbols. (! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~)

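* [:print:] - Any printable character, including the space.

* [:graph:] - Any printable character except the space. There is no backslash shorthand for it (\w means something else: letters, digits and underscore).

* [:upper:] - Any uppercase letter. A to Z.

* [:lower:] - Any lowercase letter. a to z.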

* [:cntrl:] - Control Characters.
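
Not every regex engine understands the [:name:] syntax. Python's re module, for one, does not, so there the classes have to be spelled out as plain ranges. Below is a minimal sketch of that idea, assuming ASCII text only (the "C" locale). The names POSIX_ASCII and posix_class are made up for this illustration and are not part of any library.

```python
import re

# ASCII equivalents of the POSIX classes, written as bracket-expression bodies.
# POSIX_ASCII is a hypothetical helper table for this post, not a library API.
POSIX_ASCII = {
    "digit": "0-9",
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "space": r" \t\n\r\f\v",
    "xdigit": "0-9A-Fa-f",
    "punct": r"!-/:-@\[-`{-~",
    "print": r"\x20-\x7e",
    "graph": r"\x21-\x7e",
    "upper": "A-Z",
    "lower": "a-z",
    "cntrl": r"\x00-\x1f\x7f",
}


def posix_class(name, negate=False):
    """Compile a pattern matching one character of the named class."""
    body = POSIX_ASCII[name]
    return re.compile("[" + ("^" if negate else "") + body + "]")


# One or more hexadecimal digits, i.e. what [[:xdigit:]]+ would mean.
hex_run = re.compile("[" + POSIX_ASCII["xdigit"] + "]+")

print(bool(hex_run.fullmatch("7f3A")))               # True
print(bool(hex_run.fullmatch("7g3A")))               # False
print(bool(posix_class("graph").fullmatch(" ")))     # False: space is not in [:graph:]
print(bool(posix_class("print").fullmatch(" ")))     # True: space is in [:print:]
```

In tools that do support POSIX classes (grep, sed, awk and so on) none of this is needed; [[:xdigit:]]+ can be written directly.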

Character Class:

# \d - Any character in range 0-9. ([[:digit:]])

# \D - Any character outside range 0-9. ([^[:digit:]])

# \s - Any whitespace character. Some older engines exclude VT (Vertical Tab), which [:space:] includes. ([[:space:]])

# \S - Not whitespace. ([^[:space:]])

# \w - Any character in range 0-9, A to Z, a to z, plus the underscore. ([[:alnum:]_])

# \W - Any character that \w does not match. ([^[:alnum:]_])
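
To see the shorthands in action, here is a small sketch in Python. The sample string is invented for the example, and the re.ASCII flag is an assumption made here so that \d, \s and \w stick to the ASCII ranges listed above (without it, Python also accepts Unicode digits and letters).

```python
import re

text = "Room 42B,\tfloor_3"

# re.ASCII keeps \d, \s and \w limited to the ASCII ranges described above.
digits = re.findall(r"\d", text, re.ASCII)        # ['4', '2', '3']
non_digits = re.findall(r"\D+", text, re.ASCII)   # ['Room ', 'B,\tfloor_']
words = re.findall(r"\w+", text, re.ASCII)        # ['Room', '42B', 'floor_3']
non_words = re.findall(r"\W+", text, re.ASCII)    # [' ', ',\t']
fields = re.split(r"\s+", text)                   # ['Room', '42B,', 'floor_3']
non_space = re.findall(r"\S+", text)              # ['Room', '42B,', 'floor_3']

# In Python, \s does include the vertical tab.
vt_is_space = re.fullmatch(r"\s", "\v") is not None  # True

print(digits, words, fields)
```

Note that the underscore in floor_3 is picked up by \w, which is why \w is slightly wider than [:alnum:].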

Positional Abbreviations:

* \b - Word boundary. Matches the position at the beginning and/or end of a word, without consuming any character. I.e. "\bxx" or "xx\b". E.g. \bton\b finds the word ton but not tons; \bton will find tons as well.

* \B - Not a word boundary. Matches any position that is not at the beginning and/or end of a word. E.g. \Bton\B will find wantons but not tons; ton\B will find both wantons and tons.
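
The ton examples can be checked with a few lines of Python. This is only a sketch; the word list and the helper name matching are made up for the illustration.

```python
import re

words = ["ton", "tons", "wantons"]


def matching(pattern):
    """Return the sample words in which the pattern is found."""
    return [w for w in words if re.search(pattern, w)]


print(matching(r"\bton\b"))   # ['ton']
print(matching(r"\bton"))     # ['ton', 'tons']
print(matching(r"\Bton\B"))   # ['wantons']
print(matching(r"ton\B"))     # ['tons', 'wantons']
```

The patterns are written as raw strings (r"...") because in an ordinary Python string "\b" is the backspace character, not a word boundary.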
