Friday, October 29, 2010

REGULAR EXPRESSION

  • A regular expression (regex) is a special text string that describes a search pattern, i.e. the set of text it should match.
  • Regexes are case-sensitive by default, unless we tell the regex engine to ignore differences in case.
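
The case-sensitivity rule can be sketched as follows. This is a minimal illustration that assumes Python's built-in re module; the sample sentence is made up for the example, and other regex engines expose the "ignore case" switch under different names.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Regex engines compare case by default"

# Default behaviour: case-sensitive, so lowercase "regex" does not match "Regex".
case_sensitive = re.search("regex", text)

# With re.IGNORECASE the same pattern matches regardless of case.
case_insensitive = re.search("regex", text, re.IGNORECASE)

print(case_sensitive)
print(case_insensitive.group())
```

In Python the flag can also be written inline as (?i) at the start of the pattern.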

Characters:

  1. Literal Characters
  2. Special Characters
  3. Non-Printable Characters

Literal Characters:

  • The most basic regular expression consists of a single literal character.

  • Example string: Burning Glass Technologies

  • If we search for “n”, the regex matches the “n” that comes after the letter “r”, i.e. the engine reports the first occurrence of the character in the string, no matter where in a word it appears.

  • If we need a match at an exact word position, we should tell the regex engine so by using word boundaries.
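
Both points can be tried out with a short sketch, again assuming Python's re module. The \b token is the word-boundary anchor in most regex flavors; the patterns "Glass" and "Tech" are picked here only for illustration.

```python
import re

text = "Burning Glass Technologies"

# A single literal character: the engine reports the first "n",
# which is the one right after "r" in "Burning".
first_n = re.search("n", text)
print(first_n.start())

# \b marks a word boundary, so the pattern must match a whole word.
whole_word = re.search(r"\bGlass\b", text)
print(whole_word.start())

# "Tech" only occurs as the beginning of "Technologies", so with a
# boundary on both sides there is no match.
partial_word = re.search(r"\bTech\b", text)
print(partial_word)
```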

Special Characters:

  • There are 11 special characters:

a. [ - Opening square bracket

b. \ - Backslash

c. ^ - Caret

d. $ - Dollar sign

e. . - Period / Dot

f. | - Vertical bar / Pipe symbol

g. ? - Question mark

h. * - Asterisk / Star

i. + - Plus sign

j. ( - Opening round bracket

k. ) - Closing round bracket

  • These are often called “metacharacters”.

  • To use any of these characters as a literal in a regex, we need to escape it with a backslash (“\”).
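
A small sketch of escaping, assuming Python's re module. The sample text is invented for the example; re.escape is a standard-library helper that adds the backslashes for us.

```python
import re

# Hypothetical sample text containing several metacharacters.
text = "Total: 1+1=2 (approx.)"

# Unescaped, "+" is a quantifier: "1+1" means "one or more 1s, then a 1",
# which does not occur in the text.
unescaped = re.search("1+1", text)
print(unescaped)

# Escaped with a backslash, "+" is treated as a literal plus sign.
escaped = re.search(r"1\+1", text)
print(escaped.group())

# re.escape() backslash-escapes the metacharacters in a literal string.
literal = re.search(re.escape("(approx.)"), text)
print(literal.group())
```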

Non-Printable Characters:

  • The non-printable characters are:

i. “\t” - Tab (0x09)

ii. “\r” - Carriage return (0x0D)

iii. “\n” - Line feed (0x0A)

iv. “\a” - Bell (0x07)

v. “\e” - Escape (0x1B)

vi. “\v” - Vertical tab (0x0B)

vii. “\f” - Form feed (0x0C)

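  • We can include any character in a regex if we know its hexadecimal ASCII or ANSI code in the character set we are working with, written as “\x” followed by two hex digits.

  • The leading zero is required, e.g. “\x09” (not “\x9”) for a tab.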
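
The escapes above can be checked with a short sketch, assuming Python's re module. Support differs between engines: to my knowledge Python's re does not accept “\e”, so the sketch uses the hexadecimal form “\x1B” for the escape character instead. The sample strings are made up.

```python
import re

# Hypothetical sample: a tab-separated line ending in CR LF.
line = "name\tvalue\r\n"

# "\t" and its hexadecimal form "\x09" find the same tab character.
tab_by_name = re.search(r"\t", line)
tab_by_hex = re.search(r"\x09", line)
print(tab_by_name.start(), tab_by_hex.start())

# A Windows-style line ending is a carriage return followed by a line feed.
line_endings = re.findall(r"\r\n", line)
print(len(line_endings))

# The escape character (0x1B), written in hexadecimal notation.
ansi = "\x1b[0m"
esc = re.search(r"\x1B", ansi)
print(esc.start())
```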
