Pattern Matching - The RegExp Object
by Thomas Valentine
March 26, 2009
The "RegExp" object (short for "regular expression") lets you perform pattern matching on strings. Let's examine how this powerful object can handle complex search tasks.
The RegExp object is a core JavaScript object. RegExp stands
for Regular Expression, and JavaScript's regular expression
syntax was modeled on the one in Perl, a very capable and
powerful scripting language. Put simply, the RegExp object is
used to find a match for text that you describe with a
pattern. Various special characters and flags (the
"switches") give you options for how that text is found. A
familiar comparison is the "Find" command in your browser:
you type in a string of text, click OK, and the browser
locates a match within the web page. Regular expressions take
that idea much further, and they are widely used in many
applications and programming languages.
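As a rough illustration of that find-style search, the sketch below uses a regular expression to locate a word in a block of text and report where the match starts. The helper name findText and the sample text are hypothetical. exec() and the index property of its result are standard JavaScript.

```javascript
// Hypothetical helper: report where a pattern first matches in some text,
// roughly what a "find" box does when it highlights a match.
function findText(text, pattern) {
  const match = pattern.exec(text); // null when there is no match
  return match === null ? -1 : match.index;
}

const page = "Regular expressions are used in many languages.";
console.log(findText(page, /used/));    // 24
console.log(findText(page, /missing/)); // -1
```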
With the RegExp object, you can not only find a match for
your desired text, but also verify user input such as postal
and ZIP codes, telephone numbers, and account numbers. To use
it, you create a new instance of the core RegExp object and
give it the pattern to match, as in the following syntax
example.
var pattern = new RegExp("String of Text");
The above syntax example creates a new instance of the
RegExp object called pattern, which looks for the string
String of Text. (Avoid naming the variable name in browser
scripts: a global variable called name is the built-in
window.name property, which always stores a string.) The
constructor form is useful when the pattern is built from a
string at run time, but there is also a shorter form, as
follows.
var pattern = /String of Text/;
Placing String of Text between two forward slashes tells
JavaScript that the text between them is the pattern for a
new RegExp object. This form is known as a regular
expression literal, sometimes called "Direct Assignment".
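The sketch below contrasts the two forms. One practical difference is worth knowing: the constructor takes an ordinary string, so a backslash in the pattern must itself be escaped as \\ inside the quotes. The variable names are illustrative only.

```javascript
// Two ways to build the same pattern: one or more digits.
const literal = /\d+/;                  // regular expression literal
const constructed = new RegExp("\\d+"); // constructor; the backslash is doubled
                                        // because the pattern is written as a string

console.log(literal.source === constructed.source); // true
console.log(literal.test("Order 66"));              // true
console.log(constructed.test("no digits"));         // false

// The constructor is handy when the text to find is only known at run time.
const word = "Text";
const dynamic = new RegExp(word);
console.log(dynamic.test("String of Text")); // true
```

If run-time text can contain characters such as . or +, those characters would need escaping before being passed to the constructor. Otherwise they keep their special pattern meaning.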
Defining Your Search Patterns
As mentioned earlier, there is an extensive set of special
characters (the "switches") used to refine your search. Used
carefully and in combination, they let you describe almost
any text you need to match. The pattern-matching characters
available in JavaScript are given in the list below.
- \w - Match any word character: a letter, digit, or underscore (A-Z, a-z, 0-9, _).
- \W - Match any non-word character.
- \s - Match any whitespace character, such as a space, tab, newline, carriage return (enter), form feed, or vertical tab.
- \d - Match any numeric digit (0-9).
- \D - Match any character that is not a digit.
- [\b] - Match a backspace character.
- . (period) - Match any character except a newline (or another line terminator).
- [...] - Match any one character within the square brackets.
- [^...] - Match any one character not within the square brackets.
- [x-y] - Match any character in the range x to y.
- [^x-y] - Match any character outside the range x to y.
- {x,y} - Match the previous item (a character, class, or group) at least x times, but not more than y times.
- {x,} - Match the previous item at least x times.
- {x} - Match the previous item exactly x times.
- ? - Match the previous item once or not at all.
- + - Match the previous item one or more times.
- * - Match the previous item any number of times, or not at all.
- | - Match either the expression to the left or the expression to the right of the | character.
- (...) - Group everything inside the parentheses into one sub-pattern.
- \1, \2, ... - Match the same text that was captured by the first, second, ... sub-pattern (a back-reference).
- ^ - Match the beginning of the string, or the beginning of each line in multi-line matching.
- $ - Match the end of the string, or the end of each line in multi-line matching.
- \b - Match a word boundary: the position between a word character and a non-word character (or the start or end of the string).
- \B - Match any position that is not a word boundary.
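To see several of these characters working together, the sketch below builds three patterns. The first is a ZIP code check like the one mentioned earlier, the second is a phone number check, and the third is a repeated-word finder that uses a group and a back-reference. The exact formats accepted are assumptions chosen for illustration, not complete validators.

```javascript
// US ZIP code: five digits, optionally followed by a hyphen and four more digits.
// ^ and $ anchor the match so the whole string must fit the pattern.
const zipCode = /^\d{5}(-\d{4})?$/;

// Phone number such as "555-123-4567" or "(555) 123-4567".
// This format is an assumption for illustration; real phone rules vary by country.
const phone = /^(\(\d{3}\) |\d{3}-)\d{3}-\d{4}$/;

// Repeated word such as "the the": \b marks word boundaries, ( ... ) captures
// the first word, and \1 requires the same text to appear again.
const repeatedWord = /\b(\w+)\s+\1\b/;

console.log(zipCode.test("90210"));        // true
console.log(zipCode.test("90210-1234"));   // true
console.log(zipCode.test("9021"));         // false
console.log(phone.test("(555) 123-4567")); // true
console.log(phone.test("555-123-4567"));   // true
console.log(phone.test("555 123 4567"));   // false
console.log(repeatedWord.exec("this is the the end")[1]); // the
```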
Using the options above, you can describe most of the strings
you are likely to look for. But that is not all JavaScript
offers. There is also a list of what are called literal
characters: escape sequences that match characters which are
hard to type, such as tabs, or which would otherwise have a
special meaning in a pattern, such as the period. That list
is as follows.
- \f - Find a form feed character.
- \n - Find a new line character.
- \r - Find a carriage return character.
- \t - Find a tab character.
- \v - Find a vertical tab character.
- \/ - Find a forward slash.
- \\ - Find a backslash.
- \. - Find a period.
- \* - Find an asterisk.
- \+ - Find a plus character.
- \? - Find a question mark.
- \| - Find a vertical bar (pipe) character.
- \( - Find a left parenthesis character.
- \) - Find a right parenthesis character.
- \[ - Find a left square bracket character.
- \] - Find a right square bracket character.
- \{ - Find a left curly brace character.
- \} - Find a right curly brace character.
- \XXX - Find the character whose code is the octal number XXX (a legacy form; \xHH is preferred).
- \xHH - Find the character whose code is the two-digit hexadecimal number HH (for example, \x41 is "A").
- \cX - Find the control character represented by \cX (for example, \cJ is a newline).
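The most common use of this list is escaping a character that would otherwise have a special meaning. The sketch below shows why this matters: an unescaped period matches any character, while \. matches only a period. The sample strings are made up for illustration.

```javascript
// Unescaped ".": matches ANY character, so "3x50" slips through.
const looseDecimal = /^\d+.\d{2}$/;
// Escaped "\.": matches only a literal period.
const strictDecimal = /^\d+\.\d{2}$/;

console.log(looseDecimal.test("3x50"));  // true
console.log(strictDecimal.test("3x50")); // false
console.log(strictDecimal.test("3.50")); // true

// \t matches a tab character; \x41 is hexadecimal for the letter "A".
console.log(/a\tb/.test("a\tb"));  // true
console.log(/\x41/.test("Apple")); // true

// \/ matches a literal forward slash inside a /.../ literal.
console.log("usr/local/bin".split(/\//).length); // 3
```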
As you can see, most of the conceivable options have already
been defined for you. But there are two more settings that
are probably the most commonly used of all. These two
pattern attributes, usually called flags, are as follows.
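- g - This is for a global match, and is used to find all possible matches to your search string, not just the first.
- i - This is for a case-insensitive match.
Flags are not backslash sequences. They are written after the closing slash of a pattern, as in /String of Text/gi, or passed as the second argument to the constructor, as in new RegExp("String of Text", "gi").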
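The sketch below shows what the two flags change when a pattern is used with the string method match(). In code, the flags appear as plain letters after the closing slash or as the second argument to new RegExp. The sample text is made up.

```javascript
const text = "Cats and cats and CATS";

// No flags: only the first, case-sensitive match is found.
const first = text.match(/cats/);
console.log(first.index); // 9

// g: every match, but still case-sensitive.
console.log(text.match(/cats/g).join(",")); // cats

// g and i together: every match, ignoring case.
console.log(text.match(/cats/gi).join(",")); // Cats,cats,CATS

// With the constructor, the flags go in a second string argument.
const sameSearch = new RegExp("cats", "gi");
console.log(text.match(sameSearch).length); // 3
```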