Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Regular Expressions - Page 2

February 9, 2001

"11:15. Restate my assumptions:
Mathematics is the language of nature.
Everything around us can be represented and understood through numbers.
If you graph these numbers, patterns emerge. Therefore: There are patterns everywhere in nature."
- Max Cohen in Pi, 1998

Whether Max's assumptions really give rise to his conclusion is for you to decide, but his case is much easier to follow in the field of computers: there are certainly patterns everywhere in programming.

Regular expressions allow us to look for patterns in our data. So far we've been limited to checking a single value against that of a scalar variable or the contents of an array or hash. By using the rules outlined in this chapter, we can use that one single value (or pattern) to describe what we're looking for in more general terms: we can check that every sentence in a file begins with a capital letter and ends with a full stop, find out how many times James Bond's name is mentioned in 'Goldfinger', or learn if there are any repeated sequences of digits more than five digits long in the decimal representation of π.

However, regular expressions are a very big area — they're one of the most powerful features of Perl. We're going to break our treatment of them up into six sections:

  • Basic patterns
  • Special characters to use
  • Quantifiers, anchors and memorizing patterns
  • Matching, substituting, and transforming text using patterns
  • Backtracking
  • A quick look at some simple pitfalls

Generally speaking, if you want to ask Perl something about a piece of text, regular expressions are going to be your first port of call — however, there's probably one simple question burning in your head.

What Are They?

The term "Regular Expression" (now commonly abbreviated to "RegExp" or even "RE") simply refers to a pattern that follows the rules of syntax outlined in the rest of this chapter. Regular expressions are not limited to Perl — Unix utilities such as sed and egrep use the same notation for finding patterns in text. So why aren't they just called 'search patterns' or something less obscure?

Well, the actual phrase itself originates from the mid-fifties, when a mathematician called Stephen Kleene developed a notation for manipulating 'regular sets'. Perl's regular expressions have grown far beyond that original system, but some of Kleene's notation remains, and the name has stuck.

Patterns

History lessons aside, it's all about identifying patterns in text. So what constitutes a pattern? And how do you compare it against something?

The simplest pattern is a word — a simple sequence of characters — and we may, for example, want to ask Perl whether a certain string contains that word. Now, we can do this with the techniques we have already seen: We want to split the string into separate words, and then test to see if each word is the one we're looking for. Here's how we might do that:

#!/usr/bin/perl
# match1.plx
use warnings;
use strict;

my $found = 0;
$_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";
my $sought = "people";

# split with no arguments breaks $_ into words on whitespace
foreach my $word (split) {
    if ($word eq $sought) {
        $found = 1;
        last;    # no need to check the remaining words
    }
}

if ($found) {
    print "Hooray! Found the word 'people'\n";
}

Sure enough the program returns success:

>perl match1.plx

Hooray! Found the word 'people'
>

But that's messy! It's complicated, and it's slow to boot! Worse still, the split function (which breaks each of our lines up into a list of 'words'; we'll see more of this later on in the chapter) actually keeps all the punctuation: the string 'you' wouldn't be found in the above, whereas 'you...' would. This looks like a hard problem, but it should be easy. Perl was designed to make easy things easy and hard things possible, so there should be a better way to do this. This is how it looks using a regular expression:

#!/usr/bin/perl
# match1.plx
use warnings;
use strict;

$_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

if ($_ =~ /people/) {
    print "Hooray! Found the word 'people'\n";
}

This is much, much easier and yields the same result. We place the text we want to find between forward slashes. That's the regular expression part: it's our pattern, what we're trying to match. We also need to tell Perl which particular string to look in for that pattern. We do this with the =~ operator. This returns 1 if the pattern match was successful (in our case, if the character sequence 'people' was found in the string) and an empty string, which Perl treats as false, if it wasn't.
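To make the difference between the two approaches concrete, here is a minimal sketch that runs both checks against the same string and then inspects what =~ hands back in scalar context. The file name matchdemo.plx, the helper subs found_by_split and found_by_match, and the absent word "Molly" are assumptions made up for this illustration; they are not from the original text. The \Q...\E escape, which makes Perl treat the sought word as literal text, is also syntax this chapter has not covered yet.

```perl
#!/usr/bin/perl
# matchdemo.plx (hypothetical file name, for illustration only)
use warnings;
use strict;

my $text = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

# Whole-word comparison after split: punctuation stays attached to each word.
sub found_by_split {
    my ($string, $sought) = @_;
    foreach my $word (split ' ', $string) {
        return 1 if $word eq $sought;
    }
    return 0;
}

# Pattern match: looks for the character sequence anywhere in the string.
sub found_by_match {
    my ($string, $sought) = @_;
    # \Q...\E makes Perl treat $sought as literal text, not pattern syntax.
    return $string =~ /\Q$sought\E/ ? 1 : 0;
}

foreach my $sought ("people", "you", "you...", "Molly") {
    printf "%-8s split: %d  match: %d\n",
        "'$sought'", found_by_split($text, $sought), found_by_match($text, $sought);
}

# In scalar context, =~ gives 1 on success and a defined empty string on failure.
my $hit  = ($text =~ /people/);
my $miss = ($text =~ /Molly/);
print "hit is '$hit', miss is '$miss', miss is ",
    (defined $miss ? "defined" : "undefined"), "\n";
```

The word 'you' is the interesting case: the split version misses it because the word in the list is really 'you...', while the pattern match finds it because it only asks whether those three characters appear somewhere in the string. The same looseness means a pattern like /people/ would also match inside a longer word, which is something to keep in mind until we meet anchors later in the chapter.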

Beginning Perl