Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Backreferences (again) - Page 19

April 6, 2001

Finally, in our tour of regular expressions, let's look again at backreferences. Suppose you want to find any repeated words in a string. How would you do it? You might think about doing this:

if (/\b(\w+) $1\b/) { print "Repeated word: $1\n";}

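This doesn't work, however, because $1 is only set once the match is complete. Inside the pattern, $1 is interpolated like any other variable before matching begins, so it holds whatever an earlier successful match left in it, or nothing at all. With warnings turned on, you'll be alerted whenever $1 is undefined at that point. To refer back to a group while still inside the regular expression, you need to use the following syntax: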

if (/\b(\w+) \1\b/) { print "Repeated word: $1\n";}

When you're substituting, however, the backslash syntax belongs only in the pattern. If you use \1 in the replacement half of a substitution, it'll still work, but with warnings turned on you'll be told "\1 better written as $1".
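As a rough sketch of both points, the program below uses \1 inside the pattern to find repeated words, and $1 in the replacement half of a substitution to collapse them. The sub names repeated_words and collapse_repeats and the sample string are made up for illustration; they don't come from the chapter.

```perl
use strict;
use warnings;

# Return every word that is immediately repeated in $text.
sub repeated_words {
    my ($text) = @_;
    my @found;
    # \1 refers back to group 1 while the match is still in progress.
    while ($text =~ /\b(\w+) \1\b/g) {
        push @found, $1;    # once the match has succeeded, $1 is set
    }
    return @found;
}

# Collapse each doubled word into one:
# \1 on the pattern side, $1 on the replacement side.
sub collapse_repeats {
    my ($text) = @_;
    $text =~ s/\b(\w+) \1\b/$1/g;
    return $text;
}

my $sample = "This is is a test of the the pattern";
print "Repeated word: $_\n" for repeated_words($sample);
print collapse_repeats($sample), "\n";
```

The leading \b matters here: without it, the "is" at the end of "This" followed by " is" would be reported as a repeat.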

Summary

Regular expressions are quite possibly the most powerful means at your disposal of looking for patterns in text, extracting sub-patterns and replacing portions of text. They're the basis of any text shuffling you do in Perl, and they should be your first port of call when you need to do some string manipulation.

In this chapter, we've seen how to match simple text, different classes of text, and then different amounts of text. We've also seen how to provide alternative matches, how to refer back to portions of the match, and how to substitute and transliterate text.

The key to learning and understanding regular expressions is to be able to break them down into their component parts and unravel the language, translating it piecewise into English. Once you can fluently read out the intention of a complex regular expression, you're well on your way to creating powerful matches of your own.
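One way to practise this kind of piecewise reading is to write the English next to each component using Perl's /x modifier, which ignores whitespace in the pattern and allows comments. The sketch below breaks down the repeated-word pattern from earlier on this page; the names $repeated and first_repeat are made up for illustration.

```perl
use strict;
use warnings;

# The repeated-word pattern, spelled out piece by piece.
# Under the x modifier, whitespace in the pattern is ignored,
# so the literal space between the two words is written as "\ ".
my $repeated = qr/
    \b        # a word boundary, so we start at the beginning of a word
    (\w+)     # one or more word characters, remembered as group 1
    \         # a single literal space
    \1        # the same text that group 1 has just matched
    \b        # another word boundary, so the second word ends here
/x;

# Return the first repeated word in $text, or undef if there is none.
sub first_repeat {
    my ($text) = @_;
    return $text =~ $repeated ? $1 : undef;
}

my $word = first_repeat("it was the the best of times");
print "Repeated word: $word\n" if defined $word;
```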

You can find a summary of regular expression syntax in Appendix A. Section 6 of the Perl FAQ (at www.perl.com) contains a good selection of regexp hints and tricks.

Exercises

Write out English descriptions of the following regular expressions, and describe what the operations actually do:

$var =~ /(\w+)$/
$code !~ /F:\1-WDVL\in progress\Book Reviews\
  BeginningPerl\Part Five\backref/
s/#{2,}/#/g

[Lines 2 and 3 above are one line. They have been split for formatting purposes.]

Using the contents of the gettysburg.txt file (provided in the download for Chapter 6), use regular expressions to do the following, and print out the result. (Tip: use a here-document to store the text in your file):

Count the number of occurrences of the word 'we'.

Reformat the text, so that each sentence is displayed as a separate paragraph.

Check that there are no multiple spaces in the text, replacing any with single spaces.

When we use groups and evaluate the match in list context, the // operator returns a list of the text strings matched by each group. Modify our example program matchtest2.plx, so that it produces its output from this list, rather than using special variables.
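To illustrate the list-returning behaviour without giving away the exercise, here is a small sketch on unrelated data. The sub parse_pair and the "key = value" input format are assumptions chosen for the example, not part of matchtest2.plx.

```perl
use strict;
use warnings;

# Split a "key = value" line into its two parts,
# using the list that the match returns.
sub parse_pair {
    my ($line) = @_;
    # In list context, a successful match returns what each group
    # captured, in order; a failed match returns an empty list.
    my ($key, $value) = $line =~ /^(\w+)\s*=\s*(.*)$/;
    return ($key, $value);
}

my ($k, $v) = parse_pair("colour = blue");
print "Key: $k, value: $v\n" if defined $k;
```

Assigning the result of the match to a list of variables puts the match in list context, so the captures land in $key and $value without any use of $1 or $2.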

If we want to sort a list of words into alphabetical order, one simple and quite effective way is to write a program that performs a 'bubble sort': working through the whole list, it compares each pair of consecutive words; if it finds them in the wrong order, it swaps them over. On reaching the end of the list it repeats the process - unless the previous scan didn't yield any swaps, in which case the list is already properly ordered. Use regular expressions along with the other techniques you've seen so far, and write this program so that it will work with a list of words separated by newline characters. One small hint - the pos() function may come in useful here. You can use this to adjust the position of the \G boundary, for example: pos($var) = 10 will set it just after the tenth character in $var. A subsequent global search will therefore start from this point.
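The hint about pos() can be tried out on its own before tackling the sort. The following sketch sets the search position and then performs a global match anchored with \G; the sub name next_word and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Starting at character offset $offset in $text, skip any whitespace
# and return the word found there, plus the position where the match ended.
sub next_word {
    my ($text, $offset) = @_;
    pos($text) = $offset;               # move the \G boundary
    if ($text =~ /\G\s*(\w+)/g) {       # scalar //g starts at pos($text)
        return ($1, pos($text));
    }
    return;                             # nothing matched at that position
}

# "four score" is ten characters long, so offset 10 sits just after it.
my ($word, $end) = next_word("four score and seven years ago", 10);
print "Found '$word', search now at position $end\n";
```

Because the match is made in scalar context with the /g modifier, a following match on the same variable would carry on from the new position rather than from the start of the string.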

Beginning Perl

