Escaping Special Characters - Page 5
February 23, 2001
Of course, regular expressions can be more than just words and
spaces. The rest of this chapter is going to be about the various
ways we can specify more advanced matches - where portions of the
match are allowed to be one of a number of characters, or where
the match must occur at a certain position in the string. To do
this, we'll be describing the special meanings given to certain
characters - called metacharacters - and look at what these
meanings are and what sort of things we can express with them.
At this stage, we might not want to use their special meanings -
we may want to literally match the characters themselves. As
you've already seen with double-quoted strings, we can use a
backslash to escape these characters' special meanings. Hence, if
you want to match '...' in the above text, you need
your pattern to say '\.\.\.'. For example:
> perl matchtest.plx
Enter some text to find: Ent+
The text matches the pattern 'Ent+'.
> perl matchtest.plx
Enter some text to find: Ent\+
'Ent\+' was not found.
We'll see later why the first one matched - due to the special
meaning of +.
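The two runs above can be reproduced in a few lines of Perl. This is a minimal sketch: the sample text in $text is a stand-in chosen for illustration (an assumption, not necessarily the string matchtest.plx searches), and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Stand-in sample text (an assumption made for this sketch).
my $text = q{"I wonder what the Entish is for 'yes' and 'no'," he thought.};

# Unescaped: + is a metacharacter here, so this is not a search
# for a literal plus sign.
my $unescaped = ($text =~ /Ent+/) ? 1 : 0;

# Escaped: \+ stands for a literal plus sign, which the text lacks.
my $escaped = ($text =~ /Ent\+/) ? 1 : 0;

print "Ent+  : ", ($unescaped ? "matches" : "not found"), "\n";
print "Ent\\+ : ", ($escaped   ? "matches" : "not found"), "\n";
```

With this stand-in text, the unescaped pattern matches and the escaped one does not, mirroring the transcript.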
These are the characters that are given special meaning within a
regular expression, which you will need to backslash if you want
to use them literally:
. * ? + [ ] ( ) { } ^ $ | \
Any other characters automatically assume their literal meanings.
You can also turn off the special meanings using the escape
sequence \Q. After Perl sees \Q, the 14
special characters above will automatically assume their
ordinary, literal meanings. This remains the case until Perl sees
either \E or the end of the pattern.
For instance, if we wanted to adapt our
matchtest program just to look for literal strings,
instead of regular expressions, we could change it to look like
this:
if (/\Q$pattern\E/) {
Now the meaning of + is turned off:
> perl matchtest.plx
Enter some text to find: Ent+
'Ent+' was not found.
>
Note that all \Q does is turn off the regular
expression magic of those 14 characters above - it doesn't stop,
for example, variable interpolation.
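To see that \Q leaves interpolation alone while switching off the metacharacters, here is a small sketch. The sample text and the contents of $pattern are made up for illustration; quotemeta is Perl's built-in function form of \Q.

```perl
use strict;
use warnings;

my $text    = 'Price: 3+4 (approx.)';   # hypothetical sample text
my $pattern = '3+4';                    # hypothetical user input

# Without \Q, + keeps its special meaning, so this does not look
# for the literal text "3+4".
my $as_regex = ($text =~ /$pattern/) ? 1 : 0;

# With \Q...\E, $pattern is still interpolated, but the
# metacharacters inside it are treated literally.
my $as_literal = ($text =~ /\Q$pattern\E/) ? 1 : 0;

# quotemeta returns the escaped form of the string.
my $quoted = quotemeta($pattern);

print "as regex:   ", ($as_regex   ? "matches" : "not found"), "\n";
print "as literal: ", ($as_literal ? "matches" : "not found"), "\n";
print "quotemeta:  $quoted\n";
```

Here the plain pattern is not found, the \Q version matches, and quotemeta shows the backslash that \Q adds in front of the +.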
Don't forget to change this back again: we'll be using
matchtest.plx throughout the chapter, to demonstrate
the regular expressions we look at. We'll need that magic fully
functional!
Anchors
So far, our patterns have all tried to find a match anywhere in
the string. The first way we'll extend our regular expressions is
by dictating to Perl where the match must occur. We can say
'these characters must match the beginning of the string' or
'this text must be at the end of the string'. We do this by
anchoring the match to either end.
The two anchors we have are ^, which appears
at the beginning of the pattern and anchors a match to the beginning
of the string, and $, which appears at
the end of the pattern and anchors it to the end of the string.
So, to see if our quotation ends in a full stop - and remember
that the full stop is a special character - we say something like
this:
> perl matchtest.plx
Enter some text to find: \.$
The text matches the pattern '\.$'.
That's a full stop (which we've escaped to prevent it being
treated as a special character) and a dollar sign at the end of
our pattern - to show that this must be the end of the
string.
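The difference the $ makes can be seen by comparing an anchored and an unanchored pattern. This is a sketch only: both strings are stand-ins chosen for illustration, not necessarily the text used by matchtest.plx.

```perl
use strict;
use warnings;

# Stand-in sample strings (assumptions made for this sketch).
my $quote = q{"I wonder what the Entish is for 'yes' and 'no'," he thought.};
my $other = 'Full stop. Then more text';

# \. is a literal full stop; $ requires the end of the string to follow.
my $quote_ends_in_stop = ($quote =~ /\.$/) ? 1 : 0;
my $other_ends_in_stop = ($other =~ /\.$/) ? 1 : 0;

# Without the anchor, a full stop anywhere in the string will do.
my $other_has_stop = ($other =~ /\./) ? 1 : 0;

print "quote ends in a full stop: $quote_ends_in_stop\n";
print "other ends in a full stop: $other_ends_in_stop\n";
print "other contains a full stop: $other_has_stop\n";
```

The second string contains a full stop but does not end in one, so only the unanchored pattern matches it.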
Try, if you can, to get into the habit of reading out regular
expressions in English. Break them into pieces and say what each
piece does. Also remember to say that each piece must immediately
follow the other in the string in order to match. For instance,
the above could be read 'match a full stop immediately followed
by the end of the string'.
If you can get into this habit, you'll find that reading and
understanding regular expressions becomes a lot easier, and
you'll be able to 'translate' back into Perl more naturally as
well.
Here's another example: do we have a capital I at the beginning
of the string?
> perl matchtest.plx
Enter some text to find: ^I
'^I' was not found.
>
We use ^ to mean 'beginning of the string', followed
by an I. In our case, though, the character at the beginning of
the string is a ", so our pattern does not match. If
you know that what you're looking for can only occur at the
beginning or the end of the string, it's extremely efficient to
use anchors. Instead of searching through the whole string to see
whether the match succeeded, Perl only needs to look at a small
portion and can give up immediately if even the first character
does not match.
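The ^ anchor can be tried out the same way. In this sketch the sample text is again a stand-in (an assumption) that begins with a double quote, as the chapter's string is said to.

```perl
use strict;
use warnings;

# Stand-in sample text (an assumption made for this sketch).
my $text = q{"I wonder what the Entish is for 'yes' and 'no'," he thought.};

# Anchored: the very first character must be a capital I.
my $starts_with_i = ($text =~ /^I/) ? 1 : 0;

# Unanchored: a capital I anywhere in the string is enough.
my $contains_i = ($text =~ /I/) ? 1 : 0;

# Anchored, but allowing for the opening quotation mark.
my $starts_with_quote_i = ($text =~ /^"I/) ? 1 : 0;

print "^I  : ", ($starts_with_i       ? "matches" : "not found"), "\n";
print "I   : ", ($contains_i          ? "matches" : "not found"), "\n";
print "^\"I : ", ($starts_with_quote_i ? "matches" : "not found"), "\n";
```

Read aloud, the last pattern is 'the beginning of the string, immediately followed by a double quote, immediately followed by a capital I'.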