Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions

Processing Text with Perl Functions - Page 6

August 29, 2001

In this article, we will learn how to effectively leverage Perl's built-in text handling functions to process CSV files and perform an e-mail merge.

Review

In the last article, we learned that Perl is a great language for text processing. We established that there are three different mechanisms for processing text. They are regular expressions, built-in functions, and loadable modules. We learned how to find and replace strings within files with a simple but powerful recursive script.

The benefits of processing text with Perl functions

As you'll learn in this article, Perl includes many useful built-in functions for processing text that reduce the amount of code you have to write. Other languages include basic token processing features, but do not go to the trouble of embedding a set of common processing routines directly into the language. Some would argue that defining a set of common routines violates language purity. That may be true in some circles, but I would argue that adding these features has allowed Perl programmers to focus on getting real work done rather than getting buried in the intricacies of language semantics. A common set of routines also means that other Perl programmers will understand and expect them in your programs. That makes the text processing portions of another programmer's code much easier to read than code in which each programmer writes individual routines to do basically the same thing. This is why Perl is an ideal language for text processing: it comes with nearly all of the text processing capabilities you are likely to need right out of the box.

Parsing a CSV file

One of the most common text processing operations that I've performed over the years is reading in and writing out delimited files. These files may come from Excel spreadsheets, databases, system logs, and countless other sources.

Users often need to move data out of one program or database and into another. As a programmer, I usually ask for a CSV (comma-delimited) file as input and write a program to import the data into the second application or database. Often the process is automated and runs on a regular schedule.

The simplest way to process a delimited file is with Perl's split() function, which takes the delimiter and the string to process as arguments.

To demonstrate, I've created a spreadsheet in Excel and saved it as a CSV file using a comma as a delimiter and double quotes to surround the text fields. The contents are below:

"Name","Email","Phone","City","State","Zip"
"Jonathan Eisenzopf","eisen@pobox.com","703-555-1212","Reston","VA",20191
"John Bigboote","bigboote@yoyodyne.com","703-555-1213","Fairfax","VA",20814

With the comma as our delimiter, the split() call looks like the following, where the $line variable contains the delimited line:

my @list = split(/,/,$line);

If the value of $line contained the first line of the CSV file listed above, split() would assign the following values to the @list array:

"Name"
"Email"
"Phone"
"City"
"State"
"Zip"

Notice that the first argument of the split() call is the delimiter, a comma, surrounded by forward slashes. That's because the first argument is actually a pattern for the match operator, in other words a regular expression. This is useful when data files use different delimiters, because a single pattern can match any of them.
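As a minimal sketch of both points, the script below first splits the header line of the sample file on a plain comma, then uses a character class to split records that use different delimiters. The semicolon and tab variants of the record are hypothetical and exist only for illustration; they are not part of the sample file.

```perl
use strict;
use warnings;

# Header line of the sample file; single quotes keep the double quotes literal.
my $line = '"Name","Email","Phone","City","State","Zip"';

# A plain comma pattern: split at every comma.
my @list = split(/,/, $line);
print scalar(@list), " fields\n";

# Hypothetical variants of one record that use different delimiters.
my @variants = (
    'Reston,VA,20191',
    'Reston;VA;20191',
    "Reston\tVA\t20191",
);

my @records;
foreach my $variant (@variants) {
    # The character class matches a comma, a semicolon, or a tab.
    my @fields = split(/[,;\t]/, $variant);
    push @records, [@fields];
    print join(' | ', @fields), "\n";
}
```

Each of the three variants yields the same three fields, since the pattern treats all three characters as delimiters.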

Something else that you'll probably want to do is get rid of the leading and trailing double quote characters that surround the contents of each field. We can do this by adding a statement right after the split() that loops over each item of the @list array and removes the quotes:

s/^"|"$//g foreach @list;

You might remember the search and replace operator from the last article. The foreach modifier runs the search and replace operator once for each item of the array, and the operator removes the double quote character if it appears at the beginning or end of the string. Because foreach aliases each item to $_, the change is made in place in @list.

In case the regular expression looks a bit confusing, let's examine it piece by piece. The caret character followed by a double quote tells the search and replace operator (the s character in front of the first forward slash) to find a double quote at the beginning of the string. The pipe character is an OR operator. It's followed by another double quote and a dollar sign, which mean "match a double quote at the end of the string". The next forward slash closes the regular expression that we're matching. Between the second and third forward slashes, we place the characters that should replace the matched characters. In this case, we want to remove the characters, so we don't put anything between them. We also have the global modifier (g) tacked onto the end of the expression, which means "match it as many times as you can." Without it, the substitution would stop after the first match and leave the trailing quote in place.
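To see what the global modifier contributes, here is a small sketch that applies the expression to the same quoted value with and without the g. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

my $with_g    = '"Reston"';
my $without_g = '"Reston"';

# With /g, the substitution keeps going after the first match, so the
# leading quote and then the trailing quote are both removed.
$with_g =~ s/^"|"$//g;

# Without /g, the substitution stops after the first match, so only
# the leading quote is removed.
$without_g =~ s/^"|"$//;

print "$with_g\n";      # Reston
print "$without_g\n";   # Reston"
```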

So the whole expression reads, "Find a double quote at the beginning of the string or a double quote at the end of the string, remove it, and keep searching the string until you find all of them."
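Putting the pieces together, the following sketch runs the split() call and the quote-stripping statement over every line of the sample data. For illustration, the lines are stored in an array (the name @csv is an assumption); a real script would read them from a file handle instead. It also assumes that no field contains an embedded comma and that there are no spaces around the delimiters, which the simple comma split does not handle.

```perl
use strict;
use warnings;

# The sample CSV rows, stored in an array for illustration.
my @csv = (
    '"Name","Email","Phone","City","State","Zip"',
    '"Jonathan Eisenzopf","eisen@pobox.com","703-555-1212","Reston","VA",20191',
    '"John Bigboote","bigboote@yoyodyne.com","703-555-1213","Fairfax","VA",20814',
);

my @rows;
foreach my $line (@csv) {
    # Break the line into fields at each comma.
    my @list = split(/,/, $line);

    # Strip the leading and trailing double quote from each field.
    s/^"|"$//g foreach @list;

    # Store a copy of the cleaned fields for later use.
    push @rows, [@list];
}

# Print each cleaned row with a visible separator between fields.
print join(' | ', @$_), "\n" foreach @rows;
```

Unquoted fields such as the zip code pass through unchanged, because the pattern only matches a double quote at the very start or end of a field.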
