Converting IP Addresses
December 4, 2001
Before we jump into log-file analysis, let's return briefly to
the problem of doing hostname lookups on the IP addresses that
most likely make up the "host" entries in our web
access logs. Example 8-1 gives a script,
clf_lookup.plx, that does just that. (Like all the
examples in this book, it is available for download from the
book's web site, at
http://www.elanus.net/book/.)
Example 8-1: A script to do hostname lookups on IP addresses
in web access logs
#!/usr/bin/perl -w
# clf_lookup.plx
# given common or extended-format web logs on STDIN, outputs
# them with numeric IP addresses in the first (host) field
# converted to hostnames (where possible).
use strict;
use Socket;

my %hostname;    # cache: IP address => hostname (undef if lookup failed)

while (<>) {
    my $line = $_;
    my($host, $rest) = split / /, $line, 2;
    if ($host =~ /^\d+\.\d+\.\d+\.\d+$/) {
        # looks vaguely like an IP address
        unless (exists $hostname{$host}) {
            # no key, so haven't processed this IP before
            my $packed = inet_aton($host);   # undef if not a valid address
            $hostname{$host} = defined $packed
                ? gethostbyaddr($packed, AF_INET)
                : undef;
        }
        if ($hostname{$host}) {
            # only rewrite the line if the lookup succeeded
            $line = "$hostname{$host} $rest";
        }
    }
    print $line;
}
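The part of Example 8-1 that does the real work is the %hostname cache: each IP address is looked up at most once, and failed lookups are remembered too (as a key whose value is undef), which is exactly what the exists test checks for. Here is a minimal sketch of that same logic pulled out into a sub, assuming we want to exercise it without making real DNS queries. The names rewrite_host and dns_resolve are hypothetical, and the resolver is passed in as a code reference so that a stub can stand in for gethostbyaddr.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Hypothetical real resolver: reverse-resolve a dotted-quad IPv4
# address with the built-in gethostbyaddr; undef on any failure.
sub dns_resolve {
    my ($ip) = @_;
    my $packed = inet_aton($ip);
    return unless defined $packed;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Hypothetical helper: rewrite the first (host) field of one log line.
# $cache is a hash ref (IP => hostname, or undef for a failed lookup);
# $resolve is a code ref mapping an IP string to a hostname or undef.
sub rewrite_host {
    my ($line, $cache, $resolve) = @_;
    my ($host, $rest) = split / /, $line, 2;

    # Leave the line alone unless its first field looks like an IP address.
    return $line unless defined $rest && $host =~ /^\d+\.\d+\.\d+\.\d+$/;

    # Resolve each distinct IP only once. Failures are cached as undef,
    # so exists() (not defined()) is the right test here.
    $cache->{$host} = $resolve->($host) unless exists $cache->{$host};

    return defined $cache->{$host} ? "$cache->{$host} $rest" : $line;
}
```

In the original script's loop this would read print rewrite_host($_, \%hostname, \&dns_resolve) while <>; and swapping in a stub resolver lets you confirm that a repeated IP address triggers only one lookup.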
The script itself is pretty simple, but it introduces some new
concepts that are definitely worth learning about. The first new
thing is this line:
use Socket;
Here we are importing a module called Socket.pm.
Just as we did earlier, when we pulled in the CGI.pm
module, we're doing this to let some more experienced
programmers do our dirty work for us. Specifically, the use
Socket declaration in this script supplies the inet_aton
function (which packs a dotted-quad string such as 192.0.2.1
into the 4-byte binary form the system expects) and the AF_INET
constant. Together these let Perl's built-in gethostbyaddr do
DNS lookups (converting numeric IP addresses to hostnames) using
just a few lines of code.
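To see those pieces working on their own, here is a minimal sketch of a single reverse lookup. The sub name ip_to_hostname is hypothetical (it is not part of Socket), and the octet range check is an added assumption: it rejects malformed input such as 999.1.1.1 up front, since Socket's inet_aton may otherwise try to resolve a string it can't parse as a hostname.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Hypothetical helper: reverse-resolve one dotted-quad IPv4 address.
# Returns the hostname, or undef if the string is not a well-formed
# address or the resolver has no name for it.
sub ip_to_hostname {
    my ($ip) = @_;

    # Require four dot-separated numbers, each 0-255, so inet_aton
    # parses the string directly instead of treating it as a hostname.
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return;
    return if grep { $_ > 255 } @octets;

    my $packed = inet_aton($ip);    # 4-byte packed, network byte order
    return scalar gethostbyaddr($packed, AF_INET);    # built-in reverse lookup
}
```

Called as ip_to_hostname('192.0.2.1'), it returns whatever name your resolver reports for that address, or undef if there is none, so the result depends entirely on the local DNS setup.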
Thousands of Perl modules are available. Some are distributed as
part of the Perl language itself; these are usually referred to
as being in the standard distribution, or as
standard modules.
(CGI.pm and Socket.pm are in the
standard distribution.) Others can be found at CPAN, the
Comprehensive Perl Archive Network, which we'll be learning more
about in Chapter 11. If you can't wait until then, though (which
I can totally understand, CPAN being something like the world's
biggest toy store for a Perl programmer), see the accompanying
sidebar, "Using
CPAN," for details on how you can jump the gun and start
exploring CPAN on your own.
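If you're not sure whether a particular module is already installed on your system, whether it came with the standard distribution or from CPAN, one quick check is simply to try loading it. The sketch below uses a hypothetical have_module sub that turns a name like Foo::Bar into the file Foo/Bar.pm and attempts a require inside an eval, so a missing module returns false instead of killing the script. (The standard distribution does change over time: CGI.pm, for instance, was removed from it as of Perl 5.22, so on a recent perl it may need to be installed from CPAN.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded on
# this system, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;  # require dies if not found
}

for my $module (qw(CGI Socket)) {
    printf "%-8s %s\n", $module,
        have_module($module) ? 'installed' : 'not found';
}
```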
Using CPAN
CPAN, the Comprehensive Perl Archive Network, is the official
place to (among other things) get Perl modules that are not
included in the standard distribution (that is, that are not
distributed automatically along with all recent versions of the
language). The hardest part about dealing with CPAN, at least for
a beginning programmer, is that it is so extensive. With user
contributions from all over the world, it has grown like kudzu,
spreading organically in all directions, defying efforts to
organize its contents usefully for anyone unwilling to spend a
significant amount of time studying it.
Of course, if you spend much time at all programming in Perl,
the effort of learning what's in CPAN will be repaid many times
over by the time you save using other people's code for common
tasks rather than reinventing the wheel.
In any event, the following resources will help you get started
with CPAN:
- http://www.cpan.org/README.html
  The top-level overview of what's in CPAN, with links to more
  specific starting points
- http://www.cpan.org/modules/
  The top-level page within the modules portion of CPAN, with
  pointers to various views of the modules
- http://www.cpan.org/modules/00modlist.long.html
  A long, annotated list of all the modules in CPAN
- http://search.cpan.org/
  The CPAN search engine
Parsing Web Access Logs
Perl for Web Site Management