Different Log File Formats (cont'd) - Page 6
December 11, 2001
The entire script as it should look at this point is given in
Example 8-3.
#!/usr/bin/perl -w

# log_report.plx

# report on web visitors

use strict;

my $log_format = 'common'; # 'common' or 'extended'

while (<>) {

    my ($host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status, $bytes,
        $referer, $agent);

    if ($log_format eq 'common') {
        ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes) =
            /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/
            or next;
    } elsif ($log_format eq 'extended') {
        ($host, $ident_user, $auth_user, $date, $time,
            $time_zone, $method, $url, $protocol, $status, $bytes,
            $referer, $agent) =
            /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+) "([^"]+)" "([^"]+)"$/
            or next;
    } else {
        die "unrecognized log format '$log_format'";
    }

    print join "\n", $host, $ident_user, $auth_user, $date, $time,
        $time_zone, $method, $url, $protocol, $status,
        $bytes, $referer, $agent, "\n";
}
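Before pointing the script at a whole log file, it can help to try the common-format regular expression on a single line in isolation. The following is only a minimal sketch: the script name check_line.plx and the sample log line are made up for illustration, and the field names simply mirror the variables in log_report.plx.

```perl
#!/usr/bin/perl -w
# check_line.plx (hypothetical name)
# try the common-format regex on one made-up log line
use strict;

# Sample line invented for illustration; not taken from a real log.
my $line = '192.0.2.1 - - [11/Dec/2001:10:15:32 -0800] "GET /index.html HTTP/1.0" 200 2326';

# Same pattern as the 'common' branch of log_report.plx, all on one line.
my @fields = $line =~
    /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/
    or die "no match\n";

# Labels matching the order of the capturing parentheses.
my @names = qw(host ident_user auth_user date time time_zone
               method url protocol status bytes);

# Print each captured field next to its label.
for my $i (0 .. $#names) {
    print "$names[$i]: $fields[$i]\n";
}
```

If the sample line is replaced with one copied from an actual log and the script dies with "no match", the log is probably not in the format the pattern expects.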
Now we're ready to test the log_report.plx script on
a real log file, to make sure the regular expression is actually
parsing the way we think it should. We set the
$log_format variable to the appropriate value for
our log files, then try using something like this in the shell
(substituting appropriate pathnames as needed) to redirect the
log file into the script's standard input and then pipe the
script's output to more:
[jbc@andros .logs]$ log_report.plx < access.log | more
If our log file has only IP addresses and no hostnames, we can
put the clf_lookup.plx script we created earlier at
the beginning of the pipeline with something like this:
[jbc@andros .logs]$ clf_lookup.plx < access.log | log_report.plx | more
Both of these command lines make use of a shell redirection
symbol that we haven't used before, the left angle bracket (<).
The < character tells the shell to redirect the
contents of a file into a command's standard input. If we enter
the command line given here and nothing prints out, we need to
make sure we've got the log file path and filename correct. If
that's not the problem, we need to double-check the configuration
variable to make sure it has the appropriate value for our log
file format (and take a careful look at the log file, too, to
make sure it really is in the common or extended format). If the
script still doesn't output anything, the problem is probably in
our regular expression. We need to make sure it is all on one
line and has the appropriate spacing between the various
elements. As a last resort, we can try shortening it a little bit
at a time (or building it up from nothing a little bit at a
time), getting it so that it successfully matches and captures
something, at least, then adding additional elements until the
whole thing is working.
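The "build it up a little bit at a time" approach can itself be scripted. The following is a rough sketch rather than part of log_report.plx: the script name regex_build.plx and the sample line are hypothetical, and the pattern is the common-format expression split into one piece per captured field. It adds the pieces one at a time and reports the first one that stops the match from succeeding.

```perl
#!/usr/bin/perl -w
# regex_build.plx (hypothetical name)
# find the first piece of the common-format regex that fails to match
use strict;

# Sample line invented for illustration; substitute a line from the real log.
my $line = '192.0.2.1 - - [11/Dec/2001:10:15:32 -0800] "GET /index.html HTTP/1.0" 200 2326';

# The common-format pattern from log_report.plx, split into pieces.
# Single quotes keep the backslashes and the trailing $ literal.
my @pieces = (
    '^(\S+)',               # host
    ' (\S+)',               # ident_user
    ' (\S+)',               # auth_user
    ' \[([^:]+)',           # date
    ':(\d+:\d+:\d+)',       # time
    ' ([^\]]+)\]',          # time_zone
    ' "(\S+)',              # method
    ' (.+?)',               # url
    ' (\S+)"',              # protocol
    ' (\S+)',               # status
    ' (\S+)$',              # bytes
);

my $pattern = '';   # grows by one piece per pass through the loop
my $matched = 0;    # how many pieces matched before a failure
for my $piece (@pieces) {
    $pattern .= $piece;
    if ($line =~ /$pattern/) {
        $matched++;
        print "ok:     $pattern\n";
    } else {
        print "FAILED: $pattern\n";
        last;
    }
}
print "$matched of ", scalar(@pieces), " pieces matched\n";
```

The last line printed with "ok:" shows how much of the expression is working, so the piece named on the "FAILED:" line is the place to start looking for a spacing or punctuation problem.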
Perl for Web Site Management