Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions
The "Visit" Data Structure - Page 8

December 14, 2001

Trying to track individual visitors via the entries in a web server's access log is something of an exercise in futility. With proxy servers and client-side caching getting in the way, the series of accesses that show up in the log from a particular hostname or IP address can give only an approximate picture of what individual visitors are doing. Multiple users sharing the same IP address can have their activity merged into what looks like a single, very active visitor. Conversely, a single visitor can show up in the logs via a different IP address on each request, defying efforts to abstract those requests into a meaningful "visit." A proxy server at a major ISP can cache the site's pages, then satisfy hundreds of requests that never get recorded in the server's logs.

Even so, it's hard not to wonder what a log file would reveal if we could pluck out the requests corresponding to specific hosts and string them together to see what patterns emerge. Many users still browse from individual host addresses without intervening proxy servers; for these users, at least, the resulting "visit" tracking provides a fascinating look at the paths being followed through the site. It's also interesting to see how many incoming requests are actually being generated by robot "spider" programs, and to study the behavior of those programs as they interact with the server. Finally, it's an interesting programming exercise to see how we can assemble and present information on these "visits."

As with the data structure we used to create the SprocketExpo exhibitor directory in earlier chapters, we could really benefit here from Perl's support for multilevel data structures. A hash of hashes (that is, a hash whose values are references to other hashes) would make the task of storing and accessing information on these visits significantly easier. As it is, though, we won't be learning how to use multilevel data structures for several more chapters. That's okay; we can fake it by using the conventional variables we've been using already, just as we did for the SprocketExpo example.

For the purposes of this script, we're going to define a "visit" as a series of one or more requests received from the same host, with no more than 15 minutes elapsing between one request and the next. If we get another request from the same host but more than 15 minutes have elapsed since the last one, we will treat the new request as the start of a new "visit," counting it separately in our statistics. We may as well make that 15-minute visit timeout a configuration variable at the top of the script and store it in seconds to make our computations easier:

my $expire_time   = 900; # seconds of inactivity to consider a
                         # "visit" ended (0 = forever)

Notice how the comment tells us we can set the $expire_time variable to 0 to make the expiration time "forever." We'll see how this works in a minute. A number of other variables, visible throughout the script and declared with my near the beginning, will be used to store the information on individual visits:

$total_visits This scalar will be incremented by one for each new visit processed. Besides being used in the script's report to tell us how many visits there were in all, this count will be used to generate a unique visit number for each visit.

%visit_num This hash will have keys consisting of hostnames or IP addresses, and values consisting of the currently "working" visit number corresponding to that host.

All of the following hash variables will have keys consisting of the visit number described previously:

%host Key is visit number, value is the hostname or IP address corresponding to that visit number.

%first_time Key is visit number, value is the date and time of that visit's first access.

%last_time Key is visit number, value is the date and time of that visit's last (that is to say, most recent) access.

%last_seconds Key is visit number, value is the number of seconds returned by the &get_seconds subroutine for the date and time of that visit's last access.

%referer Key is visit number, value is the HTTP_REFERER environment variable supplied for that visit's first access.

%agent Key is visit number, value is the user-agent string supplied for that visit's first access.
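
To see how these parallel hashes might fit together, here is a minimal sketch under some loud assumptions. The record_request subroutine, its argument order, and the sample requests are all hypothetical and are not the author's code. The $seconds argument simply stands in for whatever &get_seconds would return for the request's date and time. The sketch also shows one plausible way that an $expire_time of 0 could mean "forever": the timeout test is skipped when the value is false.

```perl
use strict;
use warnings;

my $expire_time = 900;    # seconds of inactivity that end a "visit" (0 = forever)

my ($total_visits, %visit_num, %host, %first_time, %last_time,
    %last_seconds, %referer, %agent);
$total_visits = 0;

# record_request is a hypothetical helper written for this illustration.
# $seconds is assumed to be the &get_seconds value for $time.
sub record_request {
    my ($this_host, $time, $seconds, $this_referer, $this_agent) = @_;

    # The "working" visit number for this host, if it has one.
    my $num = $visit_num{$this_host};

    # Start a new visit if the host has no working visit yet, or if
    # that visit has been idle for more than $expire_time seconds.
    # When $expire_time is 0 the idle test is never applied.
    if (!defined $num
        or ($expire_time and $seconds - $last_seconds{$num} > $expire_time)) {
        $num = ++$total_visits;              # unique visit number
        $visit_num{$this_host} = $num;
        $host{$num}       = $this_host;
        $first_time{$num} = $time;
        $referer{$num}    = $this_referer;   # kept from first access only
        $agent{$num}      = $this_agent;     # kept from first access only
    }

    # Every request updates the "most recent access" fields.
    $last_time{$num}    = $time;
    $last_seconds{$num} = $seconds;
    return $num;
}

# Three made-up requests from one host: two five minutes apart,
# then a third after a 20-minute gap.
record_request('10.0.0.1', '14/Dec/2001:10:00:00', 36000, '-',           'ExampleBrowser/1.0');
record_request('10.0.0.1', '14/Dec/2001:10:05:00', 36300, '/index.html', 'ExampleBrowser/1.0');
record_request('10.0.0.1', '14/Dec/2001:10:25:00', 37500, '/page2.html', 'ExampleBrowser/1.0');

print "visits: $total_visits\n";    # prints "visits: 2"
```

The first two requests fall within the 900-second window and share visit number 1. The third arrives 1,200 seconds after the second, so it starts visit number 2 and %visit_num now points the host at that new visit.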

We'll add all these new variables to the big my declaration up at the top of the script:

my($begin_time, $end_time, $total_hits, $total_mb, $total_views,
  $total_visits, %visit_num, %host, %first_time, %last_time,
  %last_seconds, %page_sequence, %referer, %agent);
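
Because every one of these hashes is keyed by the same visit number, pulling one visit's details back out is a matter of using that number in each hash in turn. The following is a rough sketch of that idea only. The visit_summary subroutine and the sample data are assumptions made up for illustration, not part of the author's script.

```perl
use strict;
use warnings;

# Made-up sample data standing in for what the script would have stored.
my $total_visits = 2;
my %host       = (1 => 'host-a.example.com',   2 => '192.0.2.7');
my %first_time = (1 => '14/Dec/2001:10:00:00', 2 => '14/Dec/2001:11:30:00');
my %last_time  = (1 => '14/Dec/2001:10:05:00', 2 => '14/Dec/2001:11:30:00');
my %agent      = (1 => 'ExampleBrowser/1.0',   2 => 'ExampleSpider/0.1');

# visit_summary is a hypothetical helper: it gathers one visit's fields
# from the separate hashes into a single line of text.
sub visit_summary {
    my ($num) = @_;
    return sprintf "visit %d: %s, %s to %s, agent %s",
        $num, $host{$num}, $first_time{$num}, $last_time{$num}, $agent{$num};
}

# Visit numbers run from 1 to $total_visits, so a counting loop reaches
# every visit in the order the visits began.
for my $num (1 .. $total_visits) {
    print visit_summary($num), "\n";
}
```

This is the sense in which the separate hashes "fake" a multilevel structure: the visit number plays the role of the outer key, and the name of each hash plays the role of the inner one.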

Perl for Web Site Management

