How (and how not) to Control Caches
June 21, 1999
There are several tools that Web designers and Webmasters can
use to fine-tune how caches will treat their sites. It may require
getting your hands a little dirty with the server configuration,
but the results are worth it. For details on how to use these tools
with your server, see the Implementation sections below.
HTML authors can put tags in a document's <HEAD> section
that describe its attributes. These Meta tags are often
used in the belief that they can mark a document as uncacheable, or
expire it at a certain time.
Meta tags are easy to use, but aren't very effective. That's
because they're usually honored only by browser caches (which
actually read the HTML), not by proxy caches (which almost never read
the HTML in the document). While it may be tempting to slap a
Pragma: no-cache meta tag on a home page, it won't necessarily keep
the page fresh if it passes through a shared cache.
On the other hand, true HTTP headers give you a lot of
control over how both browser caches and proxies handle your
objects. They can't be seen in the HTML, and are usually
generated automatically by the Web server. However, you can control
them to some degree, depending on the server you use. In the
following sections, you'll see which HTTP headers matter for caching,
and how to apply them to your site.
- If your site is hosted at an ISP or hosting farm and they don't
give you the ability to set arbitrary HTTP headers (like Expires
and Cache-Control), complain loudly; these are tools necessary for
doing your job.
HTTP headers are sent by the server before the HTML, and are seen
only by the browser and any intermediate caches. Typical HTTP/1.1
response headers might look like this:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
The HTML document would follow these headers, separated by a
blank line.
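To make that separation concrete, here is a minimal Python sketch that splits a raw response into its status line, headers and body at the blank line. It is an illustration only: `parse_response` is a hypothetical helper, and it assumes CRLF line endings, ignores folded (multi-line) header values, and keeps only the last value of a repeated header.

```python
def parse_response(raw: bytes):
    """Hypothetical helper: split a raw HTTP response into status, headers, body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 30 Oct 1998 13:19:41 GMT\r\n"
    b"Cache-Control: max-age=3600, must-revalidate\r\n"
    b"Expires: Fri, 30 Oct 1998 14:19:41 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>...</html>"
)
status, headers, body = parse_response(raw)
print(status)              # HTTP/1.1 200 OK
print(headers["expires"])  # Fri, 30 Oct 1998 14:19:41 GMT
print(body)                # b'<html>...</html>'
```

Note that the HTML body never has to be examined to find the caching headers, which is why proxies can work from headers alone.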
Many people believe that assigning a Pragma: no-cache HTTP
header to an object will make it uncacheable. This is not
necessarily true. The HTTP specification does not set any
guidelines for Pragma response headers; it discusses only Pragma
request headers (the headers that a browser sends to a server).
Although a few caches may honor this header in responses, the
majority won't, so it will usually have no effect. Use the headers
below instead.
The Expires HTTP header is the basic means of controlling
caches; it tells all caches how long the object stays fresh. After
that time, caches will always check back with the origin server to
see if the document has changed. Expires headers are supported by
practically every client.
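A cache that applied only the Expires rule might decide freshness roughly as in the following Python sketch. `is_fresh` is a hypothetical helper; real caches also weigh Cache-Control and other headers, which this deliberately ignores. In line with the tutorial's note that an invalid value is usually treated as "in the past", anything that doesn't parse as an HTTP date counts as already expired.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_value: str, now: datetime) -> bool:
    """Hypothetical helper: is a cached copy still fresh, judged by Expires alone?"""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Not a valid HTTP date: treat the object as already expired.
        return False
    if expires.tzinfo is None:
        # HTTP dates are always GMT, so assume UTC if no zone was parsed.
        expires = expires.replace(tzinfo=timezone.utc)
    # Fresh until the Expires time; after that the cache must check back
    # with the origin server.
    return now < expires


value = "Fri, 30 Oct 1998 14:19:41 GMT"
print(is_fresh(value, datetime(1998, 10, 30, 14, 0, tzinfo=timezone.utc)))   # True
print(is_fresh(value, datetime(1998, 10, 30, 14, 30, tzinfo=timezone.utc)))  # False
print(is_fresh("0", datetime(1998, 10, 30, 14, 0, tzinfo=timezone.utc)))     # False
```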
Most Web servers allow you to set Expires response headers in a
number of ways. Commonly, they will allow setting an absolute time
to expire, a time based on the last time that the client saw the
object (last access time), or a time based on the last
time the document changed on your server (last modification
time).
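The three bases differ only in which moment the lifetime is added to. The Python sketch below shows that arithmetic; it is not how any particular server implements it. `expires_after` is a hypothetical helper, and the access and modification times are example values borrowed from the sample headers above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_after(base_time: datetime, lifetime: timedelta = timedelta(0)) -> str:
    """Hypothetical helper: Expires value for base_time + lifetime, in GMT."""
    if base_time.tzinfo is None:
        raise ValueError("base_time must be timezone-aware")
    # HTTP dates must be in GMT, so convert to UTC before formatting.
    return format_datetime((base_time + lifetime).astimezone(timezone.utc), usegmt=True)


utc = timezone.utc
last_access = datetime(1998, 10, 30, 13, 19, 41, tzinfo=utc)  # client fetched the object
last_modified = datetime(1998, 6, 29, 2, 28, 12, tzinfo=utc)  # file changed on the server

# Absolute time: expire at one fixed moment.
print(expires_after(datetime(1999, 1, 1, tzinfo=utc)))
# -> Fri, 01 Jan 1999 00:00:00 GMT

# Last access time plus one hour.
print(expires_after(last_access, timedelta(hours=1)))
# -> Fri, 30 Oct 1998 14:19:41 GMT

# Last modification time plus 30 days.
print(expires_after(last_modified, timedelta(days=30)))
# -> Wed, 29 Jul 1998 02:28:12 GMT
```

An access-based expiry moves forward every time the object is served, while a modification-based one stays put until the file changes.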
Expires headers are especially good for making static images
(like navigation bars and buttons) cacheable. Because they don't
change much, you can set an extremely long expiry time on them, making
your site appear much more responsive to your users. They're also
useful for controlling caching of a page that changes on a regular
schedule. For instance, if you update a news page once a day at 6am,
you can set the object to expire at that time, so caches will know
when to get a fresh copy, without users having to hit 'reload'.
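For the daily news page, the Expires value is simply the next occurrence of the update time. A minimal Python sketch, assuming for simplicity that the 6am schedule is already expressed in GMT (a schedule kept in local time would need converting to GMT first); `next_update` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta, timezone
from email.utils import format_datetime


def next_update(now: datetime, update_at: time) -> datetime:
    """Hypothetical helper: next occurrence of a daily update time after `now`.

    Assumes `now` is timezone-aware and `update_at` is in the same zone.
    """
    candidate = datetime.combine(now.date(), update_at, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's update has already happened; the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(1998, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
print("Expires:", format_datetime(next_update(now, time(6, 0)), usegmt=True))
# -> Expires: Sat, 31 Oct 1998 06:00:00 GMT
```

Working in GMT also sidesteps daylight-saving jumps, which would otherwise make "plus one day" ambiguous.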
The only valid value in an Expires header is an
HTTP date; anything else will most likely be interpreted as 'in the
past', making the object uncacheable. Also, remember that the
time in an HTTP date is Greenwich Mean Time (GMT), not local
time.
For example:
Expires: Fri, 30 Oct 1998 14:19:41 GMT
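Because the value must be in GMT, a server that keeps local time has to convert before writing the header. The Python sketch below uses the standard library's `email.utils.format_datetime`, whose `usegmt=True` output has the same form as the example above. `http_date` is a hypothetical helper, and the nine-hours-behind-GMT zone is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt: datetime) -> str:
    """Hypothetical helper: format an aware datetime as an HTTP date (GMT)."""
    if dt.tzinfo is None:
        # A naive time could be local or GMT; refuse to guess.
        raise ValueError("need a timezone-aware datetime")
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


# Example server clock running nine hours behind GMT.
local = datetime(1998, 10, 30, 5, 19, 41, tzinfo=timezone(timedelta(hours=-9)))
print("Expires: " + http_date(local))
# -> Expires: Fri, 30 Oct 1998 14:19:41 GMT

# Parsing the value back yields the same instant.
assert parsedate_to_datetime(http_date(local)) == local
```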