Aren't Web Caches bad for me? Why should I help them?
June 21, 1999
Web caching is one of the most misunderstood technologies on the
Internet. Webmasters in particular fear losing control of their
site, because a cache can 'hide' their users from them, making it
difficult to see who's using the site.
Unfortunately for them, even if no Web caches were used, there
are too many variables on the Internet to ensure that they'll be
able to get an accurate picture of how users see their site. If
this is a big concern for you, this document will teach you how to
get the statistics you need without making your site
cache-unfriendly.
Another concern is that caches can serve content that is out of
date, or stale. However, this document can show you how to
configure your server to control this, while making it more
cacheable.
On the other hand, if you plan your site well, caches can help
your Web site load faster, and save load on your server and
Internet link. The difference can be dramatic; a site that is
difficult to cache may take several seconds to load, while one that
takes advantage of caching can seem instantaneous in comparison.
Users will appreciate a fast-loading site, and will visit more
often.
Think of it this way: many large Internet companies are spending millions
of dollars setting up farms of servers around the world to replicate their
content, in order to make it as fast to access as possible for their users.
Caches do the same for you, and they're even closer to the end user. Best of
all, you don't have to pay for them.
The fact is that caches will be used whether you like it or not.
If you don't configure your site to be cached correctly, it will be
cached using whatever defaults the cache's administrator decides
upon.
All caches have a set of rules that they use to determine when
to serve an object from the cache, if it's available. Some of these
rules are set in the protocols (HTTP 1.0 and 1.1), and some are set
by the administrator of the cache (either the user of the browser
cache, or the proxy administrator).
Generally speaking, these are the most common rules that are
followed for a particular request (don't worry if you don't
understand the details; they will be explained below):
- If the object's headers tell the cache not to keep the object,
it won't. Also, if no validator is present, most caches will mark
the object as uncacheable.
- If the object is authenticated or secure, it won't be
cached.
- A cached object is considered fresh (that is, able to
be sent to a client without checking with the origin server) if:
  - It has an expiry time or other age-controlling directive set,
  and is still within the fresh period.
  - A browser cache has already seen the object, and has been
  set to check once a session.
  - A proxy cache has seen the object recently, and it was
  modified relatively long ago.
Fresh documents are served directly from the cache, without
checking with the origin server.
- If an object is stale, the origin server will be asked to
validate the object, that is, to tell the cache whether the copy
it has is still good.
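The rules above can be sketched as a small decision function. This is only an illustration, not how any particular cache is implemented: the CachedObject record, its field names, the decide function, and the 10% heuristic fraction are all assumptions made for the example. The browser rule about checking once a session is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedObject:
    """Hypothetical record a cache might keep for one stored object."""
    no_store: bool                   # headers told the cache not to keep it
    authenticated_or_secure: bool    # authenticated or secure content
    has_validator: bool              # a validator came with the object
    stored_at: float                 # when the cache fetched it (seconds)
    expires_at: Optional[float]      # explicit expiry time, if any
    last_modified: Optional[float]   # modification time, if known


# Assumed heuristic: with no explicit expiry, treat the object as fresh for
# 10% of the time that had passed since its last modification when fetched.
HEURISTIC_FRACTION = 0.1


def decide(obj: CachedObject, now: float) -> str:
    """Return 'uncacheable', 'fresh', or 'validate' for a stored object."""
    # Rules 1 and 2: headers forbid caching, or the object is protected.
    if obj.no_store or obj.authenticated_or_secure:
        return "uncacheable"
    # Rule 1, second half, as stated in the text: no validator, no caching.
    if not obj.has_validator:
        return "uncacheable"
    # Rule 3a: an explicit expiry time decides freshness on its own.
    if obj.expires_at is not None:
        return "fresh" if now < obj.expires_at else "validate"
    # Rule 3c: seen recently, and modified relatively long ago.
    if obj.last_modified is not None:
        fresh_for = HEURISTIC_FRACTION * (obj.stored_at - obj.last_modified)
        if now - obj.stored_at < fresh_for:
            return "fresh"
    # Rule 4: anything stale must be validated with the origin server.
    return "validate"
```

For example, an object fetched at time 1000 with an expiry time of 2000 would be reported as fresh at time 1500 and as needing validation at time 2500.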
Together, freshness and validation are the most important ways
that a cache works with content. A fresh object is available
instantly from the cache, while validating a stale object avoids
sending the entire object again if it hasn't changed.
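To make the saving from validation concrete, here is a minimal sketch with a simulated origin server. In HTTP, a cache can send a conditional request carrying an If-Modified-Since time, and the origin can answer 304 Not Modified with no body. Everything else here is hypothetical: the in-memory origin, the integer timestamps, and the names origin_get and revalidate exist only for this example.

```python
from typing import Optional, Tuple

# Hypothetical in-memory origin: one object and its last-modified time.
ORIGIN_BODY = b"<html>hello</html>"
ORIGIN_LAST_MODIFIED = 100


def origin_get(if_modified_since: Optional[int]) -> Tuple[int, bytes]:
    """Answer a (possibly conditional) request with a status and a body."""
    if if_modified_since is not None and ORIGIN_LAST_MODIFIED <= if_modified_since:
        return 304, b""          # not modified: no body is sent
    return 200, ORIGIN_BODY      # changed, or unconditional: full body


def revalidate(cached_body: bytes, cached_last_modified: int) -> Tuple[bytes, int]:
    """Validate a stale copy; return the body to serve and body bytes sent."""
    status, body = origin_get(cached_last_modified)
    if status == 304:
        # The cached copy is still good, so it is reused as-is.
        return cached_body, 0
    # The object changed, so the full new body crossed the network.
    return body, len(body)
```

When the cached copy's modification time matches the origin's, revalidate reuses the cached body and no body bytes are transferred; when the origin has a newer version, the whole object is sent again.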
Caching Tutorial for Web Authors and Webmasters
How (and how not) to Control Caches