The other day I was trying to understand the different HTTP cache headers and how they work. The answers I found varied from person to person, so I spent a few hours reading various articles and wrote this one. Correct me if I am wrong. Most of the ideas come from the article below, which is an archived blog post.
Let’s say you hosted a website called ourfamily.com and you created the site in
Now, what does caching have to do with all this? The internet is slow, and the browser and server have to talk to each other about everything, so that talking can’t take too much time and bandwidth. If the browser already knows the answer, it shouldn’t ask the server the question again. For example, whenever you refresh the family photo page, the browser shouldn’t have to wait for the server to send the same photo over the network again.
In this big world, when the browser and server sit on two different continents, this talking is even more expensive. And when there are lots of pictures and everyone may want the same one, as with a newspaper, it can’t stay this expensive. So people installed small intermediate servers, called proxy caches. The browser asks these proxies for the information; a proxy fetches it from the original server the first time, and when a second request for the same information comes in, from the same browser or a different one, it answers instantly without going back to the original server. This speeds up the communication between servers and browsers.
This architecture also has some security issues. Proxy caches aren’t supposed to cache everything the browser and server exchange, because some of it may be sensitive, like usernames and passwords. All this talking happens over HTTP, and we can control caching by setting headers on HTTP requests and responses.
Pragma: no-cache in HTTP 1.0 requests (not responses). This tells intermediate caches ‘don’t answer this request from your cache; fetch a fresh response from the original server.’ It is useful when a stored copy might be out of date or when caching gives a proxy cache no advantage.
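On the client side this is just a request header. Here’s a small Python sketch using the standard library’s urllib.request; the path /family-photos is hypothetical and nothing is actually sent over the network. It also sets Cache-Control: no-cache, because HTTP 1.1 caches read that directive and the HTTP 1.1 caching spec recommends sending both for backward compatibility.

```python
from urllib.request import Request

# Hypothetical page on the example site; building a Request does not send it.
req = Request(
    "https://ourfamily.com/family-photos",
    headers={
        "Pragma": "no-cache",         # understood by old HTTP 1.0 caches
        "Cache-Control": "no-cache",  # the HTTP 1.1 equivalent
    },
)

# urllib normalizes header names with str.capitalize(), e.g. "Cache-control".
for name, value in req.header_items():
    print(f"{name}: {value}")
```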
Expires is another HTTP 1.0 header, and it tells browsers when the page will expire. This is a response header, i.e. the server sets it in the response. With it, the browser knows how long the information stays fresh, and browsers and proxy caches can keep it in their caches until that date.
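A minimal sketch of how a server could build an Expires value and how a cache could check it, using Python’s email.utils for the HTTP date format. The fixed ‘now’ and the one-week lifetime are illustrative assumptions; note that the freshness check only works if the cache’s clock agrees with the server’s.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now: datetime, lifetime: timedelta) -> str:
    """Server side: format 'now + lifetime' as an HTTP date (always GMT)."""
    return format_datetime(now + lifetime, usegmt=True)


def is_fresh_by_expires(expires_value: str, now: datetime) -> bool:
    """Cache side: the stored copy is fresh while 'now' is before the Expires date."""
    return now < parsedate_to_datetime(expires_value)


# Illustrative fixed clock; usegmt=True requires a timezone-aware UTC datetime.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
value = expires_header(now, timedelta(days=7))
print("Expires:", value)  # Expires: Mon, 08 Jan 2024 12:00:00 GMT
print(is_fresh_by_expires(value, now + timedelta(days=8)))  # False
```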
Then HTTP 1.1 came out in 1997 with better caching headers, mainly Cache-Control and its directives.
cache-control: public says that any cache, browser or proxy, may store the response. Let’s say you have a public page, like a login page, or a public resource, like the family logo. Any cache can store it, so most people requesting it will hit a cache instead of going directly to the server, and we get a performance boost.
cache-control: private tells the proxies not to cache the response, like the family photo page of ourfamily.com. You wouldn’t want anyone else to see these photos, but your own browser can still cache them for performance and to save bandwidth.
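To make the public/private split concrete, here’s a simplified Python sketch of the storage decision. directive_names and may_store are hypothetical helpers, and the rule deliberately ignores every directive except private.

```python
def directive_names(cache_control: str) -> set:
    """'public, max-age=60' -> {'public', 'max-age'} (names only, lowercased)."""
    return {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }


def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Simplified: proxies (shared caches) must not store 'private' responses,
    while the user's own browser cache may."""
    if shared_cache and "private" in directive_names(cache_control):
        return False
    return True


print(may_store("public", shared_cache=True))    # True: the family logo may sit in a proxy
print(may_store("private", shared_cache=True))   # False: a proxy must not keep the photo page
print(may_store("private", shared_cache=False))  # True: your browser may keep it
```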
Now let’s say you changed the login page design for the New Year. Users should see the new page instead of the old one. But you already marked the page as public, and the browser may keep showing its cached copy instead of your new one. Now the most badly named directive in Cache-Control (I think) comes to the rescue. We could set cache-control: public, no-cache for the login page (directives are separated by commas). no-cache tells the browser: ‘Hey, you can cache this page, but before you show it to the user, check with the server. If the server changed the page, it will give you the updated one; if not, it answers 304 Not Modified and you use your copy.’ Now we save bandwidth, because the page is only downloaded again when it has actually changed.
The name is so misleading that some browsers even implemented it so that, when no-cache is present, they don’t use the cache at all and download a fresh copy from the server every time instead.
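Here’s a toy simulation of the no-cache behaviour as the spec describes it (stored, but checked before every use). The ‘server’ is a hypothetical dictionary whose version tag stands in for a real validator, so it only shows the idea: the page body crosses the network only when it has changed.

```python
# Hypothetical server state: the login page and a version tag that changes with it.
server_page = {"version": "v1", "body": "<html>old login page</html>"}


def server_revalidate(cached_version):
    """Return (status, body, version): 304 with no body if unchanged, else 200."""
    if cached_version == server_page["version"]:
        return 304, None, cached_version
    return 200, server_page["body"], server_page["version"]


class NoCacheBrowserCache:
    """Keeps a stored copy, but asks the server before every use (no-cache)."""

    def __init__(self):
        self.version = None
        self.body = None
        self.bytes_downloaded = 0

    def show_login_page(self) -> str:
        status, body, version = server_revalidate(self.version)
        if status == 200:  # first visit or page changed: store the new copy
            self.version, self.body = version, body
            self.bytes_downloaded += len(body)
        return self.body   # on 304 the stored copy is shown as-is


browser = NoCacheBrowserCache()
browser.show_login_page()   # first visit: full download
browser.show_login_page()   # revalidated, 304, nothing re-downloaded
server_page.update(version="v2", body="<html>new year login page</html>")
print(browser.show_login_page())  # <html>new year login page</html>
```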
Now you want a page that isn’t cached at all, like a page that shows your family business revenue, and you know no-cache is not the answer. no-store comes to the rescue. It tells all caches (both the browser and any other cache) not only ‘don’t reuse this page without asking’, but ‘don’t store the page at all, not even in your cache folder’. So it’s safe to use cache-control: private, no-cache, no-store for highly sensitive pages that must always be fresh.
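A simplified sketch of how no-store and private combine, again with hypothetical helper names: no-store wins over everything, while no-cache on its own doesn’t prevent storing.

```python
def directive_names(cache_control: str) -> set:
    """Directive names only, lowercased: 'private, max-age=0' -> {'private', 'max-age'}."""
    return {p.split("=", 1)[0].strip().lower() for p in cache_control.split(",") if p.strip()}


def caches_that_may_store(cache_control: str) -> set:
    """Simplified: which caches may keep a copy of this response at all?"""
    names = directive_names(cache_control)
    if "no-store" in names:
        return set()                # nobody stores it, not even on disk
    if "private" in names:
        return {"browser"}          # only the user's own cache
    return {"browser", "proxy"}     # no-cache alone still allows storing


print(sorted(caches_that_may_store("private, no-cache, no-store")))  # []
print(sorted(caches_that_may_store("private, no-cache")))            # ['browser']
print(sorted(caches_that_may_store("public, no-cache")))             # ['browser', 'proxy']
```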
Now we want to cache the site’s background image for one week. We’re not going to change it often, and even if we do, it’s okay if some users see the old background for a while, because it’s not important. We can add max-age to the header: cache-control: max-age=604800 (one week is 604,800 seconds).
max-age is like the old expires header. It tells the browser, ‘Hey, you can cache this until max-age runs out; after that you have to revalidate the asset with the server.’ So what happens if you send a max-age=0 header? The cached copy is stale immediately, so the browser revalidates with the server on every request. max-age is pretty much better than expires, because max-age is a relative time (how long the cache stays fresh), whereas expires is an absolute one: it takes a date as its value, the moment the cache becomes stale. (That date is hard to set, because the clocks on server and client have to agree and we have to keep updating it.) If a response carries both, HTTP 1.1 caches use max-age and ignore expires.
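The freshness rules above can be sketched as a small Python function. It’s a simplification that assumes a plain dict with canonically capitalized header names; it follows the HTTP 1.1 order (max-age first, then Expires minus Date) and treats an unparseable Expires, such as the common “Expires: 0”, as already expired.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def freshness_lifetime(headers: Dict[str, str]) -> Optional[int]:
    """Seconds a stored response stays fresh, or None if no explicit lifetime.

    Simplified HTTP 1.1 order: max-age wins; otherwise Expires minus Date."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0  # invalid dates such as "Expires: 0" count as already expired
        return max(0, int((expires - date).total_seconds()))
    return None


both = {
    "Cache-Control": "public, max-age=604800",
    "Date": "Mon, 01 Jan 2024 11:00:00 GMT",
    "Expires": "Mon, 01 Jan 2024 12:00:00 GMT",
}
print(freshness_lifetime(both))                            # 604800 (Expires is ignored)
print(freshness_lifetime({"Cache-Control": "max-age=0"}))  # 0: revalidate every time
```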
This is all theory. Different browsers and proxy servers may or may not honour these headers, just like some browsers with no-cache.
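Some browsers may also show stale responses if the network is down. If you want a stale copy to be revalidated no matter what, we can add an extra directive called must-revalidate. It tells the browser that once the resource is stale, it must revalidate it with the server under all circumstances, and show an error rather than the stale copy if the server can’t be reached. (Combined with max-age=0, this means revalidating on every request.) There’s also a proxy-revalidate directive, which does the same but applies only to proxy caches.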
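Roughly, the decision for a stale copy looks like this. stale_copy_decision is a hypothetical helper that only looks at must-revalidate and proxy-revalidate; 504 Gateway Timeout is the status the HTTP 1.1 caching spec suggests when a cache can’t revalidate and isn’t allowed to fall back.

```python
def stale_copy_decision(cache_control: str, server_reachable: bool, shared_cache: bool) -> str:
    """What a cache does with a stored copy that is already stale (simplified)."""
    names = {p.split("=", 1)[0].strip().lower() for p in cache_control.split(",") if p.strip()}
    # proxy-revalidate only binds shared caches (proxies), not the browser.
    strict = "must-revalidate" in names or (shared_cache and "proxy-revalidate" in names)
    if server_reachable:
        return "revalidate"           # ask the server, e.g. with If-None-Match
    if strict:
        return "504 Gateway Timeout"  # never fall back to the stale copy
    return "maybe serve stale"        # allowed, but not required, when offline


print(stale_copy_decision("max-age=60", False, shared_cache=False))                   # maybe serve stale
print(stale_copy_decision("max-age=60, must-revalidate", False, shared_cache=False))   # 504 Gateway Timeout
print(stale_copy_decision("max-age=60, proxy-revalidate", False, shared_cache=False))  # maybe serve stale
```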
Now, we’ve talked about revalidating a resource by contacting the server. How does the browser know that the resource is still valid? There are two ways. The ETag and Last-Modified headers are used as validators, and the server sends them with each resource. When the browser wants to check whether its copy is fresh or stale, it sends a request with an if-modified-since header carrying the last-modified value, or an if-none-match header carrying the etag value. The server compares these with the current resource; if it hasn’t changed, the server replies 304 Not Modified with no body, and the browser knows it can keep using its copy.
I hope I’ve explained these things reasonably well. I collected the information from different articles and summarized it here. Please let me know if I am wrong or if things have changed over time. Thank you.