Optimizing web performance with HTTP cache
[Figure: How HTTP cache works]
Speed is one of the most important factors users weigh when visiting a website, so improving web performance is a high priority. Techniques for doing so include resource hints (preconnect, preload, or prefetch) and caching. HTTP caching is a practical technique that can considerably improve a website's performance. In this article, we take a deeper look at the HTTP cache: what it is, how it works, and how to use it to optimize website performance.
What is HTTP cache?
The HTTP cache is a mechanism that stores responses to resource requests (like HTML pages, CSS files, JavaScript files, images, or API responses) in a location closer to users, such as the browser, a proxy, or a CDN, and reuses them when the same requests are made again.
For instance, when a user visits a page for the first time, the browser has nothing cached for it, so it requests every page resource from the origin server. Based on the HTTP response headers, the browser then stores some of those resources in its cache. On the next visit, only the resources that were not cached, or whose cached copies are no longer valid, are requested from the server.
In other words, as long as a stored response is still valid, the browser does not need to send another request to the origin server for it. This reduces both server traffic and the user's data usage, and the browser can respond quickly from its stored copy.
There are two general HTTP cache types:
1. Private cache
The private cache, also known as the client-side cache, is the focus of this article. It is the browser cache, which stores responses to requests made by a single user. These responses are not shared with others, so they can be personalized for that user.
If a private response is stored in a cache other than a private cache, other users may be able to access it, which can leak information.
2. Shared cache
The shared cache, also known as the public cache, sits between the server and the client. It stores responses that can be reused by multiple clients. There are two kinds of shared cache:
2.1. Proxy cache
One type of shared cache is the proxy cache. In addition to controlling access to content, some proxies cache responses to reduce network traffic. Because the service provider does not manage these caches, their behavior has to be controlled by sending appropriate HTTP headers.
2.2. Managed caches
The managed cache is another shared cache, one that is provided by service providers. CDNs are a popular example. The goal of a managed cache is to reduce load on the origin server and deliver content quickly. In most cases, you can control cache behavior and decide what happens to stored responses (keep them, or delete them so that new responses are stored) through a dashboard offered by the provider.
How does HTTP cache work?
When the browser makes an HTTP request, it first checks its cache for a matching response. If a valid match exists, the browser returns it to the user; if not, the request is sent to the origin server. HTTP caching is controlled by both request headers and response headers. Ideally, you control both sides: the web application code, which determines the request headers, and the web server's configuration, which determines the response headers.
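The lookup order described above can be sketched in a few lines of Python. This is a toy model, not how any browser is implemented: the class name `TinyHttpCache` and the `fake_origin` function are made up for illustration, and the cache is keyed by URL only, whereas real caches also consider the request method, freshness, and the Vary header.

```python
from typing import Callable, Dict, Tuple


class TinyHttpCache:
    """Toy cache keyed by URL only (hypothetical, for illustration)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # counts how often the origin was contacted

    def get(self, url: str) -> Tuple[bytes, str]:
        # 1. Check the cache first.
        if url in self._store:
            return self._store[url], "cache"
        # 2. No match: go to the origin server and remember the response.
        self.origin_requests += 1
        body = self._fetch(url)
        self._store[url] = body
        return body, "origin"


def fake_origin(url: str) -> bytes:
    # Stand-in for a real network request.
    return f"body of {url}".encode()


cache = TinyHttpCache(fake_origin)
first = cache.get("https://example.com/assets/style.css")
second = cache.get("https://example.com/assets/style.css")
```

Here `first` comes from the origin and `second` from the cache, so the origin is contacted only once for the two lookups.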
Web application configuration (request headers) for HTTP caching
Several request headers play a role in HTTP caching, but the browser sets them automatically on the web application's outgoing requests, so you can usually stick with the defaults. If-None-Match and If-Modified-Since are conditional request headers used to validate cached resources: they ask the server whether a resource has changed since it was cached. If it has not changed, the server answers with a short 304 Not Modified response that has no body, and the browser reuses its cached copy instead of downloading the resource again.
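To make the validation step concrete, here is a minimal sketch of the decision a server could make when it receives a conditional request. The function name `is_not_modified` is an assumption for this example, and the ETag list is split naively on commas, which is enough for a sketch but not a full parser.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def _strip_weak(tag: str) -> str:
    # Validation uses weak comparison, so the W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: Mapping[str, str],
                    etag: str, last_modified: str) -> bool:
    """Return True when a 304 Not Modified response would be appropriate."""
    headers = {k.lower(): v for k, v in request_headers.items()}

    # If-None-Match takes precedence over If-Modified-Since.
    if_none_match = headers.get("if-none-match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return True
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        return _strip_weak(etag) in candidates

    since = headers.get("if-modified-since")
    if since is not None:
        try:
            return (parsedate_to_datetime(last_modified)
                    <= parsedate_to_datetime(since))
        except (TypeError, ValueError):
            return False  # unparseable date: send the full response

    return False  # not a conditional request
```

When the function returns True, the server can reply with 304 and no body; otherwise it sends the full resource with a 200 status.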
Web server configuration (response header) for HTTP caching
The headers that the web server adds to outgoing responses are the most important part of the HTTP caching configuration. Although many web servers have built-in support for setting the necessary headers, some need to be configured explicitly. Cache-Control is one of the most important headers that affect HTTP caching behavior. It contains a set of directives that tell the browser how a resource should be cached and how long it can be reused without checking with the server.
[Figure: Cache-Control response header]
According to the Web Almanac report, 60.7% of websites use the Cache-Control: max-age directive.
[Figure: Cache-Control directives usage bar chart]
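Since Cache-Control is a comma-separated list of directives, some of which carry a value, it can help to see it split apart. The following is a small sketch; `parse_cache_control` is a name chosen for this example and the parser ignores edge cases such as commas inside quoted values.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For example, `parse_cache_control("public, max-age=31536000")` yields a dictionary with a `public` entry that has no argument and a `max-age` entry whose argument is the string `31536000`.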
Which response header values for HTTP cache should be used?
When configuring Cache-Control directives in a web server's response headers, two cases need to be considered, as explained below:
Long cache TTL for versioned URLs
Suppose a file (e.g. a CSS file) with a fixed path has been cached by the browser, and you then change the file. How does the browser find out that its cached copy needs to be updated? If you add a version or fingerprint to the file's URL, the browser treats the updated file as a new resource and downloads it. Responses for URLs that contain a version or fingerprint, or whose content will never change, should carry Cache-Control: max-age=31536000. This tells the browser that it can reuse the cached copy for up to a year without requesting the resource from the origin server.
// Versioned URL example using parameter:
https://example.com/assets/style.css?ver=5.1
// Versioned URL example using random path/file name:
https://example.com/assets/g5oiclok/c7iz8.css
The value 31536000 is one year in seconds, which is conventionally treated as the longest useful lifetime for a cached resource:
Cache-Control: max-age=31536000
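One common way to produce fingerprinted URLs is to derive part of the file name from a hash of the file's content, so the URL changes exactly when the content does. The sketch below assumes a naming scheme of `name.<hash>.ext`; the function name `fingerprinted_name`, the choice of SHA-256, and the 8-character digest length are illustrative choices, not something the article prescribes.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # "assets/style.css" becomes "assets/style.<digest>.css"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = fingerprinted_name("assets/style.css", b"body { color: red; }")
new_url = fingerprinted_name("assets/style.css", b"body { color: blue; }")
```

Because `old_url` and `new_url` differ, a browser that cached the old file for a year still downloads the new one as soon as the page references the new URL.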
Server revalidation for unversioned URLs
The other case is URLs that are not versioned or fingerprinted. Such URLs are completely normal, because it is not always possible to embed a version or fingerprint that signals to the browser that a new version must be downloaded. HTML pages, for example, are poor candidates for versioning or fingerprinting, because such URLs would be hard for users to type or remember. For these URLs, certain Cache-Control values tell the browser how to treat the resource while keeping requests to the server as efficient as possible.
Here are the Cache-Control values that are useful for unversioned URLs:
- no-cache: The browser must revalidate the resource with the server every time before using its cached copy.
- no-store: Neither the browser nor intermediate caches such as CDNs may store the resource.
- private: Browsers may cache the resource, but intermediate caches such as CDNs must not.
- public: The resources can be cached by any cache.
Cache-Control: no-cache, must-revalidate, max-age=0, no-store, private
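How these directives interact can be sketched as a small decision function. This is a simplified model under stated assumptions: the name `cache_decision` and its three return values are invented for the example, a missing max-age is treated as 0 (real caches may fall back to Expires or heuristics), and directives such as s-maxage are ignored.

```python
def cache_decision(cache_control: str, shared: bool, age: int) -> str:
    """Return "do-not-store", "revalidate", or "reuse" for a stored response.

    shared: True for an intermediate cache (proxy, CDN), False for a browser.
    age:    seconds since the response was stored.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip()

    if "no-store" in directives:
        return "do-not-store"      # nobody may keep a copy
    if shared and "private" in directives:
        return "do-not-store"      # only the browser may keep a copy
    if "no-cache" in directives:
        return "revalidate"        # stored, but checked on every use

    try:
        max_age = int(directives.get("max-age", "0"))
    except ValueError:
        max_age = 0                # unparseable value: treat as stale
    return "reuse" if age < max_age else "revalidate"
```

Note that in the combined header shown above, no-store takes effect first in this model, so the response is not stored at all and the remaining directives act only as fallbacks for caches that do not honor no-store.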
Does bfcache use an HTTP cache?
No, bfcache (Back/Forward Cache) is different from the HTTP cache. It speeds up back and forward navigations by restoring a page immediately. The bfcache stores a snapshot of the entire page, including the JavaScript heap, while the HTTP cache only stores individual resources such as images and scripts from previous requests. In addition, bfcache is only used when navigating back or forward in the browser history, whereas the HTTP cache is used for any subsequent page load where cached resources are available. Furthermore, resources in the HTTP cache are typically persisted on the user's device, whereas bfcache pages are kept in the browser's memory.
If Cache-Control: no-store is set in the HTTP response header of a page, browsers generally do not store that page in bfcache, so bfcache does not work for it.
Do prefetch and preload speculation APIs use HTTP cache?
Yes, both prefetch and preload use the browser's HTTP cache. When a preload or prefetch request is made and the requested resource has already been cached, the browser serves it directly from the cache to avoid a network request.
Does private prefetch proxy use HTTP cache?
No, the private prefetch proxy prefetches resources even if a cached version already exists. Prefetch requests made through the private prefetch proxy avoid sending conditional headers (like If-Modified-Since or If-None-Match). This prevents the server from learning about the user's existing cache state.
How to add a client-side cache to the resources?
There are different ways to add client-side caching to resources. One option is to add cache-related headers on the backend through the server's configuration. For example, if you use the Apache web server and mod_headers is enabled, you can set Cache-Control by adding the rules below to the .htaccess file:
<IfModule mod_headers.c>
    # Force revalidation on every request
    Header set Cache-Control "max-age=0, no-cache, must-revalidate"
    # Legacy equivalents for older HTTP/1.0 caches
    Header set Pragma "no-cache"
    Header set Expires "Mon, 29 Oct 1923 20:30:00 GMT"
</IfModule>
Be careful with everything you add to the .htaccess file. A wrong rule can make the website inaccessible, and if mod_headers is not enabled, the rules inside the IfModule block are skipped, so the headers are not sent.
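Another option is to set the header in application code instead of the server configuration. The sketch below is a minimal WSGI application using only the Python standard library. Everything in it is an assumption for illustration: the `FINGERPRINT` pattern expects file names like `style.3f2a9c1b.css`, and the policy (one year for fingerprinted assets, no-cache for everything else) is just one reasonable combination of the values discussed earlier.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical naming scheme: "<name>.<8+ hex chars>.<extension>"
FINGERPRINT = re.compile(r"\.[0-9a-f]{8,}\.(?:css|js|png|jpg|svg|woff2)$")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on the request path."""
    if FINGERPRINT.search(path):
        return "public, max-age=31536000"  # versioned URL: cache for a year
    return "no-cache"                      # unversioned URL: always revalidate


def app(environ: Dict[str, str], start_response: Callable) -> Iterable[bytes]:
    """Tiny WSGI app that attaches the chosen header to every response."""
    path = environ.get("PATH_INFO", "/")
    body = f"demo response for {path}\n".encode()
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", cache_control_for(path)),
    ])
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and inspected with the browser's network panel.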
To add HTTP cache to the response headers, you can find the necessary information below based on the web server you use:
Conclusion
HTTP caching is an effective way to improve page load performance while reducing unnecessary requests to the origin server. It is supported in all browsers, and with the Cache-Control response header you can configure and manage how it behaves.