Hosting a Gatsby site on S3 and CloudFront
Apr 23, 2018 1:52:00 PM • Author: Joris Portegies Zwart
Recently, we migrated the Ximedes website from WordPress to Gatsby.js. The resulting static site is hosted at Amazon AWS, using CloudFront backed by an S3 bucket that holds our static content.
Infrastructure and deployment
Our main setup is straightforward. The output of the Gatsby build directory is synced to an S3 bucket, which is configured as the origin server for a CloudFront distribution. (Note that we don't specifically configure the bucket as a web server, just as plain storage.)
While this gets you going rapidly, there are a few issues with this setup that need to be solved before launch.
- By default, CloudFront caches the response of every request for 24 hours. This conflicts with the proper caching strategy for Gatsby sites.
- While CloudFront allows you to specify a default object for requests targeting a directory (typically index.html), this only works for the root directory, not for subdirectories.
- We take great pride in our secure coding practices, so we wanted to add all the proper HTTP security headers to our responses. Unfortunately, CloudFront does not support configuring custom response headers.
- Migrating to a new site, with a whole new structure, means a lot of our old URLs don't work anymore. While this isn't a huge issue for us, we still wanted to make sure some basic pages, such as the About and Contact pages, were still reachable under their old URLs.
- Finally, requests for non-existing content return an HTTP 403 response by default. This is an S3 security feature: returning a 404 would tell the client that the requested object doesn't exist, which is information a client without sufficient access rights shouldn't have. In our case, however, we would like to return a 404 to the end user, in line with the HTTP standard.
Introducing AWS Lambda@Edge functions
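We solve the first four issues by using AWS Lambda@Edge functions. These are plain Lambda functions that can intercept requests and responses to and from your site, and perform custom logic. Regular Lambda functions can be written in JavaScript, Python, Java, C# or Go, but Lambda@Edge supports only a subset of those runtimes; ours are written in JavaScript (Node.js).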
CloudFront allows up to four lambda functions to be configured, shown as λ1 through λ4 in the picture above.
- λ1 is invoked on every request from the browser to CloudFront (the viewer request event)
- λ2 is invoked on every request from CloudFront to the S3 bucket, so only on a cache miss (the origin request event)
- λ3 is invoked on every response from S3 to CloudFront (the origin response event)
- λ4 is invoked on every response from CloudFront to the browser (the viewer response event)
These lambda functions receive either a request or a response event as a parameter, depending on where in the chain they are configured. The format of these events is documented on the AWS website.
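To make the event format a bit more concrete, here is a rough sketch of what an origin response event (the kind λ3 receives) looks like, trimmed to the fields used in this post. The helper name fakeOriginResponseEvent and the sample values are made up for illustration; real events carry more fields, so treat the AWS documentation as the authority.

```javascript
// Hypothetical helper: builds a trimmed-down origin response event for local
// experiments. Real events contain more fields than shown here.
function fakeOriginResponseEvent(uri) {
  return {
    Records: [
      {
        cf: {
          config: { eventType: 'origin-response' },
          request: { method: 'GET', uri: uri, headers: {} },
          response: {
            status: '200',
            statusDescription: 'OK',
            // Header names are lower-cased keys; each value is a list of
            // { key, value } pairs.
            headers: {
              'content-type': [{ key: 'Content-Type', value: 'text/html' }]
            }
          }
        }
      }
    ]
  };
}

// A handler pulls the request and response out of the first record.
const event = fakeOriginResponseEvent('/about-us/index.html');
const { request, response } = event.Records[0].cf;
console.log(request.uri, response.status);
```

A response event carries both the original request and the response, which is why the functions below can look at the request URI while modifying response headers.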
Adding correct cache headers
To make CloudFront implement the proper caching strategy, we first configure it to respect the Cache-Control headers returned by the origin server, instead of caching everything for the default TTL. Then we create a lambda function at position λ3 that adds these headers to responses from S3, as follows:
'use strict';

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  if (request.uri.startsWith('/static/')) {
    // Files under /static/ can be cached forever
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
      }
    ];
  } else {
    // Everything else must be revalidated on every request
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=0, must-revalidate'
      }
    ];
  }

  callback(null, response);
};
This boils down to 'cache anything in the directory /static forever, and always check for newer versions of everything else'. Gatsby v2 will also allow long-term caching of JavaScript files, which will make the cache dramatically more effective. But for now, this is it!
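The decision itself is small enough to pull out into a function of its own, which makes it easy to check against a few sample paths without deploying anything. This is only a sketch: the name cacheControlFor is hypothetical and the paths are made-up examples.

```javascript
// Hypothetical helper mirroring the rule used in the handler above.
function cacheControlFor(uri) {
  // The handler treats everything under /static/ as never changing.
  if (uri.startsWith('/static/')) {
    return 'public, max-age=31536000, immutable';
  }
  // Everything else must be revalidated on every request.
  return 'public, max-age=0, must-revalidate';
}

// Made-up sample paths; note that /staticfiles/ is not /static/.
['/static/logo-1a2b3c.png', '/index.html', '/staticfiles/x.js'].forEach(uri => {
  console.log(uri, '->', cacheControlFor(uri));
});
```

The trailing slash in '/static/' matters: without it, an unrelated path such as /staticfiles/x.js would be cached forever as well.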
Adding security headers
The OWASP Secure Headers Project provides an excellent summary of HTTP headers that should be used to increase the security of your website. While one of the great benefits of creating a static site is that the risk of attacks is minimized, it is still good practice to include these.
Since only one function can be attached at each position, the security headers go into the same function λ3 we created to add caching headers, which we extend to this:
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  // Caching headers, as before
  if (request.uri.startsWith('/static/')) {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
      }
    ];
  } else {
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=0, must-revalidate'
      }
    ];
  }

  // Security headers, stored under their lower-cased names
  [
    {
      key: 'Strict-Transport-Security',
      value: 'max-age=31536000'
    },
    {
      key: 'X-Content-Type-Options',
      value: 'nosniff'
    },
    {
      key: 'X-Permitted-Cross-Domain-Policies',
      value: 'none'
    },
    {
      key: 'Referrer-Policy',
      value: 'no-referrer'
    },
    {
      key: 'X-Frame-Options',
      value: 'deny'
    },
    {
      key: 'X-XSS-Protection',
      value: '1; mode=block'
    },
    {
      key: 'Content-Security-Policy',
      value:
        "default-src 'none' ; script-src 'self' 'unsafe-inline'; " +
        "style-src 'self' 'unsafe-inline' ; img-src 'self' data:; " +
        "font-src 'self' ; manifest-src 'self' ; " +
        'upgrade-insecure-requests; block-all-mixed-content; ' +
        'report-uri https://ximedes.report-uri.com/r/d/csp/enforce;'
    }
  ].forEach(h => (headers[h.key.toLowerCase()] = [h]));

  callback(null, response);
};
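Because a handler like this calls its callback synchronously, it can be exercised locally with a hand-built event before it is deployed. The sketch below is one way to do that, not part of our deployment: runHandler is a hypothetical helper, the fake event is trimmed to the bare minimum, and the handler is a shortened stand-in that sets only two headers so the check has something to report.

```javascript
// Stand-in for the real handler, shortened to two headers for brevity.
const handler = (event, context, callback) => {
  const response = event.Records[0].cf.response;
  [
    { key: 'X-Content-Type-Options', value: 'nosniff' },
    { key: 'X-Frame-Options', value: 'deny' }
  ].forEach(h => (response.headers[h.key.toLowerCase()] = [h]));
  callback(null, response);
};

// Hypothetical harness: runs a handler that calls back synchronously and
// returns whatever it handed to the callback.
function runHandler(fn, event) {
  let result;
  fn(event, {}, (err, value) => {
    if (err) throw err;
    result = value;
  });
  return result;
}

// Minimal fake origin response event; real events carry more fields.
const fakeEvent = {
  Records: [
    { cf: { request: { uri: '/' }, response: { status: '200', headers: {} } } }
  ]
};

const result = runHandler(handler, fakeEvent);

// Report which of the headers we expect are absent from the response.
const expected = [
  'x-content-type-options',
  'x-frame-options',
  'strict-transport-security'
];
const missing = expected.filter(name => !(name in result.headers));
console.log('missing headers:', missing);
```

With the stand-in handler, the check reports strict-transport-security as missing; pointing it at the full function should leave the list empty.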
It is unfortunate that Gatsby requires the unsafe-inline keyword in the script-src and style-src directives of the Content-Security-Policy header. One of the main purposes of CSP is to mitigate cross-site scripting (XSS) attacks, and disabling the execution of inline scripts goes a long way towards that. However, Gatsby creates large amounts of inline JS and CSS. Some open GitHub issues (#3758 and #3427) address this, but there doesn't seem to be an easy solution for now.
Also note the directive report-uri https://ximedes.report-uri.com/r/d/csp/enforce; at the end of the policy. This makes the browser report any CSP violations to ReportURI, an easy way to quickly get your CSP reporting up and running. Of course you can also create your own reporting endpoint.
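Long policy strings like this one are easy to get wrong by hand. One option, sketched here and not what our function actually does, is to keep the directives in an object and join them when building the header. The helper name buildCsp is hypothetical; the directive values are copied from the policy above.

```javascript
// Hypothetical helper: turns { directive: [sources] } into a CSP header value.
// Directives with an empty source list are emitted as bare keywords.
function buildCsp(directives) {
  return Object.keys(directives)
    .map(name => [name].concat(directives[name]).join(' '))
    .join('; ');
}

const csp = buildCsp({
  'default-src': ["'none'"],
  'script-src': ["'self'", "'unsafe-inline'"],
  'style-src': ["'self'", "'unsafe-inline'"],
  'img-src': ["'self'", 'data:'],
  'font-src': ["'self'"],
  'manifest-src': ["'self'"],
  'upgrade-insecure-requests': [],
  'block-all-mixed-content': [],
  'report-uri': ['https://ximedes.report-uri.com/r/d/csp/enforce']
});

// The result drops into the same { key, value } shape the handler uses.
const header = { key: 'Content-Security-Policy', value: csp };
console.log(header.value);
```

Keeping one directive per line also makes it easier to review changes, for example when unsafe-inline can eventually be removed.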
Serving index.html
To address the issue with serving index.html for directory-level requests, we create another lambda function, λ2, that checks every request to S3, tries to detect whether it is requesting a directory, and if so appends index.html to the URI before proceeding. (Note that the logic here is simple to the point of naive, but so far it has worked without issues.)
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  if (uri.endsWith('/')) {
    // Explicit directory request
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    // No file extension, so assume a directory
    request.uri += '/index.html';
  }

  callback(null, request);
};
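To see where the naive part bites, it helps to run the rule over a few sample paths. The sketch below copies the logic into a standalone function; the name withIndexDocument and the sample paths are made up for illustration.

```javascript
// Same rule as in the handler above, as a function of the URI alone.
function withIndexDocument(uri) {
  if (uri.endsWith('/')) {
    return uri + 'index.html';
  }
  if (!uri.includes('.')) {
    return uri + '/index.html';
  }
  return uri; // contains a dot, so it is assumed to be a file
}

// Made-up sample paths, including one directory with a dot in its name.
const samples = ['/', '/blog/', '/blog', '/static/app.js', '/releases/v1.2'];
samples.forEach(uri => console.log(uri, '->', withIndexDocument(uri)));
```

The last sample is the weak spot: a directory whose name contains a dot, requested without a trailing slash, is mistaken for a file and passed to S3 unchanged.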
Redirecting legacy URLs
To make sure old links and outdated search results pointing to our Contact and About pages still work, we extend λ2 to detect requests for the old paths (/contact and /about-ximedes respectively) and, instead of passing the request on to S3, return a 301 response redirecting the user to the new locations (/contact-us/ and /about-us/).
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  // Redirect some popular search results to their new pages
  const redirects = [
    { test: /^\/contact\/?$/g, targetURI: '/contact-us/' },
    { test: /^\/about-ximedes\/?$/g, targetURI: '/about-us/' }
  ];

  const redirect = redirects.find(r => uri.match(r.test));
  if (redirect) {
    // Answer directly with a 301 instead of forwarding the request to S3
    const response = {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        location: [
          {
            key: 'Location',
            value: 'https://www.ximedes.com' + redirect.targetURI
          }
        ]
      }
    };
    callback(null, response);
    return;
  }

  // Make sure directory requests serve index.html
  if (uri.endsWith('/')) {
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    request.uri += '/index.html';
  }

  callback(null, request);
};
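Checking the redirect patterns locally before deploying saves a round trip through CloudFront. A minimal sketch, assuming the same two redirects as above; the function name findRedirectTarget is hypothetical.

```javascript
// Same patterns as in the handler above. The g flag is left out here because
// RegExp.prototype.test keeps state between calls on global patterns.
const redirects = [
  { test: /^\/contact\/?$/, targetURI: '/contact-us/' },
  { test: /^\/about-ximedes\/?$/, targetURI: '/about-us/' }
];

// Returns the new path for a legacy URI, or null if no redirect applies.
function findRedirectTarget(uri) {
  const redirect = redirects.find(r => r.test.test(uri));
  return redirect ? redirect.targetURI : null;
}

// Old paths, with and without trailing slash, plus paths that must not match.
['/contact', '/about-ximedes/', '/contact-us/', '/contact/form'].forEach(uri => {
  console.log(uri, '->', findRedirectTarget(uri));
});
```

The anchors in the patterns matter: without ^ and $, the new /contact-us/ page would match the old /contact rule as well.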
Dealing with 404s
Finally, the issue of returning a 404 error code after S3 returns a 403 is solved by properly configuring CloudFront. With our 404 page being the Gatsby default /404.html, we add a custom error response in the Error Pages tab of the CloudFront distribution that maps the 403 error code from S3 to the response page /404.html with HTTP response code 404.
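Expressed as data, the setting described here would look roughly like the object below. This is a sketch for reference, not an export of our actual configuration; the field names are assumed to follow the custom error response structure of the CloudFront API and should be checked against the API documentation before use.

```javascript
// Sketch of the custom error response. Field names are an assumption based
// on the CloudFront API's CustomErrorResponse structure.
const customErrorResponse = {
  ErrorCode: 403,                // what S3 returns for a missing object
  ResponsePagePath: '/404.html', // Gatsby's default 404 page
  ResponseCode: '404'            // what the browser should see instead
};

console.log(JSON.stringify(customErrorResponse, null, 2));
```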
And there we are - a Gatsby site running in a serverless AWS infrastructure, with proper caching, error codes and directory root objects.