New deployer (#1284)

* initial lambda@edge for content origin-request events

* deployer refactor

* feedbacked

* feedback on upload aws stuff

* more hacks

* feedbacked part 2

Co-authored-by: Peter Bengtsson <[email protected]>
escattone and peterbe authored Sep 24, 2020
1 parent c8de065 commit 34c58de
Showing 17 changed files with 856 additions and 643 deletions.
7 changes: 7 additions & 0 deletions .flake8
@@ -0,0 +1,7 @@
[flake8]
# Black recommends 88-char lines and ignoring the following lints:
# - E203 - whitespace before ':'
# - E501 - line too long
# - W503 - line break before binary operator
max-line-length=88
ignore = E203, E501, W503
4 changes: 4 additions & 0 deletions .gitignore
@@ -15,6 +15,7 @@ node_modules/
 .env.test.local
 .env.production.local
 /.env
+__pycache__/

 npm-debug.log*
 yarn-debug.log*
@@ -24,6 +25,7 @@ yarn-error.log*
 /testing/node_modules/
 /client/build/
 /ssr/dist/
+/aws-lambda/**/*.zip

 # Can be removed once content lives in its own repo
 /archivecontent/files
@@ -68,3 +70,5 @@ yarn-error.log*
 /deployer/src/deployer.egg-info/
 fake-v1-api/
 mdn-yari-*.tgz
+lambda/**/*.zip
+function.zip
6 changes: 6 additions & 0 deletions content/utils.js
@@ -8,6 +8,12 @@ function buildURL(locale, slug) {
   return `/${locale}/docs/${slug}`.toLowerCase();
 }

+/*
+ * NOTE: A nearly identical copy of this function is used within
+ * ./lambda/content-origin-request/index.js. If you make a
+ * change to this function, you must replicate the change
+ * there as well.
+ */
 function slugToFolder(slug) {
   return (
     slug
172 changes: 90 additions & 82 deletions deployer/README.md
@@ -1,123 +1,131 @@
-# deployer
+# Deployer

-Ship a Yari static site for web hosting.
+The Yari deployer does two things. First, it's used to upload document
+redirects, pre-built document pages, static files (e.g. JS, CSS, and
+image files), and sitemap files into an existing AWS S3 bucket.
+Since we serve MDN from a S3 bucket via a CloudFront CDN, this is the
+way we upload a new version of the site.

-## Limitations and caveats
+Second, it is used to update and publish changes to existing AWS Lambda
+functions. For example, we use it to update and publish new versions of
+a Lambda function that we use to transform incoming document URLs into
+their corresponding S3 keys.

-- Redirects - in the build directory we're supposed to have
-  `/en-us/_redirects.txt`
-
-- Without Lambda@Edge in front of the S3 Website some URLs won't map correctly.
-
-- GitHub integration
-
-## How it works
+## Getting started

-This project's goal is ultimately to take a big directory of files and
-upload them to S3. But there are some more advanced features so as
-turning `_redirects.txt` files into S3 redirect keys. And there might be
-file system names that don't match exactly what we need the S3 key to
-be called exactly.
+You can install it globally or in a virtualenv environment. Whichever you
+prefer.

-All deployments, generally, go into the one same S3 bucket. But in that bucket
-you always have a "prefix" (aka. a root folder) which gets used by
-CloudFront so you can have _N_ CloudFront distributions for 1 S3 bucket.
-For example, one prefix might be called `master` which'll be the
-production site. Another prefix might be `peterbe-pr12345`.
+```sh
+cd deployer
+poetry install
+poetry run deployer --help
+```

-It might be worth considering having 2 buckets:
+Please refer to the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration)
+with regards to configuring AWS access credentials.

-- One for production builds
+## Uploads

-- One for pull request builds
+The `poetry run deployer upload` command uploads files as well as redirects
+into an existing S3 bucket. As input, it takes a directory which contains the
+files that should be uploaded, but it also needs to know where to find
+any redirects that should be uploaded. By default it searches for redirects
+within the content directories specified by `--content-root` (or
+`CONTENT_ROOT`) and `--content-translated-root` (or `CONTENT_TRANSLATED_ROOT`).
+It does this by searching for `_redirects.txt` files within those directories,
+converting each line in a `_redirects.txt` file into an AWS S3 redirect key.
+The files and redirects can be uploaded into the S3 bucket's root, or instead
+into a sub-folder of the root (`--folder` option), which is what we do when
+uploading experimental versions of the site.

-So every deployment has a prefix (aka. the "name") which can be automatically
-generated based on the name of the current branch, but if it's known
-from the CI environment, even better, then we don't need to ask git.
-The first thing it does is that it downloads a complete listing of
-every known key in the bucket under that prefix and each key's size.
-(That's all you get from `bucket.list_objects_v2`). Now, it starts to
-walk the local directory and for each _file_ it applies the following logic:
+Currently, we have three main S3 buckets that we upload into: `dev` (for
+experimental or possible versions of the site), `stage`, and `prod`.

-- Does it S3 key _not_ exist at all? --> Upload brand new S3 key!
-- Does the S3 key _exist_?
-  - Is the file size different from the S3 key size? --> Upload changed S3 key!
-  - Is the file size exactly the same as the S3 key size? --> Download the
-    S3 key's `Metadata->filehash`.
-    - Is the hash exactly the same as the file's hash? --> Do nothing!
-    - Is the hash different? --> Upload changed S3 key!
+When uploading files (not redirects), the deployer is intelligent about
+what it uploads. It only uploads files whose content has changed, skipping
+the rest. However, since the `cache-control` attribute of a file is not
+considered part of its content, if you'd like to change the `cache-control`
+from what's in S3, it's important to use the `--force-refresh` option to
+ensure that all files are uploaded with fresh `cache-control` attributes.

-When it uploads an S3 key, _always_ compute the local file's hash and
-include that as a piece of S3 key Metadata.
+Redirects are always uploaded.

-## Getting started
+### Examples

-You can install it globally or in a virtualen environment. Whatever floats
-float fancy.
+```sh
+cd deployer
+poetry run deployer upload --bucket dev --folder pr1234 ../client/build
+```

-    poetry install
-    poetry run deployer --help
+```sh
+cd deployer
+poetry run deployer upload --bucket prod ../client/build
+```

-Please refer to the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) with regards to configuring AWS access credentials.
+## Updating Lambda Functions

-### Actually uploading something
+The command:

-The sub-command for uploading is called `upload`. You use it like this:
+```sh
+cd deployer
+poetry run deployer update-lambda-functions
+```

-    poetry run deployer upload --help
+will discover every folder that contains a Lambda function, create a
+deployment package (Zip file) for each one by running:

-An example of this is, if you know what you want your bucket to be called
-and you know what the folder prefix should be, and you have built the whole
-site in `../client/build`:
+```sh
+yarn make-package
+```

-    poetry run deployer upload --bucket mdn-yari --name pr1234 ../client/build
+and if the deployment package is different from what is already in AWS,
+it will upload and publish a new version.

 ## Environment variables

-All the options you can specify with the CLI can equally be expressed
-as environment variables. You just need to prefix it with `DEPLOYER_` and
-write it all upper case.
-
-    export DEPLOYER_BUCKET=peterbe-yari
-    export DEPLOYER_NAME=master
-    poetry run deployer upload ../client/build
-
-...is the same as...
-
-    poetry run deployer upload --bucket peterbe-yari --name master ../client/build
-
-Other things you can set (excluding AWS credentials for `boto3`):
-
-- `AWS_PROFILE` - default: `default`
-- `S3_BUCKET_LOCATION` - default: `''`
-- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` - default: `50`
-- `DEPLOYER_CACHE_CONTROL` - default: `60 * 60`
-- `DEPLOYER_HASHED_CACHE_CONTROL` - default: `60 * 60 * 24 * 365`
-- `DEPLOYER_NO_PROGRESSBAR` - default: `false`
-
-## Goal
-
-To be dead-easy to use and powerful at the same time.
+The following environment variables are supported.
+
+- `DEPLOYER_BUCKET_NAME` is equivalent to using `--bucket`
+- `DEPLOYER_NO_PROGRESSBAR` is equivalent to using `--no-progressbar`
+- `DEPLOYER_CACHE_CONTROL` can be used to specify the `cache-control`
+  header for all non-hashed files that are uploaded (the default is
+  `3600` or one hour)
+- `DEPLOYER_HASHED_CACHE_CONTROL` can be used to specify the `cache-control`
+  header for all hashed files (e.g., `main.3c12da89.chunk.js`) that are
+  uploaded (the default is `31536000` or one year)
+- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` controls the number of worker
+  threads used when uploading (the default is `50`)
+- `CONTENT_ROOT` is equivalent to using `--content-root`
+- `CONTENT_TRANSLATED_ROOT` is equivalent to using `--content-translated-root`

 ## Contributing

 You need to have [`poetry` installed on your system](https://python-poetry.org/docs/).
 Now run:

-    cd deployer
-    poetry install
+```sh
+cd deployer
+poetry install
+```

 That should have installed the CLI:

-    poetry run deployer
+```sh
+poetry run deployer
+```

 If you wanna make a PR, make sure it's formatted with `black` and
 passes `flake8`.

 You can check that all files are `flake8` fine by running:

-    flake8 deployer
+```sh
+flake8 deployer
+```

 And to check that all files are formatted according to `black` run:

-    black --check deployer
+```sh
+black --check deployer
+```
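
As context for the change-detection described in this diff (the removed README text spells out the size comparison followed by a `Metadata->filehash` check, and the new text summarizes it as only uploading files whose content changed), here is a minimal `boto3` sketch of that procedure. This is an illustration, not the deployer's actual code: the helper names and the choice of MD5 are assumptions, while the `filehash` metadata key comes from the README itself.

```python
import hashlib
from pathlib import Path

import boto3

s3 = boto3.client("s3")


def file_hash(path: Path) -> str:
    # Hex digest of the local file's content (MD5 chosen for illustration).
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def upload_if_changed(bucket: str, key: str, path: Path, existing: dict) -> None:
    # `existing` maps S3 key -> size, collected up front via list_objects_v2.
    size = path.stat().st_size
    if key in existing and existing[key] == size:
        # Same size: only upload if the stored content hash differs.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["Metadata"].get("filehash") == file_hash(path):
            return  # unchanged, skip the upload
    # New or changed: upload and record the hash as object metadata.
    s3.upload_file(
        str(path),
        bucket,
        key,
        ExtraArgs={"Metadata": {"filehash": file_hash(path)}},
    )
```

Listing sizes first is what makes this cheap: `head_object` round-trips are only paid for files whose sizes already match.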
54 changes: 54 additions & 0 deletions deployer/aws-lambda/README.md
@@ -0,0 +1,54 @@
# Yari Lambda Functions

This `aws-lambda` folder contains one or more sub-folders each of which may
define a unique AWS Lambda function used by Yari. A sub-folder defines an AWS
Lambda function if it contains the following:

- an `index.js` file containing the code of the Lambda function
- a `package.json` file which defines the Lambda function's dependencies as
well as a `make-package` command that when run creates an AWS Lambda
deployment package (Zip file) containing the function's code (`index.js`)
and dependencies (`node_modules/*`).

Here's an example `package.json` file:

```json
{
  "description": "Defines the deployment package for this AWS Lambda function.",
  "private": true,
  "main": "index.js",
  "license": "MPL-2.0",
  "scripts": {
    "make-package": "yarn install && zip -r -X function.zip . -i index.js 'node_modules/*'"
  },
  "dependencies": {
    "sanitize-filename": "^1.6.3"
  },
  "engines": {
    "node": "12.x"
  },
  "aws": {
    "name": "mdn-content-origin-request",
    "region": "us-east-1"
  }
}
```

## Updating Lambda Functions in AWS

The command:

```sh
cd deployer
poetry run deployer update-lambda-functions
```

will discover every folder that contains a Lambda function, create a
deployment package (Zip file) for each one by running:

```sh
yarn make-package
```

and if the deployment package is different from what is already in AWS,
it will upload and publish a new version.
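
How "different from what is already in AWS" is determined is not spelled out here. One workable check, sketched below with `boto3`, compares the local Zip file's digest with the `CodeSha256` that Lambda reports for the deployed package; the function names are illustrative, not the deployer's actual API.

```python
import base64
import hashlib
from pathlib import Path

import boto3


def zip_sha256(zip_path: Path) -> str:
    # Lambda reports CodeSha256 as the base64-encoded SHA-256 of the package.
    digest = hashlib.sha256(zip_path.read_bytes()).digest()
    return base64.b64encode(digest).decode("utf-8")


def publish_if_changed(function_name: str, region: str, zip_path: Path) -> None:
    client = boto3.client("lambda", region_name=region)
    deployed = client.get_function(FunctionName=function_name)
    if deployed["Configuration"]["CodeSha256"] == zip_sha256(zip_path):
        return  # identical package already deployed; nothing to do
    client.update_function_code(
        FunctionName=function_name,
        ZipFile=zip_path.read_bytes(),
        Publish=True,  # publish a new numbered version in the same call
    )
```

One caveat with any byte-level comparison: Zip archives embed file timestamps, so two packages built from identical sources can still differ, which makes this check conservative rather than exact.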
52 changes: 52 additions & 0 deletions deployer/aws-lambda/content-origin-request/index.js
@@ -0,0 +1,52 @@
const sanitizeFilename = require("sanitize-filename");

const CONTENT_DEVELOPMENT_DOMAIN = ".content.dev.mdn.mozit.cloud";

/*
 * NOTE: This function is derived from the function of the same name within
 * ../../content/utils.js. It differs only in its final "join", which
 * uses "/", as required by S3 keys, rather than "path.sep".
 */
function slugToFolder(slug) {
  return slug
    .replace(/\*/g, "_star_")
    .replace(/::/g, "_doublecolon_")
    .replace(/:/g, "_colon_")
    .replace(/\?/g, "_question_")
    .toLowerCase()
    .split("/")
    .map(sanitizeFilename)
    .join("/");
}

exports.handler = async (event, context) => {
  /*
   * Modify the request before it's passed to the S3 origin.
   */
  const request = event.Records[0].cf.request;
  const host = request.headers.host[0].value.toLowerCase();
  // Rewrite the URI to match the keys in S3.
  // NOTE: The incoming URI should remain URI-encoded.
  let newURI = slugToFolder(request.uri);
  if (newURI.includes("/docs/") && !newURI.endsWith("/index.json")) {
    if (!newURI.endsWith("/")) {
      newURI += "/";
    }
    newURI += "index.html";
  }
  request.uri = newURI;
  // Rewrite the HOST header to match the S3 bucket website domain.
  // This is required only because we're using S3 as a website, which
  // we need in order to do redirects from S3. NOTE: The origin is
  // considered a "custom" origin because we're using S3 as a website.
  request.headers.host[0].value = request.origin.custom.domainName;
  // Conditionally rewrite the path (prefix) of the origin.
  if (host.endsWith(CONTENT_DEVELOPMENT_DOMAIN)) {
    // When reviewing PR's, each PR gets its own subdomain, and
    // all of its content is prefixed with that subdomain in S3.
    request.origin.custom.path = `/${host.split(".")[0]}`;
  } else {
    request.origin.custom.path = "/main";
  }
  return request;
};
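
To see what this handler does to an incoming URI, here is a small Python re-creation of the rewrite, for illustration only (the real code runs in Node and additionally sanitizes each path segment with the `sanitize-filename` package, which is omitted here):

```python
def slug_to_folder(slug: str) -> str:
    # Simplified port of slugToFolder above; ordering matters ("::" before ":").
    for old, new in [
        ("*", "_star_"),
        ("::", "_doublecolon_"),
        (":", "_colon_"),
        ("?", "_question_"),
    ]:
        slug = slug.replace(old, new)
    return slug.lower()


uri = slug_to_folder("/en-US/docs/Web/API/Window:alert")
# Document URIs that aren't /index.json get "/index.html" appended,
# mirroring the handler's logic.
if "/docs/" in uri and not uri.endswith("/index.json"):
    if not uri.endswith("/"):
        uri += "/"
    uri += "index.html"
print(uri)  # -> /en-us/docs/web/api/window_colon_alert/index.html
```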
19 changes: 19 additions & 0 deletions deployer/aws-lambda/content-origin-request/package.json
@@ -0,0 +1,19 @@
{
  "description": "Defines the deployment package for this AWS Lambda function.",
  "private": true,
  "main": "index.js",
  "license": "MPL-2.0",
  "scripts": {
    "make-package": "yarn install && zip -r -X function.zip . -i index.js 'node_modules/*'"
  },
  "dependencies": {
    "sanitize-filename": "^1.6.3"
  },
  "engines": {
    "node": ">=12.x"
  },
  "aws": {
    "name": "mdn-content-origin-request",
    "region": "us-east-1"
  }
}
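
A plausible reading of the custom `aws` key above (an assumption, since the deployer's own code is not shown in this diff) is that `update-lambda-functions` reads each function folder's `package.json` to learn which Lambda function to update and in which region:

```python
import json
from pathlib import Path

# Assumption: the deployer inspects each function folder's package.json
# for the custom "aws" key shown above.
manifest = json.loads(Path("package.json").read_text())
print(manifest["aws"]["name"])    # mdn-content-origin-request
print(manifest["aws"]["region"])  # us-east-1
```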
