New deployer (#1284)

* initial lambda@edge for content origin-request events

* deployer refactor

* feedbacked

* feedback on upload aws stuff

* more hacks

* feedbacked part 2

Co-authored-by: Peter Bengtsson <[email protected]>
escattone and peterbe authored Sep 24, 2020
1 parent c8de065 commit 34c58de
Showing 17 changed files with 856 additions and 643 deletions.
7 changes: 7 additions & 0 deletions .flake8
@@ -0,0 +1,7 @@
[flake8]
# Black recommends 88-char lines and ignoring the following lints:
# - E203 - whitespace before ':'
# - E501 - line too long
# - W503 - line break before binary operator
max-line-length=88
ignore = E203, E501, W503
4 changes: 4 additions & 0 deletions .gitignore
@@ -15,6 +15,7 @@ node_modules/
 .env.test.local
 .env.production.local
 /.env
+__pycache__/

 npm-debug.log*
 yarn-debug.log*
@@ -24,6 +25,7 @@ yarn-error.log*
 /testing/node_modules/
 /client/build/
 /ssr/dist/
+/aws-lambda/**/*.zip

 # Can be removed once content lives in its own repo
 /archivecontent/files
@@ -68,3 +70,5 @@ yarn-error.log*
 /deployer/src/deployer.egg-info/
 fake-v1-api/
 mdn-yari-*.tgz
+lambda/**/*.zip
+function.zip
6 changes: 6 additions & 0 deletions content/utils.js
@@ -8,6 +8,12 @@ function buildURL(locale, slug) {
   return `/${locale}/docs/${slug}`.toLowerCase();
 }

+/*
+ * NOTE: A nearly identical copy of this function is used within
+ * ./lambda/content-origin-request/index.js. If you make a
+ * change to this function, you must replicate the change
+ * there as well.
+ */
 function slugToFolder(slug) {
   return (
     slug
172 changes: 90 additions & 82 deletions deployer/README.md
@@ -1,123 +1,131 @@
-# deployer
+# Deployer

-Ship a Yari static site for web hosting.
+The Yari deployer does two things. First, it's used to upload document
+redirects, pre-built document pages, static files (e.g. JS, CSS, and
+image files), and sitemap files into an existing AWS S3 bucket.
+Since we serve MDN from a S3 bucket via a CloudFront CDN, this is the
+way we upload a new version of the site.

-## Limitations and caveats
+Second, it is used to update and publish changes to existing AWS Lambda
+functions. For example, we use it to update and publish new versions of
+a Lambda function that we use to transform incoming document URLs into
+their corresponding S3 keys.

-- Redirects - in the build directory we're supposed to have
-  `/en-us/_redirects.txt`
-
-- Without Lambda@Edge in front of the S3 Website some URLs won't map correctly.
-
-- GitHub integration
-
-## How it works
+## Getting started

-This project's goal is ultimately to take a big directory of files and
-upload them to S3. But there are some more advanced features so as
-turning `_redirects.txt` files into S3 redirect keys. And there might be
-file system names that don't match exactly what we need the S3 key to
-be called exactly.
+You can install it globally or in a virtualenv environment. Whichever you
+prefer.

-All deployments, generally, go into the one same S3 bucket. But in that bucket
-you always have a "prefix" (aka. a root folder) which gets used by
-CloudFront so you can have _N_ CloudFront distributions for 1 S3 bucket.
-For example, one prefix might be called `master` which'll be the
-production site. Another prefix might be `peterbe-pr12345`.
+```sh
+cd deployer
+poetry install
+poetry run deployer --help
+```

-It might be worth considering having 2 buckets:
+Please refer to the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration)
+with regards to configuring AWS access credentials.

-- One for production builds
+## Uploads

-- One for pull request builds
+The `poetry run deployer upload` command uploads files as well as redirects
+into an existing S3 bucket. As input, it takes a directory which contains the
+files that should be uploaded, but it also needs to know where to find
+any redirects that should be uploaded. By default it searches for redirects
+within the content directories specified by `--content-root` (or
+`CONTENT_ROOT`) and `--content-translated-root` (or `CONTENT_TRANSLATED_ROOT`).
+It does this by searching for `_redirects.txt` files within those directories,
+converting each line in a `_redirects.txt` file into an AWS S3 redirect key.
+The files and redirects can be uploaded into the S3 bucket's root, or instead
+into a sub-folder of the root (`--folder` option), which is what we do when
+uploading experimental versions of the site.

-So every deployment has a prefix (aka. the "name") which can be automatically
-generated based on the name of the current branch, but if it's known
-from the CI environment, even better, then we don't need to ask git.
-The first thing it does is that it downloads a complete listing of
-every known key in the bucket under that prefix and each key's size.
-(That's all you get from `bucket.list_objects_v2`). Now, it starts to
-walk the local directory and for each _file_ it applies the following logic:
+Currently, we have three main S3 buckets that we upload into: `dev` (for
+experimental or possible versions of the site), `stage`, and `prod`.

-- Does it S3 key _not_ exist at all? --> Upload brand new S3 key!
-- Does the S3 key _exist_?
-  - Is the file size different from the S3 key size? --> Upload changed S3 key!
-  - Is the file size exactly the same as the S3 key size? --> Download the
-    S3 key's `Metadata->filehash`.
-    - Is the hash exactly the same as the file's hash? --> Do nothing!
-    - Is the hash different? --> Upload changed S3 key!
+When uploading files (not redirects), the deployer is intelligent about
+what it uploads. It only uploads files whose content has changed, skipping
+the rest. However, since the `cache-control` attribute of a file is not
+considered part of its content, if you'd like to change the `cache-control`
+from what's in S3, it's important to use the `--force-refresh` option to
+ensure that all files are uploaded with fresh `cache-control` attributes.

-When it uploads an S3 key, _always_ compute the local file's hash and
-include that as a piece of S3 key Metadata.
+Redirects are always uploaded.

-## Getting started
+### Examples

-You can install it globally or in a virtualen environment. Whatever floats
-float fancy.
+```sh
+cd deployer
+poetry run deployer upload --bucket dev --folder pr1234 ../client/build
+```

-    poetry install
-    poetry run deployer --help
+```sh
+cd deployer
+poetry run deployer upload --bucket prod ../client/build
+```

-Please refer to the [`boto3` documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration) with regards to configuring AWS access credentials.
+## Updating Lambda Functions

-### Actually uploading something
+The command:

-The sub-command for uploading is called `upload`. You use it like this:
+```sh
+cd deployer
+poetry run deployer update-lambda-functions
+```

-    poetry run deployer upload --help
+will discover every folder that contains a Lambda function, create a
+deployment package (Zip file) for each one by running:

-An example of this is, if you know what you want your bucket to be called
-and you know what the folder prefix should be, and you have built the whole
-site in `../client/build`:
+```sh
+yarn make-package
+```

-    poetry run deployer upload --bucket mdn-yari --name pr1234 ../client/build
+and if the deployment package is different from what is already in AWS,
+it will upload and publish a new version.

 ## Environment variables

-All the options you can specify with the CLI can equally be expressed
-as environment variables. You just need to prefix it with `DEPLOYER_` and
-write it all upper case.
-
-    export DEPLOYER_BUCKET=peterbe-yari
-    export DEPLOYER_NAME=master
-    poetry run deployer upload ../client/build
-
-...is the same as...
-
-    poetry run deployer upload --bucket peterbe-yari --name master ../client/build
-
-Other things you can set (excluding AWS credentials for `boto3`):
-
-- `AWS_PROFILE` - default: `default`
-- `S3_BUCKET_LOCATION` - default: `''`
-- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` - default: `50`
-- `DEPLOYER_CACHE_CONTROL` - default: `60 * 60`
-- `DEPLOYER_HASHED_CACHE_CONTROL` - default: `60 * 60 * 24 * 365`
-- `DEPLOYER_NO_PROGRESSBAR` - default: `false`
-
-## Goal
-
-To be dead-easy to use and powerful at the same time.
+The following environment variables are supported.
+
+- `DEPLOYER_BUCKET_NAME` is equivalent to using `--bucket`
+- `DEPLOYER_NO_PROGRESSBAR` is equivalent to using `--no-progressbar`
+- `DEPLOYER_CACHE_CONTROL` can be used to specify the `cache-control`
+  header for all non-hashed files that are uploaded (the default is
+  `3600` or one hour)
+- `DEPLOYER_HASHED_CACHE_CONTROL` can be used to specify the `cache-control`
+  header for all hashed files (e.g., `main.3c12da89.chunk.js`) that are
+  uploaded (the default is `31536000` or one year)
+- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` controls the number of worker
+  threads used when uploading (the default is `50`)
+- `CONTENT_ROOT` is equivalent to using `--content-root`
+- `CONTENT_TRANSLATED_ROOT` is equivalent to using `--content-translated-root`

 ## Contributing

 You need to have [`poetry` installed on your system](https://python-poetry.org/docs/).
 Now run:

-    cd deployer
-    poetry install
+```sh
+cd deployer
+poetry install
+```

 That should have installed the CLI:

-    poetry run deployer
+```sh
+poetry run deployer
+```

 If you wanna make a PR, make sure it's formatted with `black` and
 passes `flake8`.

 You can check that all files are `flake8` fine by running:

-    flake8 deployer
+```sh
+flake8 deployer
+```

 And to check that all files are formatted according to `black` run:

-    black --check deployer
+```sh
+black --check deployer
+```
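
As context for the change-detection described in this diff (the removed README text spells out the size comparison followed by a `Metadata->filehash` check, and the new text summarizes it as only uploading files whose content changed), here is a minimal `boto3` sketch of that procedure. This is an illustration, not the deployer's actual code: the helper names and the choice of MD5 are assumptions, while the `filehash` metadata key comes from the README itself.

```python
import hashlib
from pathlib import Path

import boto3

s3 = boto3.client("s3")


def file_hash(path: Path) -> str:
    # Hex digest of the local file's content (MD5 chosen for illustration).
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def upload_if_changed(bucket: str, key: str, path: Path, existing: dict) -> None:
    # `existing` maps S3 key -> size, collected up front via list_objects_v2.
    size = path.stat().st_size
    if key in existing and existing[key] == size:
        # Same size: only upload if the stored content hash differs.
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["Metadata"].get("filehash") == file_hash(path):
            return  # unchanged, skip the upload
    # New or changed: upload and record the hash as object metadata.
    s3.upload_file(
        str(path),
        bucket,
        key,
        ExtraArgs={"Metadata": {"filehash": file_hash(path)}},
    )
```

Listing sizes first is what makes this cheap: `head_object` round-trips are only paid for files whose sizes already match.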
54 changes: 54 additions & 0 deletions deployer/aws-lambda/README.md
@@ -0,0 +1,54 @@
# Yari Lambda Functions

This `aws-lambda` folder contains one or more sub-folders each of which may
define a unique AWS Lambda function used by Yari. A sub-folder defines an AWS
Lambda function if it contains the following:

- an `index.js` file containing the code of the Lambda function
- a `package.json` file which defines the Lambda function's dependencies as
well as a `make-package` command that when run creates an AWS Lambda
deployment package (Zip file) containing the function's code (`index.js`)
and dependencies (`node_modules/*`).

Here's an example `package.json` file:

```json
{
  "description": "Defines the deployment package for this AWS Lambda function.",
  "private": true,
  "main": "index.js",
  "license": "MPL-2.0",
  "scripts": {
    "make-package": "yarn install && zip -r -X function.zip . -i index.js 'node_modules/*'"
  },
  "dependencies": {
    "sanitize-filename": "^1.6.3"
  },
  "engines": {
    "node": "12.x"
  },
  "aws": {
    "name": "mdn-content-origin-request",
    "region": "us-east-1"
  }
}
```

## Updating Lambda Functions in AWS

The command:

```sh
cd deployer
poetry run deployer update-lambda-functions
```

will discover every folder that contains a Lambda function, create a
deployment package (Zip file) for each one by running:

```sh
yarn make-package
```

and if the deployment package is different from what is already in AWS,
it will upload and publish a new version.
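
How "different from what is already in AWS" is determined is not spelled out here. One workable check, sketched below with `boto3`, compares the local Zip file's digest with the `CodeSha256` that Lambda reports for the deployed package; the function names are illustrative, not the deployer's actual API.

```python
import base64
import hashlib
from pathlib import Path

import boto3


def zip_sha256(zip_path: Path) -> str:
    # Lambda reports CodeSha256 as the base64-encoded SHA-256 of the package.
    digest = hashlib.sha256(zip_path.read_bytes()).digest()
    return base64.b64encode(digest).decode("utf-8")


def publish_if_changed(function_name: str, region: str, zip_path: Path) -> None:
    client = boto3.client("lambda", region_name=region)
    deployed = client.get_function(FunctionName=function_name)
    if deployed["Configuration"]["CodeSha256"] == zip_sha256(zip_path):
        return  # identical package already deployed; nothing to do
    client.update_function_code(
        FunctionName=function_name,
        ZipFile=zip_path.read_bytes(),
        Publish=True,  # publish a new numbered version in the same call
    )
```

One caveat with any byte-level comparison: Zip archives embed file timestamps, so two packages built from identical sources can still differ, which makes this check conservative rather than exact.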
52 changes: 52 additions & 0 deletions deployer/aws-lambda/content-origin-request/index.js
@@ -0,0 +1,52 @@
const sanitizeFilename = require("sanitize-filename");

const CONTENT_DEVELOPMENT_DOMAIN = ".content.dev.mdn.mozit.cloud";

/*
 * NOTE: This function is derived from the function of the same name within
 * ../../content/utils.js. It differs only in its final "join", which
 * uses "/", as required by S3 keys, rather than "path.sep".
 */
function slugToFolder(slug) {
  return slug
    .replace(/\*/g, "_star_")
    .replace(/::/g, "_doublecolon_")
    .replace(/:/g, "_colon_")
    .replace(/\?/g, "_question_")
    .toLowerCase()
    .split("/")
    .map(sanitizeFilename)
    .join("/");
}

exports.handler = async (event, context) => {
  /*
   * Modify the request before it's passed to the S3 origin.
   */
  const request = event.Records[0].cf.request;
  const host = request.headers.host[0].value.toLowerCase();
  // Rewrite the URI to match the keys in S3.
  // NOTE: The incoming URI should remain URI-encoded.
  let newURI = slugToFolder(request.uri);
  if (newURI.includes("/docs/") && !newURI.endsWith("/index.json")) {
    if (!newURI.endsWith("/")) {
      newURI += "/";
    }
    newURI += "index.html";
  }
  request.uri = newURI;
  // Rewrite the HOST header to match the S3 bucket website domain.
  // This is required only because we're using S3 as a website, which
  // we need in order to do redirects from S3. NOTE: The origin is
  // considered a "custom" origin because we're using S3 as a website.
  request.headers.host[0].value = request.origin.custom.domainName;
  // Conditionally rewrite the path (prefix) of the origin.
  if (host.endsWith(CONTENT_DEVELOPMENT_DOMAIN)) {
    // When reviewing PR's, each PR gets its own subdomain, and
    // all of its content is prefixed with that subdomain in S3.
    request.origin.custom.path = `/${host.split(".")[0]}`;
  } else {
    request.origin.custom.path = "/main";
  }
  return request;
};
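
To see what this handler does to an incoming URI, here is a small Python re-creation of the rewrite, for illustration only (the real code runs in Node and additionally sanitizes each path segment with the `sanitize-filename` package, which is omitted here):

```python
def slug_to_folder(slug: str) -> str:
    # Simplified port of slugToFolder above; ordering matters ("::" before ":").
    for old, new in [
        ("*", "_star_"),
        ("::", "_doublecolon_"),
        (":", "_colon_"),
        ("?", "_question_"),
    ]:
        slug = slug.replace(old, new)
    return slug.lower()


uri = slug_to_folder("/en-US/docs/Web/API/Window:alert")
# Document URIs that aren't /index.json get "/index.html" appended,
# mirroring the handler's logic.
if "/docs/" in uri and not uri.endswith("/index.json"):
    if not uri.endswith("/"):
        uri += "/"
    uri += "index.html"
print(uri)  # -> /en-us/docs/web/api/window_colon_alert/index.html
```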
19 changes: 19 additions & 0 deletions deployer/aws-lambda/content-origin-request/package.json
@@ -0,0 +1,19 @@
{
  "description": "Defines the deployment package for this AWS Lambda function.",
  "private": true,
  "main": "index.js",
  "license": "MPL-2.0",
  "scripts": {
    "make-package": "yarn install && zip -r -X function.zip . -i index.js 'node_modules/*'"
  },
  "dependencies": {
    "sanitize-filename": "^1.6.3"
  },
  "engines": {
    "node": ">=12.x"
  },
  "aws": {
    "name": "mdn-content-origin-request",
    "region": "us-east-1"
  }
}
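
A plausible reading of the custom `aws` key above (an assumption, since the deployer's own code is not shown in this diff) is that `update-lambda-functions` reads each function folder's `package.json` to learn which Lambda function to update and in which region:

```python
import json
from pathlib import Path

# Assumption: the deployer inspects each function folder's package.json
# for the custom "aws" key shown above.
manifest = json.loads(Path("package.json").read_text())
print(manifest["aws"]["name"])    # mdn-content-origin-request
print(manifest["aws"]["region"])  # us-east-1
```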
