Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[performance] add API load testing script #4237

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

freemvmt
Copy link
Contributor

@freemvmt freemvmt commented Jan 31, 2025

Related to this ticket.

Main thrust of this PR is the addition of the test_api.py load testing script (see final commit).

However, to get it working I had to make some other changes, such as adding a mechanism to bypass rate limiting, and exposing a file deletion endpoint. They are probably best considered on a per commit basis. I'm happy to split this up into multiple PRs if that's preferable :)

I'll also do a review and tag this up with some additional context.

Once this is merged we should be able to run the new script against the API on staging, to get an idea of whether it can handle the same load we are confident the data layer can (i.e. Hasura + RDS).

When we do that I'll track results here.

Copy link

github-actions bot commented Jan 31, 2025

Pizza

Deployed 440c499 to https://4237.planx.pizza.

Useful links:

@DafyddLlyr
Copy link
Contributor

Rebase to main - please see https://opensystemslab.slack.com/archives/C04EEBR1E0P/p1738574105777659

@freemvmt freemvmt force-pushed the load-testing-api branch 2 times, most recently from 9821cc5 to b3589ff Compare February 4, 2025 22:00
@freemvmt freemvmt requested a review from a team February 4, 2025 22:01
@freemvmt freemvmt marked this pull request as ready for review February 4, 2025 22:01
@@ -39,6 +39,8 @@ FILE_API_KEY_DONCASTER=👻
FILE_API_KEY_GLOUCESTER=👻
FILE_API_KEY_TEWKESBURY=👻

# Used to circumvent API rate limiting for development purposes (e.g. load testing)
SKIP_RATE_LIMIT_SECRET=👻
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already generated a 40char alphanumeric secret and pushed it up to S3, so it's ready to pull down

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm I can pull this down ✔️

@@ -14,7 +19,29 @@ export default async (): Promise<Authenticator> => {
const customPassport = new passport.Passport();

// instantiate Microsoft OIDC client, and use it to build the related strategy
const microsoftIssuer = await Issuer.discover(MICROSOFT_OPENID_CONFIG_URL);
// we also keep said config as a fixture to enable offline local development
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change was literally just to allow me to run the API for development purposes (i.e. do pnpm dev) while I was on a train with no internet, so it doesn't necessarily need to make it into the codebase, but could be useful for others?

afaik fetching the config for the Microsoft OIDC stuff is the only call the code makes when spinning up, so it was possible to run pnpm dev to completion while offline before we added that SSO option, but now it isn't (without this change)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no harm in keeping this around! Fixture may additionally come in handy for future E2E test mocks if we ever want Microsoft SSO coverage?

privateDownloadController,
);

router.delete(
Copy link
Contributor Author

@freemvmt freemvmt Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would definitely like to hear whether anyone thinks there's a risk associated with exposing this function? I've made it accessible only to anyone with platform level admin powers.

the utility of it here is that we can upload as many files to staging S3 as we like during load testing without worrying about incurring additional cost, because we can also delete them immediately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing to platformAdmins seems fine by me !

"Public" files are pretty exclusively flow content (eg vector drawings like roof pitch examples) used throughout services uploaded by editors; all have original copies on OSL's Google Drive, so low risk here! (Whereas "/file/private/..." are uploaded by applicants, so it's ideal to keep deletion of those restricted to session sanitation jobs.)

@@ -23,7 +23,7 @@ function useMinio() {
// Points to Minio
return {
endpoint: `http://minio:${process.env.MINIO_PORT}`,
s3ForcePathStyle: true,
forcePathStyle: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this took me a good while to hunt down! a pesky little breaking change resulting from our move to v3 @aws-sdk/client-s3 in #3989 that was causing 500 errors with the following msg:

ServerError: Failed to upload public file: getaddrinfo ENOTFOUND pizza-user-uploads.minio

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for shaking some of these out - went unnoticed for a good while!

@@ -7,6 +7,19 @@ const apiLimiter = rateLimit({
max: 250,
standardHeaders: true,
legacyHeaders: false,
skip: (req: Request, _res: Response) => {
// add a mechanism for skipping rate limit when load testing (on local or staging only)
if (process.env.APP_ENVIRONMENT == "production") return false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without this (or a more convoluted solution like hosting Locust workers on remote servers with different IPs) I can't exceed the rate limit of 250 reqs per IP per 10 minute period, which of course is not sufficient for load testing purposes!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only thing I'm not totally following here is why we need a request header and env var? Setting both and checking their equality is just an extra layer of assurance that we're bypassing intentionally?

return resp.failure("Failed to delete file")
resp.success()

# TODO: add another task which hits the /upload-submission endpoint
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't managed to implement this part, but I think the above task should suffice for our purposes for this round of load testing

@freemvmt freemvmt changed the title WIP: add API load testing script add API load testing script Feb 4, 2025
@freemvmt freemvmt changed the title add API load testing script [performance] add API load testing script Feb 4, 2025
Copy link
Member

@jessicamcinchak jessicamcinchak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great here - one minor comment to address around Pulumi secret-getting, but otherwise think this is good to merge & test on staging 🙂

Happy to have another go with the Python script instructions once merged 👍

@@ -39,6 +39,8 @@ FILE_API_KEY_DONCASTER=👻
FILE_API_KEY_GLOUCESTER=👻
FILE_API_KEY_TEWKESBURY=👻

# Used to circumvent API rate limiting for development purposes (e.g. load testing)
SKIP_RATE_LIMIT_SECRET=👻
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm I can pull this down ✔️

@@ -14,7 +19,29 @@ export default async (): Promise<Authenticator> => {
const customPassport = new passport.Passport();

// instantiate Microsoft OIDC client, and use it to build the related strategy
const microsoftIssuer = await Issuer.discover(MICROSOFT_OPENID_CONFIG_URL);
// we also keep said config as a fixture to enable offline local development
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no harm in keeping this around! Fixture may additionally come in handy for future E2E test mocks if we ever want Microsoft SSO coverage?

@@ -71,15 +72,15 @@ export const publicUploadController: UploadController = async (
}
};

export const downloadFileSchema = z.object({
export const hostedFileSchema = z.object({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 indeed a bit more semantically clear !

privateDownloadController,
);

router.delete(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing to platformAdmins seems fine by me !

"Public" files are pretty exclusively flow content (eg vector drawings like roof pitch examples) used throughout services uploaded by editors; all have original copies on OSL's Google Drive, so low risk here! (Whereas "/file/private/..." are uploaded by applicants, so it's ideal to keep deletion of those restricted to session sanitation jobs.)

@@ -23,7 +23,7 @@ function useMinio() {
// Points to Minio
return {
endpoint: `http://minio:${process.env.MINIO_PORT}`,
s3ForcePathStyle: true,
forcePathStyle: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for shaking some of these out - went unnoticed for a good while!

@@ -7,6 +7,19 @@ const apiLimiter = rateLimit({
max: 250,
standardHeaders: true,
legacyHeaders: false,
skip: (req: Request, _res: Response) => {
// add a mechanism for skipping rate limit when load testing (on local or staging only)
if (process.env.APP_ENVIRONMENT == "production") return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only thing I'm not totally following here is why we need a request header and env var? Setting both and checking their equality is just an extra layer of assurance that we're bypassing intentionally?

{
name: "SKIP_RATE_LIMIT_SECRET",
value: config.requireSecret("skip-rate-limit-secret"),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail on our production deploy's Pulumi up step because there's no equivalent defined in Pulumi.production.yaml and value: config.requireSecret(...) is expecting a defined value?

Alternatively, it looks like config.getSecret() should do what we want here and handle this as a staging-only secret ! Docs here https://www.pulumi.com/docs/iac/concepts/secrets/#creating-secrets-programmatically


The API load testing script requires some files to work with, which are in `/samples`. Care should be taken to make sure anything added there is in the public domain.

For example, I used [Unsplash](https://unsplash.com/s/photos/tree?license=free) to search for an image with a ['free' license](https://unsplash.com/license), and printed a page from the [WikiHouse](https://www.wikihouse.cc/) site as a PDF.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants