fix: mitigate 429 errors in cloud function execution for validation reports #883

Open
wants to merge 6 commits into
base: main


davidgamez
Member

@davidgamez davidgamez commented Jan 15, 2025

Summary:
This PR adds a Cloud Tasks queue to the infrastructure for tasks that can be executed within two hours, with a limit of 8 concurrent requests. The call from the gtfs_validator_execution workflow to the process-validation-report function now goes through this queue instead of being made directly.
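
To illustrate why capping concurrency helps with HTTP 429 (Too Many Requests) responses, here is a minimal local simulation, not the code in this PR. It uses an asyncio semaphore as a stand-in for the queue's concurrency limit; the function names and the sleep that replaces the real HTTP call are assumptions made for the example.

```python
import asyncio

MAX_CONCURRENT = 8  # the concurrency limit stated in the summary


async def call_function(report_id, limiter, stats):
    """Simulate one call to the downstream function while holding a slot."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        stats["active"] -= 1
    return report_id


async def dispatch_all(report_ids, max_concurrent=MAX_CONCURRENT):
    """Dispatch every report, never running more than max_concurrent at once."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(call_function(r, limiter, stats) for r in report_ids)
    )
    return done, stats["peak"]


def run_demo(n=50):
    """Run the simulation for n hypothetical reports."""
    return asyncio.run(dispatch_all(range(n)))
```

Calling `run_demo(50)` returns all 50 report ids together with the peak number of simultaneous calls, which stays at or below 8 however many reports are submitted.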

Closes #863 #875

Out of scope: set limits in all cloud functions as described here. This will be done in a follow-up issue.

From our AI friend

This pull request includes several changes to the infrastructure and workflow configurations to improve the handling of cloud tasks and permissions. The most important changes include adding new resources for cloud task queues, updating IAM permissions, and modifying the workflow for task enqueuing and handling.

Infrastructure Changes:

  • Added a local variable x_number_of_concurrent_instance to define the number of concurrent instances. (infra/functions-python/main.tf, R26-R27)
  • Created a dead letter queue (google_cloud_tasks_queue resource) to handle failed cloud tasks with specific rate limits and retry configurations. (infra/functions-python/main.tf, R351-R390)
  • Created a 2X rate queue (google_cloud_tasks_queue resource) with defined rate limits and retry configurations for cloud tasks. (infra/functions-python/main.tf, R351-R390)
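
As a rough illustration of what a queue retry configuration implies, the sketch below computes retry intervals using the backoff behaviour described in the Cloud Tasks documentation: the interval starts at the minimum backoff, doubles a fixed number of times, then grows linearly until it is capped at the maximum backoff. The parameter values are assumptions chosen for the example and are not the values configured in this PR.

```python
def backoff_schedule(min_backoff, max_backoff, max_doublings, retries):
    """Return the wait (in seconds) before each of the first `retries` retries."""
    intervals = []
    interval = min_backoff
    # After the doubling phase the interval grows by this fixed step.
    linear_step = min_backoff * 2 ** max_doublings
    for attempt in range(retries):
        intervals.append(min(interval, max_backoff))
        if attempt < max_doublings:
            interval *= 2
        else:
            interval += linear_step
    return intervals


# Hypothetical settings: 10s minimum, 300s maximum, 3 doublings.
schedule = backoff_schedule(min_backoff=10, max_backoff=300, max_doublings=3, retries=8)
print(schedule)
```

Summing the intervals gives a quick estimate of how long a failing task keeps retrying, which is useful when checking a configuration against a maximum retry duration.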

IAM Permissions:

  • Added IAM permissions for the workflow service account to act as a service account user, enqueuer, and viewer for cloud tasks. (infra/workflows/main.tf, R55-R80)

Workflow Modifications:

  • Updated the gtfs_validator_execution.yml workflow to extract the environment from the project ID and use it to define the cloud task queue name. (workflows/gtfs_validator_execution.yml, R22-R24)
  • Replaced the direct database update call with a task enqueueing process, including steps to create a payload, enqueue a task, and handle task completion with retries and logging. (workflows/gtfs_validator_execution.yml, L224-R297)
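
The workflow steps above can be sketched in Python as follows. This is only an illustration of the idea, not the workflow YAML from this PR: the project ID format (ending in "-<env>"), the queue name prefix, the function URL, the payload fields, and the service account are all hypothetical. The task body follows the shape of the Cloud Tasks v2 REST API, where the HTTP request body is base64-encoded.

```python
import base64
import json


def environment_from_project_id(project_id):
    """Assumes project IDs end with '-<env>', e.g. 'example-project-dev'."""
    return project_id.rsplit("-", 1)[-1]


def queue_path(project_id, location, queue_prefix):
    """Build the full queue resource name, suffixed with the environment."""
    env = environment_from_project_id(project_id)
    return f"projects/{project_id}/locations/{location}/queues/{queue_prefix}-{env}"


def build_task(function_url, payload, service_account_email):
    """Build a request body for creating an HTTP task that POSTs a JSON payload."""
    body = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "task": {
            "httpRequest": {
                "httpMethod": "POST",
                "url": function_url,
                "headers": {"Content-Type": "application/json"},
                "body": body,
                # The token lets the target verify the caller's identity.
                "oidcToken": {"serviceAccountEmail": service_account_email},
            }
        }
    }


# Hypothetical values, for illustration only.
parent = queue_path("example-project-dev", "us-central1", "example-queue")
task = build_task(
    "https://example.invalid/process-validation-report",
    {"dataset_id": "example-dataset"},
    "workflow-sa@example-project-dev.iam.gserviceaccount.com",
)
```

Here `parent` identifies the queue to enqueue into and `task` is the request body; the workflow would then wait for the result, retrying and logging as described above.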

Expected behavior:

Explain and/or show screenshots for how you expect the pull request to work in your testing (in case other devices exhibit different behavior).

Testing tips:

Test individual workflow

  • From GCP console, go to workflows home
  • Execute the gtfs_validator_execution workflow. You can use the latest input as the parameter.

Stress test

This is how issue #863 was discovered.

  • From GCP console, go to cloud functions home
  • Execute the update_validation_report_ function
  • Go to workflows and expect to see one execution per feed under the gtfs_validator_execution workflow. All executions should pass without issues.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@davidgamez davidgamez marked this pull request as ready for review January 16, 2025 22:12
Development

Successfully merging this pull request may close these issues.

Mitigate 429 errors in cloud function execution for validation reports
1 participant