Skip to content

Commit

Permalink
fix: Refine memory handling and timeout params for batch processing (#…
Browse files Browse the repository at this point in the history
  • Loading branch information
fg-nava authored Jan 14, 2025
1 parent 7a3293a commit fa532c0
Show file tree
Hide file tree
Showing 7 changed files with 19 additions and 9 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Each app should have:

- `ci-[app_name]`: must be created; should run linting and testing
- `ci-[app_name]-vulnerability-scans`: calls `vulnerability-scans`
- Based on [ci-app-vulnerability-scans](https://github.com/navapbc/template-infra/blob/main/.github/workflows/ci-app-vulnerability-scans.yml)
- Based on [ci-app-vulnerability-scans.yml.jinja](https://github.com/navapbc/template-infra/blob/main/.github/workflows/ci-%7B%7Bapp_name%7D%7D-vulnerability-scans.yml.jinja)

### App-agnostic workflows

Expand All @@ -23,7 +23,7 @@ Each app should have:
Each app should have:

- `cd-[app_name]`: deploys an application
- Based on [`cd-app`](https://github.com/navapbc/template-infra/blob/main/.github/workflows/cd-app.yml)
- Based on [`cd-app.yml.jinja`](https://github.com/navapbc/template-infra/blob/main/.github/workflows/cd-%7B%7Bapp_name%7D%7D.yml.jinja)

The CD workflow uses these reusable workflows:

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/markdownlint-config.json
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,5 @@
"pattern": "^/",
"replacement": "{{BASEURL}}/"
}
]
]
}
5 changes: 5 additions & 0 deletions app/gunicorn.conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,8 @@
bind = app_config.host + ':' + str(app_config.port)
workers = 1
threads = 4

# Increase timeout to handle long-running requests
timeout = 300 # 5 minutes
graceful_timeout = 300 # 5 minutes
keepalive = 5
11 changes: 8 additions & 3 deletions app/src/batch_process.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import asyncio
import csv
import gc
import logging
import tempfile
from typing import Awaitable, Callable, Optional
Expand Down Expand Up @@ -42,7 +43,8 @@ async def batch_process(

processed_data = []

# Process questions with progress updates
# Process in smaller batches to manage memory
BATCH_SIZE = 10
for i, q in enumerate(questions, 1):
logger.info("Processing question %d/%d", i, total_questions)

Expand All @@ -51,9 +53,12 @@ async def batch_process(

processed_data.append(_process_question(q, engine))

# Add small delay to prevent overwhelming the system
if i % 10 == 0: # Every 10 questions
# Clear memory after each batch
if i % BATCH_SIZE == 0:
# Add small delay to prevent overwhelming the system
await asyncio.sleep(0.1)
# Force garbage collection to free memory
gc.collect()

# Update rows with processed data
for row, data in zip(rows, processed_data, strict=True):
Expand Down
2 changes: 1 addition & 1 deletion app/src/chat_engine.py
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ class CaEddWebEngine(BaseEngine):


class ImagineLaEngine(BaseEngine):
retrieval_k: int = 50
retrieval_k: int = 25
retrieval_k_min_score: float = -1

# Note: currently not used
Expand Down
2 changes: 1 addition & 1 deletion docs/infra/set-up-database.md
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ Before creating migrations that create tables, first create a migration that inc
ALTER DEFAULT PRIVILEGES GRANT ALL ON TABLES TO app
```

This will cause all future tables created by the `migrator` user to automatically be accessible by the `app` user. See the [Postgres docs on ALTER DEFAULT PRIVILEGES](https://www.postgresql.org/docs/current/sql-alterdefaultprivileges.html) for more info. As an example see the example app's migrations file [migrations.sql](https://github.com/navapbc/template-infra/blob/main/app/migrations.sql).
This will cause all future tables created by the `migrator` user to automatically be accessible by the `app` user. See the [Postgres docs on ALTER DEFAULT PRIVILEGES](https://www.postgresql.org/docs/current/sql-alterdefaultprivileges.html) for more info. As an example see the example app's migrations file [migrations.sql](https://github.com/navapbc/template-infra/blob/main/template-only-app/migrations.sql).

Why is this needed? The reason is that the `migrator` role will be used by the migration task to run database migrations (creating tables, altering tables, etc.), while the `app` role will be used by the web service to access the database. Moreover, in Postgres, new tables won't automatically be accessible by roles other than the creator unless specifically granted, even if those other roles have usage access to the schema that the tables are created in. In other words, if the `migrator` user created a new table `foo` in the `app` schema, the `app` user will not automatically be able to access it by default.

Expand Down
2 changes: 1 addition & 1 deletion infra/app/app-config/dev.tf
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ module "dev_config" {
has_database = true
has_incident_management_service = local.has_incident_management_service
service_cpu = 2048
service_memory = 8192
service_memory = 16384

# Enables ECS Exec access for debugging or jump access.
# See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html
Expand Down

0 comments on commit fa532c0

Please sign in to comment.