Store task stdout and stderr in blobs with task_id in names
We were naming task stdout and stderr blobs with invocation ids but
without task_ids. If a function runs multiple times per invocation,
each task overwrites the stdout and stderr blobs written by the
previous ones. Also, depending on timing, the Server DB could store
wrong blob sizes (because a blob could be overwritten after its DB
record was saved).
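A minimal sketch of the overwrite, assuming a plain in-memory map stands in for blob storage. The `BlobStore` alias, the `old_file_name` helper and all field values are hypothetical; only the format string mirrors the old code in this commit.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for blob storage: object key -> blob contents.
type BlobStore = HashMap<String, Vec<u8>>;

// Mirrors the OLD naming scheme: the key has no task_id component.
fn old_file_name(namespace: &str, graph: &str, func: &str, invocation_id: &str, name: &str) -> String {
    format!("{}.{}.{}.{}.{}", namespace, graph, func, invocation_id, name)
}

fn main() {
    let mut store: BlobStore = HashMap::new();

    // Two tasks of the same function within one invocation both upload "stdout",
    // so they compute the exact same key.
    let key = old_file_name("default", "graph", "fn_a", "inv-1", "stdout");
    let first = store.insert(key.clone(), b"output of task 1".to_vec());
    let second = store.insert(key.clone(), b"task 2".to_vec());

    // The first write finds an empty slot...
    assert!(first.is_none());
    // ...but the second write silently replaces task 1's blob.
    assert_eq!(second, Some(b"output of task 1".to_vec()));

    // Only one blob survives, and its size (6 bytes) no longer matches
    // whatever size was recorded for task 1's output.
    assert_eq!(store.len(), 1);
    assert_eq!(store[&key].len(), 6);
}
```

`HashMap::insert` returning the previous value makes the lost write visible; a real object store typically gives no such signal, which is why the overwrite went unnoticed.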

All of this is fixed just by adding task_id to the stdout and stderr
blob names. The maximum S3 object key length is 1024 bytes. Storing
two UUIDs (invocation id and task id) uses fewer than 100 bytes of
this space, so this should be fine.
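The length budget can be checked with a quick sketch. It assumes both ids are rendered as canonical hyphenated UUID strings (36 bytes each); `new_key_len` is a hypothetical helper, and the namespace, graph and function names in `main` are made-up examples.

```rust
// S3 object key limit, in bytes of the UTF-8 encoded key.
const MAX_S3_KEY_BYTES: usize = 1024;

// Length of a canonical hyphenated UUID string,
// e.g. "123e4567-e89b-12d3-a456-426614174000".
const UUID_STR_LEN: usize = 36;

/// Byte length of "{namespace}.{graph}.{func}.{invocation_id}.{task_id}.{name}"
/// when both ids are canonical UUID strings.
fn new_key_len(namespace: &str, graph: &str, func: &str, name: &str) -> usize {
    // Five '.' separators join the six components; str::len counts bytes.
    namespace.len() + graph.len() + func.len() + 2 * UUID_STR_LEN + name.len() + 5
}

fn main() {
    let sample_uuid = "123e4567-e89b-12d3-a456-426614174000";
    assert_eq!(sample_uuid.len(), UUID_STR_LEN);

    // The two ids alone take 72 bytes, well under 100.
    let ids_only = 2 * UUID_STR_LEN;
    assert!(ids_only < 100);

    // A full key with short example names: 7 + 8 + 5 + 72 + 6 + 5 = 103 bytes.
    let len = new_key_len("default", "my_graph", "my_fn", "stdout");
    assert_eq!(len, 103);
    assert!(len <= MAX_S3_KEY_BYTES);
    println!("key uses {} of {} bytes", len, MAX_S3_KEY_BYTES);
}
```

`str::len` counts bytes rather than characters, which matches the unit S3 uses for its key limit, so non-ASCII names in the other components would also be measured correctly.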

No other places need to be updated, because the DB is keyed by
task_ids and all Server APIs also include task ids in their URLs.

Testing:

make build
cargo test
eabatalov committed Dec 18, 2024
1 parent 9de7dc7 commit f2c6c3c
Showing 1 changed file with 2 additions and 1 deletion.
server/src/routes/internal_ingest.rs (3 changes: 2 additions, 1 deletion)
@@ -140,11 +140,12 @@ pub async fn ingest_files_from_executor(
             IndexifyAPIError::bad_request("task_result is required before diagnostics")
         })?;
         let file_name = format!(
-            "{}.{}.{}.{}.{}",
+            "{}.{}.{}.{}.{}.{}",
             task_result.namespace,
             task_result.compute_graph,
             task_result.compute_fn,
             task_result.invocation_id,
+            task_result.task_id,
             name,
         );
         let res = write_to_disk(state.clone().blob_storage, &mut field, &file_name).await?;
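The new naming can also be pulled out into a small helper and checked in isolation. This is a hypothetical refactor for illustration only: `TaskResultKey` and `blob_file_name` do not exist in the codebase, and the struct carries just the fields the `format!` call in this change reads.

```rust
/// Hypothetical subset of the task result fields used to build blob names.
struct TaskResultKey<'a> {
    namespace: &'a str,
    compute_graph: &'a str,
    compute_fn: &'a str,
    invocation_id: &'a str,
    task_id: &'a str,
}

/// Same format as the patched code: task_id now sits between the
/// invocation id and the blob name ("stdout" / "stderr").
fn blob_file_name(t: &TaskResultKey, name: &str) -> String {
    format!(
        "{}.{}.{}.{}.{}.{}",
        t.namespace, t.compute_graph, t.compute_fn, t.invocation_id, t.task_id, name
    )
}

fn main() {
    let task_1 = TaskResultKey {
        namespace: "default",
        compute_graph: "graph",
        compute_fn: "fn_a",
        invocation_id: "inv-1",
        task_id: "task-1",
    };
    // Same function and invocation, different task (all fields are &str, so
    // struct update copies them and task_1 stays usable).
    let task_2 = TaskResultKey { task_id: "task-2", ..task_1 };

    let a = blob_file_name(&task_1, "stdout");
    let b = blob_file_name(&task_2, "stdout");
    assert_eq!(a, "default.graph.fn_a.inv-1.task-1.stdout");
    // Two tasks in one invocation no longer share a key.
    assert_ne!(a, b);
    println!("{a}\n{b}");
}
```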
