Store task stdout and stderr in blobs with task_id in names

Task stdout and stderr blobs were named with invocation ids but without task ids. If a function runs multiple times per invocation, each task overwrites the stdout and stderr blobs written by the others. Depending on timing, the Server DB could also store wrong blob sizes, because a blob could be overwritten after its DB record was saved.

Adding task_id to the stdout and stderr blob names fixes both problems. The maximum S3 object name is 1024 bytes long, and storing two UUIDs (invocation id and task id) uses < 100 bytes of this space, so this should be fine.

No other places need to be updated, because the DB is keyed by task id and all Server APIs also include task ids in their URLs.

Testing: make build, cargo test
tensorlakeai · Dec 18, 2024 · f2c6c3c
1 parent 9de7dc7
commit f2c6c3c
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/server/src/routes/internal_ingest.rs b/server/src/routes/internal_ingest.rs
@@ -140,11 +140,12 @@ pub async fn ingest_files_from_executor(
                     IndexifyAPIError::bad_request("task_result is required before diagnostics")
                 })?;
                 let file_name = format!(
-                    "{}.{}.{}.{}.{}",
+                    "{}.{}.{}.{}.{}.{}",
                     task_result.namespace,
                     task_result.compute_graph,
                     task_result.compute_fn,
                     task_result.invocation_id,
+                    task_result.task_id,
                     name,
                 );
                 let res = write_to_disk(state.clone().blob_storage, &mut field, &file_name).await?;
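
The collision described in the commit message, and the effect of the extra `task_id` segment, can be illustrated with a small standalone sketch. It is not the server code: the `TaskResult` struct below is a hypothetical stand-in holding only the fields used in the diff, `old_blob_name`, `new_blob_name`, and `sample_task` are illustrative helpers, the sample ids assume the 36-character hyphenated UUID form, and a `HashMap` stands in for blob storage.

```rust
use std::collections::HashMap;

// Maximum S3 object key length in bytes, as cited in the commit message.
const S3_MAX_KEY_BYTES: usize = 1024;

// Hypothetical stand-in for the fields of `task_result` used in the diff.
struct TaskResult {
    namespace: String,
    compute_graph: String,
    compute_fn: String,
    invocation_id: String,
    task_id: String,
}

// Naming scheme before the change: no task id in the name.
fn old_blob_name(t: &TaskResult, name: &str) -> String {
    format!(
        "{}.{}.{}.{}.{}",
        t.namespace, t.compute_graph, t.compute_fn, t.invocation_id, name
    )
}

// Naming scheme after the change: task id added before the stream name.
fn new_blob_name(t: &TaskResult, name: &str) -> String {
    format!(
        "{}.{}.{}.{}.{}.{}",
        t.namespace, t.compute_graph, t.compute_fn, t.invocation_id, t.task_id, name
    )
}

// Builds a task of the same function within the same invocation.
// All values are made up for illustration.
fn sample_task(task_id: &str) -> TaskResult {
    TaskResult {
        namespace: "default".to_string(),
        compute_graph: "graph".to_string(),
        compute_fn: "extract".to_string(),
        invocation_id: "123e4567-e89b-12d3-a456-426614174000".to_string(),
        task_id: task_id.to_string(),
    }
}

fn main() {
    let first = sample_task("123e4567-e89b-12d3-a456-426614174001");
    let second = sample_task("123e4567-e89b-12d3-a456-426614174002");

    // Old scheme: both tasks map to one key, so the second write replaces the first.
    let mut old_store: HashMap<String, &str> = HashMap::new();
    old_store.insert(old_blob_name(&first, "stdout"), "output of first task");
    old_store.insert(old_blob_name(&second, "stdout"), "output of second task");
    assert_eq!(old_store.len(), 1);

    // New scheme: each task gets its own key, so both outputs survive.
    let mut new_store: HashMap<String, &str> = HashMap::new();
    new_store.insert(new_blob_name(&first, "stdout"), "output of first task");
    new_store.insert(new_blob_name(&second, "stdout"), "output of second task");
    assert_eq!(new_store.len(), 2);

    // Two hyphenated UUIDs take 72 bytes, well under the key limit.
    let key = new_blob_name(&first, "stdout");
    assert!(first.invocation_id.len() + first.task_id.len() < 100);
    assert!(key.len() < S3_MAX_KEY_BYTES);
    println!("{} ({} bytes)", key, key.len());
}
```

With the sample values above, the total key length stays far below the limit; in practice the remaining budget depends on how long namespace, graph, and function names are allowed to be.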