Add additional batch job timestamps #556

Open
wants to merge 7 commits into
base: draft

@m-mohr m-mohr commented Jan 12, 2025

Implements #542

Reasoning for the fields based on the discussion in #542 and the existing fields (created, updated, expires):

  • created: exists, the time when the batch job was created
  • updated: exists, the time of the last status change
  • queued: new, the time when the job was queued, to be able to check how long jobs are queued (with started)
  • started: new, the time when the job started processing (status changed to running), to be able to check how long processing takes (with updated)
  • expires: partially new, the time when the assets get deleted (or, if in the past, the time when the assets got deleted). A field with this description existed in the STAC metadata and is now added to the job metadata. It would be great if this gets filled in when results get deleted, so that clients can flag these jobs.
  • ended: NOT added; it would always be the same value as updated according to the definition of the updated field.

All properties are optional except for created.

created, updated and expires are somewhat aligned in their meaning with STAC. queued and started don't exist in STAC.
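
For illustration, a minimal client-side sketch of how the new fields could be combined, as described in the list above: queued with started gives the queue time, and started with updated gives the processing time. It assumes the timestamps arrive as UTC date-time strings with a trailing Z; the helper names (parse_ts, job_durations) and the sample values are hypothetical and not part of this PR.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a UTC date-time string such as 2025-01-12T10:00:00Z."""
    if value is None:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so it is normalized to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_durations(job: dict) -> Dict[str, Optional[timedelta]]:
    """Derive queue and processing time from optional job timestamps."""
    queued = parse_ts(job.get("queued"))
    started = parse_ts(job.get("started"))
    updated = parse_ts(job.get("updated"))
    return {
        # Only created is required, so every other field may be missing.
        "queue_time": started - queued if queued and started else None,
        # updated is the time of the last status change, so this is the full
        # processing time only once the job has reached a final status.
        "processing_time": updated - started if started and updated else None,
    }


# Hypothetical job metadata, for illustration only.
job = {
    "created": "2025-01-12T10:00:00Z",
    "queued": "2025-01-12T10:05:00Z",
    "started": "2025-01-12T10:20:00Z",
    "updated": "2025-01-12T11:00:00Z",
    "status": "finished",
}
durations = job_durations(job)
print(durations["queue_time"], durations["processing_time"])
```

Returning None for missing inputs keeps the sketch usable against back-ends that only provide created.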

@m-mohr m-mohr added the job management incl. /result label Jan 12, 2025
@m-mohr m-mohr added this to the 1.3.0 milestone Jan 12, 2025
@m-mohr m-mohr linked an issue Jan 12, 2025 that may be closed by this pull request
@clausmichele
Member

A useful addition to track the time a batch job takes without workarounds, thanks! By the way, to add those fields to the STAC results, we would have to use this extension, right? https://github.com/stac-extensions/timestamps/

@m-mohr
Member Author

m-mohr commented Jan 13, 2025

@clausmichele Only for expires; all the others are not defined in that extension. Because its meaning there differs slightly, it could also be omitted.

m-mohr and others added 2 commits January 25, 2025 01:32
Member

@soxofaan soxofaan left a comment

minor suggestion

openapi.yaml Outdated
type: string
format: date-time
description: >-
  Time until which the assets are accessible, in UTC. Formatted as
Member

Suggested change
- Time until which the assets are accessible, in UTC. Formatted as
+ Time until which the assets (e.g., batch job results) are available for download, in UTC. Formatted as

Member Author

@m-mohr m-mohr Jan 27, 2025

That might be a bit misleading.
I think expires can refer both to the time when the signed URLs expire and to the time when the data is deleted. That's why I chose "accessible". The addition in brackets is fine, though.

Suggested change
- Time until which the assets are accessible, in UTC. Formatted as
+ Time until which the assets (e.g., batch job results) are accessible, in UTC. Formatted as

Maybe we should actually split into expires and unpublished, i.e.

  • expires: signed URL expiry (if relevant here?) but in STAC it's that...
  • unpublished: when the data gets deleted

Member

OK, good point.
But indeed, we should separate the expiry of signed URLs (which can be refreshed easily) from the expiry of download availability in general (which is final and non-recoverable).

I think the general availability of results should be a global property on GET /jobs/{job_id} or GET /jobs/{job_id}/results, while the signed URL expiry should be closer to the links/URLs themselves (e.g. different signed URLs within the same resource might have different expiry times for some advanced reason).

Member Author

Okay, I've updated the PR accordingly:

  • Clarified usage of expires, removed it from the batch job info endpoint
  • Added unpublished to the batch job info endpoint and the batch job results
  • Added all the timestamps to the Collection-typed result response as well; previously they were only defined in the Item-typed result response
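
As a rough sketch of how a client might flag jobs whose results are gone, the function below reads the unpublished field from batch job metadata. It assumes that unpublished is an optional UTC date-time string and that a value in the past means the results have already been deleted; the function name results_deleted and the sample values are hypothetical and not defined by this PR.

```python
from datetime import datetime, timezone
from typing import Optional


def results_deleted(job: dict, now: Optional[datetime] = None) -> Optional[bool]:
    """Return True if results are gone, False if still there, None if unknown."""
    value = job.get("unpublished")
    if value is None:
        # The field is optional, so its absence gives no information.
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalize a trailing "Z" so fromisoformat() also works before Python 3.11.
    unpublished = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: a timestamp in the past means the data was already deleted.
    return unpublished <= now


# Hypothetical job metadata, checked against a fixed reference time.
reference = datetime(2025, 2, 1, tzinfo=timezone.utc)
print(results_deleted({"unpublished": "2025-01-28T00:00:00Z"}, reference))
```

Signed URL expiry is deliberately left out here, since it is refreshable and would be checked per link rather than per job.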

@m-mohr m-mohr requested a review from soxofaan January 28, 2025 17:26
Labels
job management incl. /result
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: add started, ... to job metadata
4 participants