Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-48194: Deploy a dev prompt processing service for LSSTCam-ImSim #4168

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hsinfang
Copy link
Contributor

@hsinfang hsinfang commented Feb 5, 2025

No description provided.

@hsinfang hsinfang force-pushed the tickets/DM-48194 branch 5 times, most recently from 11c1266 to 51176d7 Compare February 6, 2025 19:07
@hsinfang hsinfang requested a review from kfindeisen February 6, 2025 21:02
Copy link
Collaborator

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though I have questions about some of the settings.

@@ -88,7 +89,7 @@ detectorConfig:
8: True
LSSTComCamSim:
<<: *ComCam
LSSTCam:
LSSTCam: &LSSTCam
0: False
1: False
2: False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these all need to be set to true? I'm not sure why they were false in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they should be true. Thank you for catching that!

(Sorry this is not fully tested yet and hence the problem wasn't caught yet)

topic: prompt-processing-dev

apdb:
config: s3://rubin-pp-dev-users/apdb_config/cassandra/pp_apdb_lsstcamimsim-dev.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be .py or .yaml? The Confluence page says .yaml (which it looks like we support as of w.2025.04?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be .yaml. Thank you for catching it.

Comment on lines +7 to +9
# Expect to need roughly n_detector × request_latency / survey_cadence pods
# But we do not have the compute yet. This will be adjusted.
autoscaling.knative.dev/max-scale: "200"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have values for everything in the formula? Certainly 200 is much too low.

# @default -- None, must be set
preprocessing: ""
# -- Skymap to use with the instrument
skymap: "lsst_cells_v1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the DC2 skymap? Is patchesPerImage = 16 (which I assume was copied from ComCamSim) still valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original plan is to really use lsst_cells_v1.

Then we found out that the DC2's DC2_cells_v1 and lsst_cells_v1 are identical, just different name.

Only lsst_cells_v1 exists in repo embargo_or5 today so I'd keep it for now. Later we might change it depending on which name is chosen for actual OR5.

applications/prompt-proto-service-lsstcamimsim/values.yaml Outdated Show resolved Hide resolved
Comment on lines +157 to +164
# -- Maximum time that a container can send nothing to Knative (seconds).
# This is only useful if the container runs async workers.
# If 0, idle timeout is ignored.
idleTimeout: 900
# -- Maximum time that a container can send nothing to Knative after initial submission (seconds).
# This is only useful if the container runs async workers.
# If 0, idle timeout is ignored.
responseStartTimeout: 900
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend setting these to 0, since otherwise Knative may queue up extra jobs before the Gunicorn worker times out. (Until we greatly reduce the Gunicorn timeout, the actual Knative timeout is at its cap of 1200 seconds.)

@@ -14,6 +14,6 @@ image:
repository: ghcr.io/lsst-dm/next_visit_fan_out
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: 2.5.0
tag: tickets-DM-48194
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be reviewing fan out as well? I see two branches but no PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, fan-out isn't ready yet.

@hsinfang hsinfang force-pushed the tickets/DM-48194 branch 3 times, most recently from 2e1a091 to bafdac3 Compare February 7, 2025 18:19
The service is started with mostly configs from ComCam, but will be tuned later.
RuntimeWarning: Cannot store all inputs in cache; dropping {DatasetRef(DatasetType('cal_ref_cat_2_2', {htm7}, SimpleCatalog), {htm7: 147130}, run='refcats/PREOPS-301', id=2cf75345-fefa-4e70-b35e-54fd57d128b6)}.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants