DM-48194: Deploy a dev prompt processing service for LSSTCam-ImSim #4168
base: main
11c1266 to 51176d7
Looks good, though I have questions about some of the settings.
@@ -88,7 +89,7 @@ detectorConfig:
 8: True
 LSSTComCamSim:
 <<: *ComCam
-LSSTCam:
+LSSTCam: &LSSTCam
 0: False
 1: False
 2: False
Do these all need to be set to true? I'm not sure why they were false in the first place.
Yes, they should be true. Thank you for catching that!
(Sorry, this is not fully tested yet, which is why the problem hadn't been caught.)
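A check like the following could flag detectors that are still disabled before deployment. This is only a sketch: `disabled_detectors` is a hypothetical helper, and the mapping below is an illustrative excerpt rather than the full `detectorConfig` block from the chart.

```python
def disabled_detectors(detector_config: dict[int, bool]) -> list[int]:
    """Return the sorted IDs of detectors whose flag is False."""
    return sorted(det for det, enabled in detector_config.items() if not enabled)


# Hypothetical excerpt of an LSSTCam mapping; the real block lists every detector.
lsstcam = {0: False, 1: False, 2: False, 3: True}

for det in disabled_detectors(lsstcam):
    print(f"detector {det} is disabled")
```

An empty result would mean every listed detector is enabled, which is the state the reply above asks for.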
topic: prompt-processing-dev

apdb:
config: s3://rubin-pp-dev-users/apdb_config/cassandra/pp_apdb_lsstcamimsim-dev.py
Should this be .py or .yaml? The Confluence page says .yaml (which it looks like we support as of w.2025.04?).
It should be .yaml. Thank you for catching it.
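A mismatch like this could be caught mechanically by inspecting the extension of the configured URI. The sketch below uses only the Python standard library; `config_suffix` and `is_yaml_config` are hypothetical helpers, not part of the chart or of any LSST package.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def config_suffix(uri: str) -> str:
    """Return the file extension of the object named by a URI (e.g. '.py')."""
    return PurePosixPath(urlparse(uri).path).suffix


def is_yaml_config(uri: str) -> bool:
    """True if the URI points at a .yaml or .yml file."""
    return config_suffix(uri) in {".yaml", ".yml"}


# The value from the diff under review; it still carries the .py extension.
uri = "s3://rubin-pp-dev-users/apdb_config/cassandra/pp_apdb_lsstcamimsim-dev.py"
if not is_yaml_config(uri):
    print(f"expected a YAML config, got {config_suffix(uri)!r}")
```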
# Expect to need roughly n_detector × request_latency / survey_cadence pods
# But we do not have the compute yet. This will be adjusted.
autoscaling.knative.dev/max-scale: "200"
I think we have values for everything in the formula? Certainly 200 is much too low.
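To make the formula in the config comment concrete, here is a minimal sketch of the arithmetic. The detector count of 189 is LSSTCam's science-sensor count and may not match what this deployment processes; the latency and cadence values are placeholder assumptions, not measurements from this service; and `estimate_max_scale` is a hypothetical helper.

```python
import math


def estimate_max_scale(n_detector: int, request_latency_s: float,
                       survey_cadence_s: float) -> int:
    """Estimate pods as n_detector * request_latency / survey_cadence, rounded up."""
    if survey_cadence_s <= 0:
        raise ValueError("survey_cadence_s must be positive")
    return math.ceil(n_detector * request_latency_s / survey_cadence_s)


# Assumed inputs, for illustration only.
n_detector = 189            # LSSTCam science sensors (assumption for this deployment)
request_latency_s = 300.0   # hypothetical time a pod is busy per request
survey_cadence_s = 30.0     # hypothetical time between visits

print(estimate_max_scale(n_detector, request_latency_s, survey_cadence_s))
```

With these placeholder inputs the estimate lands well above 200, which is consistent with the concern raised in the comment above.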
# @default -- None, must be set
preprocessing: ""
# -- Skymap to use with the instrument
skymap: "lsst_cells_v1"
I assume this is the DC2 skymap? Is patchesPerImage = 16 (which I assume was copied from ComCamSim) still valid?
The original plan was to use lsst_cells_v1.
Then we found out that DC2's DC2_cells_v1 and lsst_cells_v1 are identical, just under different names.
Only lsst_cells_v1 exists in the embargo_or5 repo today, so I'd keep it for now. Later we might change it, depending on which name is chosen for the actual OR5.
# -- Maximum time that a container can send nothing to Knative (seconds).
# This is only useful if the container runs async workers.
# If 0, idle timeout is ignored.
idleTimeout: 900
# -- Maximum time that a container can send nothing to Knative after initial submission (seconds).
# This is only useful if the container runs async workers.
# If 0, idle timeout is ignored.
responseStartTimeout: 900
I'd recommend setting these to 0, since otherwise Knative may queue up extra jobs before the Gunicorn worker times out. (Until we greatly reduce the Gunicorn timeout, the actual Knative timeout is at its cap of 1200 seconds.)
@@ -14,6 +14,6 @@ image:
 repository: ghcr.io/lsst-dm/next_visit_fan_out
 pullPolicy: IfNotPresent
 # Overrides the image tag whose default is the chart appVersion.
-tag: 2.5.0
+tag: tickets-DM-48194
Should I be reviewing fan out as well? I see two branches but no PR.
No, fan-out isn't ready yet.
2e1a091 to bafdac3
The service starts with configs taken mostly from ComCam; they will be tuned later.
85ee3e6 to 227f486
227f486 to 26bf2ed
RuntimeWarning: Cannot store all inputs in cache; dropping {DatasetRef(DatasetType('cal_ref_cat_2_2', {htm7}, SimpleCatalog), {htm7: 147130}, run='refcats/PREOPS-301', id=2cf75345-fefa-4e70-b35e-54fd57d128b6)}.