-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40513: Add first-draft metadetection PipelineTask. #22
base: main
Are you sure you want to change the base?
Conversation
catalog for the patch, with schema equal to `object_schema`. | ||
""" | ||
single_cell_tables: list[pa.Table] = [] | ||
for single_cell_coadds in zip(*patch_coadds, strict=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a loop over bands right? We will be making a combined coadd over bands for detection, but then process the bands simultaneously to get the shear info, fluxes etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a loop over individual cells, in which each single_cell_coadds
object is a sequence of per-band single-cell coadds.
I figured we'd coadd across bands inside process_cell
, one cell at a time. I was assuming there'd be no reason to coadd all cells across bands up front and then process both the multi-band coadd and the per-band coadds for each cell more later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see now there is a star on patch_coadds. So this means that patch_coadds is actuallly a sequence of sequences? Where can I find the definition of MultiCellCoadd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I wrote this thinking that MultipleCellCoadd
was iterable, but it isn't (you have to iterate over .cells
) so this was actually wrong (just pushed a fix).
Here's the definition of MultipleCellCoadd
: https://github.com/lsst/cell_coadds/blob/main/python/lsst/cell_coadds/_multiple_cell_coadd.py
And here's the definition of SingleCellCoadd
: https://github.com/lsst/cell_coadds/blob/main/python/lsst/cell_coadds/_single_cell_coadd.py
Those both refer to ImagePlanes
and CommonComponents
from the same package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Is there an existing way to coadd these individual band SingleCellCoadds or should we extract the masked image and do it ourselves?
How do I extract the image(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(assuming we should not rely on hidden _outer
attribute)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, I see there is a property for outer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead and extract the outer masked image or individual outer single-plane images and do it yourself; it might make sense to move that into a method on SingleCellCoadd
for reuse at some point, but we can look into that once you've defined the behavior you want for the combination.
# TODO: if we need to do any cell-overlap-region deduplication here | ||
# (instead of purely in science analysis code), this is where'd it'd | ||
# happen. | ||
return Struct( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this struct have to match the schema returned by make_object_schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object_catalog
attribute of the struct needs to be a pyarrow table that matches that schema. The attributes of the Struct
itself correspond to the cT.Output
connections in MetadetectionShearConnections
, of which there is only one right now.
Output object catalog for the cell, with schema equal to | ||
`object_schema`. | ||
""" | ||
rows: list[dict[str, Any]] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must it be a list/dict or can it be a numpy array with fields? e.g. np.zeros(nrows, dtype=dtype)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure numpy array with fields can work too, because @erykoff has a function that can convert from that to pyarrow.Table
we can import and use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do! What I don't have is a way of connecting that with the arrow schema metadata, but once we have an idea of what functionality we need I can add that.
# detection masks), add them here. | ||
|
||
object_catalog = cT.Output( | ||
"ShearObject", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this name matter? If so should this be more specific like "MDetShearObject" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does matter, but for at least DR1 I think "ShearObject" makes sense as there will be only one shear catalog table and that's what we've put in the DPDD. @arunkannawadi may ultimately have been right that we should have put a more specific name in the DPDD, too, but I'm comfortable revisiting that for DR2+ if/when we decide to run another algorithm alongside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't be surprised if Xiangchong Li has a code ready by then
MultipleCellCoadd isn't iterable, but its .cells attribute is a mapping.
The Input name is set correctly to cellCoadd temporarily commented object schema "cT.InitOutput" because this doesn't work yet temporarilyl set required_bands to r, need to learn how to set with config
Currently the real cell coadds don't have everything we need
The metadetect code expects MultiBandExposure
We currently run with a config restricting to r for real data, as other bands don't yet exist for the testbed Note we will want to process g separately, this is still TODO
currently the sims cannot generate the bright objects because the catalog is not available Also when using real data we need to pass in the info for bright objects to be masked
This has not been tested. It will definitely need config overrides to "config.connections.ref_cat" to work: - "cal_ref_cat" for DC2 data - "ps1_pv3_3pi_20170110" or "gaia_dr3_20230707" for real data Right now the reference catalog is being passed to run() as an lsst.afw.table.SimpleCatalog, which is probably not what anyone wants, but it's what we start off with. Calling .asAstropy() on it and passing that to run() would be one alternative, and calling .extract(<columns>) to get a dict of numpy array columns would be another.
a25beb1
to
bbe1965
Compare
No description provided.