Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-40513: Add first-draft metadetection PipelineTask. #22

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

TallJimbo
Copy link
Member

No description provided.

catalog for the patch, with schema equal to `object_schema`.
"""
single_cell_tables: list[pa.Table] = []
for single_cell_coadds in zip(*patch_coadds, strict=True):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a loop over bands right? We will be making a combined coadd over bands for detection, but then process the bands simultaneously to get the shear info, fluxes etc.

Copy link
Member Author

@TallJimbo TallJimbo Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a loop over individual cells, in which each single_cell_coadds object is a sequence of per-band single-cell coadds.

I figured we'd coadd across bands inside process_cell, one cell at a time. I was assuming there'd be no reason to coadd all cells across bands up front and then process both the multi-band coadd and the per-band coadds for each cell more later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see now there is a star on patch_coadds. So this means that patch_coadds is actuallly a sequence of sequences? Where can I find the definition of MultiCellCoadd?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I wrote this thinking that MultipleCellCoadd was iterable, but it isn't (you have to iterate over .cells) so this was actually wrong (just pushed a fix).

Here's the definition of MultipleCellCoadd: https://github.com/lsst/cell_coadds/blob/main/python/lsst/cell_coadds/_multiple_cell_coadd.py

And here's the definition of SingleCellCoadd: https://github.com/lsst/cell_coadds/blob/main/python/lsst/cell_coadds/_single_cell_coadd.py

Those both refer to ImagePlanes and CommonComponents from the same package.

Copy link

@esheldon esheldon Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Is there an existing way to coadd these individual band SingleCellCoadds or should we extract the masked image and do it ourselves?

How do I extract the image(s)?

Copy link

@esheldon esheldon Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assuming we should not rely on hidden _outer attribute)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, I see there is a property for outer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and extract the outer masked image or individual outer single-plane images and do it yourself; it might make sense to move that into a method on SingleCellCoadd for reuse at some point, but we can look into that once you've defined the behavior you want for the combination.

# TODO: if we need to do any cell-overlap-region deduplication here
# (instead of purely in science analysis code), this is where'd it'd
# happen.
return Struct(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this struct have to match the schema returned by make_object_schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object_catalog attribute of the struct needs to be a pyarrow table that matches that schema. The attributes of the Struct itself correspond to the cT.Output connections in MetadetectionShearConnections, of which there is only one right now.

Output object catalog for the cell, with schema equal to
`object_schema`.
"""
rows: list[dict[str, Any]] = []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must it be a list/dict or can it be a numpy array with fields? e.g. np.zeros(nrows, dtype=dtype)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure numpy array with fields can work too, because @erykoff has a function that can convert from that to pyarrow.Table we can import and use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do! What I don't have is a way of connecting that with the arrow schema metadata, but once we have an idea of what functionality we need I can add that.

# detection masks), add them here.

object_catalog = cT.Output(
"ShearObject",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this name matter? If so should this be more specific like "MDetShearObject" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does matter, but for at least DR1 I think "ShearObject" makes sense as there will be only one shear catalog table and that's what we've put in the DPDD. @arunkannawadi may ultimately have been right that we should have put a more specific name in the DPDD, too, but I'm comfortable revisiting that for DR2+ if/when we decide to run another algorithm alongside.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be surprised if Xiangchong Li has a code ready by then

TallJimbo and others added 15 commits October 19, 2023 16:16
MultipleCellCoadd isn't iterable, but its .cells attribute is a
mapping.
The Input name is set correctly to cellCoadd

temporarily commented object schema "cT.InitOutput" because this
doesn't work yet

temporarilyl set required_bands to r, need to learn how to
set with config
Currently the real cell coadds don't have everything we need
The metadetect code expects MultiBandExposure
We currently run with a config restricting to r for
real data, as other bands don't yet exist for the
testbed

Note we will want to process g separately, this is
still TODO
currently the sims cannot generate the bright objects
because the catalog is not available

Also when using real data we need to pass in the info for
bright objects to be masked
This has not been tested.  It will definitely need config overrides
to "config.connections.ref_cat" to work:

- "cal_ref_cat" for DC2 data
- "ps1_pv3_3pi_20170110" or "gaia_dr3_20230707" for real data

Right now the reference catalog is being passed to run() as an
lsst.afw.table.SimpleCatalog, which is probably not what anyone wants,
but it's what we start off with.  Calling .asAstropy() on it and
passing that to run() would be one alternative, and calling
.extract(<columns>) to get a dict of numpy array columns would be
another.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants