
iir_params keys a and b are not populated for some observations #969

Open
mmccrackan opened this issue Sep 27, 2024 · 12 comments

@mmccrackan
Contributor

I searched and didn't see any issues for this error specifically, but for some observations, the iir_params dictionary populated by _load_book_detset in load_book.py contains None for the IIR filter parameters a and b such that the fourier_filter step fails with:

"/global/homes/m/mmccrack/so_home/repos/20240819_catchup/sotodlib/sotodlib/tod_ops/filters.py", line 522, in iir_filter B, A = np.polyval(b[::-1], z), np.polyval(a[::-1], z)
TypeError: 'NoneType' object is not subscriptable

The other parameters in the iir_params dictionary (enabled and fscale) are populated. There are checks in the iir_filter function in filters.py to see if a and b were passed as function inputs, and it falls back to those in the tod axis manager if they were not, but there is no check for whether they are None in the axis manager dictionary itself.

An example observation where this occurs is:

obsid = 'obs_1717755016_satp3_1111111'
dets={'wafer_slot':'ws4', 'wafer.bandpass':'f150'}
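
For reference, a minimal sketch of how the problem shows up when loading the example observation; the context file path is hypothetical, and this assumes iir_params is exposed on the loaded AxisManager as described above:

from sotodlib.core import Context

# Hypothetical context file; the obs_id and dets selection are the example above.
ctx = Context('context.yaml')
aman = ctx.get_obs('obs_1717755016_satp3_1111111',
                   dets={'wafer_slot': 'ws4', 'wafer.bandpass': 'f150'})

iir = aman.iir_params
print(iir['enabled'], iir['fscale'])  # populated
print(iir['a'], iir['b'])             # both None for the affected observations

# The check that iir_filter currently does not perform:
if iir['a'] is None or iir['b'] is None:
    raise ValueError("iir_params a/b are missing; cannot apply the readout filter")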

@chervias
Contributor

Yes, we have known about these for a while; they crashed the mapmaker many times, and we had to put in specific try/except statements to avoid that. I'm not sure why this happens.
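
For illustration, the kind of defensive wrapper being described, sketched against the fourier_filter/iir_filter entry points from the traceback above (the call pattern is schematic, not the mapmaker's actual code):

from sotodlib.tod_ops import filters

def filtered_or_none(aman):
    # Return the filtered signal, or None if iir_params are broken for this obs.
    try:
        return filters.fourier_filter(aman, filters.iir_filter())
    except TypeError:
        # "'NoneType' object is not subscriptable": iir_params a/b were None.
        return None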

@msilvafe
Contributor

msilvafe commented Oct 1, 2024

This issue goes all the way down our software stack to the streamer (simonsobs/smurf-streamer#40), so I will assign @jlashner.

@mhasself
Member

This is a data integrity issue, and might indicate deeper problems; thanks for creating an issue.

Questions I'd like to have answers to ASAP:

  • How often does this happen?
  • Do we detect any cases where the readout filter indeed seems to be misconfigured (not just reporting incorrectly)?

Whether or not this gets fixed upstream, we will probably want a mitigation for existing data (unless it's extremely rare). I don't really like the approach of "oh just put in default filter params if they're not there"... that defeats all the care we've taken to record the state of the system. I would rather do this with an external (obsdb/obs_info) tag that confirms, for certain obs, that it's safe to jam in filter parameters.

@jlashner
Contributor

jlashner commented Oct 28, 2024

I don't have a full quantitative response, but I did write a script to scan satp1 data to check for this problem, and though I didn't check the full archive, I couldn't find any examples of it.

Is it true this only happens for satp3?

My current theory is that this happens whenever satp3 uses jackhammer hammer --soft instead of doing a full hammer.
We have it configured so that we should receive a message containing the filter params whenever they are updated or queried (configured here).

When performing a full hammer, the setup function will set defaults for all hardware registers. I'm wondering if this is the only time we receive metadata about the filter parameters, in which case restarting the streamer without re-running the full setup would result in some missing metadata. Before, I was under the impression that rogue dumped all monitored parameters to us on initialization, but maybe it only happens when writing these defaults.

There are currently other problems with soft-hammering and setup that we are trying to understand, so we've already recommended that satp3 stop using that function. However, to address this more thoroughly, I think we can either make sure to set all register defaults even when soft-hammering, or make sure we are polling these registers regularly, like every 30 seconds or so.
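
A purely illustrative sketch of the second option (regular polling); poll_filter_registers is a stand-in for whatever call would actually read and re-publish the rogue filter variables, since none of the real streamer API is assumed here:

import time

POLL_INTERVAL_S = 30  # "every 30 seconds or so", as suggested above

def filter_register_poll_loop(poll_filter_registers):
    # Re-read the filter registers periodically so their values always appear
    # in the metadata stream, even after a soft hammer.
    while True:
        poll_filter_registers()
        time.sleep(POLL_INTERVAL_S)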

@mmccrackan
Contributor Author

Just some statistics on how common the issue is. In the satp3 site-pipeline run I did, it occurred in 139 separate observations out of a total of 1530 (about 9%). Counting different bands and wafers separately, it happened 508 times.

There were none in the satp1 run.

@jlashner
Contributor

I ended up running some tests on SATp1 and determined that soft-hammering the smurf is indeed the problem. I put some notes about what I tested on my confluence page here.

Unfortunately, I think we cannot trust the filter parameters for these observations to match those from other observations. After a soft-hammer, we don't run the function that sets all rogue vars based on a configuration file, so the filter param variables will be uninitialized, or initialized to something incorrect (I think the base defaults are set here).

Moving forward, I think we should just remove the soft-hammer option from jackhammer.

@jlashner
Contributor

Actually, I think those parameters in the _SmurfProcessor.py file match what we normally run with, so it might be ok to assume they're correct.

@msilvafe
Contributor

> This is a data integrity issue, and might indicate deeper problems; thanks for creating an issue.
>
> Questions I'd like to have answers to ASAP:
>
>   • How often does this happen?
>   • Do we detect any cases where the readout filter indeed seems to be misconfigured (not just reporting incorrectly)?
>
> Whether or not this gets fixed upstream, we will probably want a mitigation for existing data (unless it's extremely rare). I don't really like the approach of "oh just put in default filter params if they're not there"... that defeats all the care we've taken to record the state of the system. I would rather do this with an external (obsdb/obs_info) tag that confirms, for certain obs, that it's safe to jam in filter parameters.

@mhasself can you elaborate on what you think constitutes safe and what info you want added to the obsdb (or a new external obsdb)? I think our only option when they're missing is to just assume they're the same as in the data that doesn't have them missing.

@mhasself
Member

> @mhasself can you elaborate on what you think constitutes safe and what info you want added to the obsdb (or a new external obsdb)? I think our only option when they're missing is to just assume they're the same as in the data that doesn't have them missing.

It would be an external obsdb. "Safe" means you're confident that you know how the filter is configured. So if @jlashner convinces us that in this case the Smurfs are running with some default parameters, and we know what those default parameters are, and we have looked at a few noise spectra to confirm the filter seems to be doing that ... then it's pretty safe. So we mark all events we know about as "assume_iir_params=1", and have code that turns that into some iir_params in the aman.

Hopefully this doesn't happen any more, but if it does we will want to know about it, and not just quietly patch everything over. Because the failure cause, in the future, could be different from what's happening here, and it might not be true that we can assume_iir_params=1.
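
A rough sketch of what that could look like; none of this is an existing sotodlib API, assume_iir_params is the tag proposed above, and default_params would be the known SmurfProcessor defaults:

from sotodlib import core

def patch_iir_params(aman, default_params):
    # Fill in iir_params from known defaults, but only for obs explicitly
    # tagged as safe via the external obsdb/obs_info flag.
    if not getattr(aman.obs_info, 'assume_iir_params', 0):
        raise RuntimeError("iir_params missing and obs not tagged "
                           "assume_iir_params=1; refusing to guess")
    iir = core.AxisManager()
    for key, val in default_params.items():
        iir.wrap(key, val)  # e.g. 'a', 'b', 'fscale', 'enabled'
    if hasattr(aman, 'iir_params'):
        aman.move('iir_params', None)  # drop the incomplete entry
    aman.wrap('iir_params', iir)
    return aman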

mmccrackan added a commit that referenced this issue Oct 31, 2024
* updates for 20240819 satp3 catchup job

* remove changes added after catchup job finished

* add update to site_pipeline

* remove profiling calls

* adjust fill_glitches to address #740

* add error message for empty iir_params to partially address #969

* adjustment to new t2p leakage model handling

* remove repeated calculation get_gap_fill

* add lmfit to conf.py

* add lmfit to setup.py

* updates from satp1 site pipeline job

* remove lingering profile call

* fix processes.py

* adjustments to t2p, flags, gapfill

* fix docstring

* update glitch fill test

* update pca for compatibility

---------

Co-authored-by: Michael McCrackan <[email protected]>
@msilvafe
Contributor

@jlashner are you working on adding the assume_iir_params arguments, hand-setting it to True for the obs we know are using defaults so far, and writing the function to populate the iir_params? I guess these come from the lowest-level smurf data headers, so they enter the data at book binding. If we want to patch around the already-bound books, it probably makes sense to put this check in the metadata/context loader, but I'm not 100% sure how to create/structure/include this extra obsdb that only has the assume_iir_params info in it.

@mhasself
Member

mhasself commented Nov 21, 2024

To summarize a recent conversation with @msilvafe :

  • In the short term it would be great if preprocess could just cut all the detectors for wafers where iir_params is not present. Then to downstream users, this would just seem like "ok whatever, no dets, fine, will ignore"; a rough sketch follows this list. (Could one make a preprocess operation that jams in the iir_params, e.g. with a parameter that says "as long as the platform is satp3 and the timestamp is < something"?)
  • In the longer term we'll have a function called "apply_data_mitigations", and it will inspect obsdb info, just after load, and apply certain mitigations, special for our weird data, that it is allowed to apply. Currently the only evil we need to patch is these missing iir_params, but others will emerge in time. A tag-along obsdb will provide any info needed to apply those mitigations (e.g. in this case it will probably just say "yes, patch the iir_params with the defaults"). This would benefit from the following future technology: tag-along obsdbs (see Nearby source db #850 and friends) and automatic model application called through context (like apply_pointing_model etc).
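
For the first bullet, a rough sketch of a preprocess-style step that simply cuts everything when the filter parameters are unusable (the function name and the restrict call pattern are illustrative, not the actual preprocess API):

def cut_if_iir_params_missing(aman):
    # Leave zero detectors when the readout filter parameters are not usable,
    # so downstream steps just see an empty wafer and move on.
    iir = getattr(aman, 'iir_params', None)
    missing = iir is None or iir['a'] is None or iir['b'] is None
    if missing:
        aman.restrict('dets', [])  # no dets -> "ok whatever, no dets, will ignore"
    return aman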

@msilvafe
Contributor

> (Could one make a preprocess operation that jams in the iir_params, e.g. with a parameter that says "as long as the platform is satp3 and the timestamp is < something"?)

Yes, this should be easy enough. I made an issue for this short-term solution #1036 and will track it.

susannaaz pushed a commit to susannaaz/sotodlib that referenced this issue Dec 9, 2024