Extend scope of alignment="same_verifs"
#699
Comments
#702 will help to visualize the discussion
Thank you for this extension proposal issue, @dougiesquire. In #702, I played around with your use case and indeed
import numpy as np
import xarray as xr
import climpred

init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="AS")
lead = range(0, 24)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "months"
time = xr.cftime_range(
    start="2000-01-01", periods=len(init) * 12 + len(lead), freq="MS"
)
data = np.random.random(len(time))
obs = xr.DataArray(data, coords=dict(time=time), dims="time", name="var")
h = climpred.HindcastEnsemble(hind).add_observations(obs)
h.coords["valid_time"]
h.plot()
h.plot_alignment()
Some comments:
What about a new alignment method
That's what I still don't quite understand: what this new alignment would look like. Would it essentially take 12 out of 24 lead months and slide from earlier leads at late inits to later leads at earlier inits? (12 depends on some other specifics, I guess, or is that because of the monthly freqs in a year?) @bradyrx, thoughts on a new alignment?
so this alignment would be the first where the number of
Sorry, I think my description is unclear. And I'm not sure I've fully thought through my suggestion. I'm not meaning to suggest that the number of leads should be reduced. I'm proposing an alignment that finds the maximum period that:
All Consider the following examples with four hindcasts each
Does this make sense?
Thanks @dougiesquire. Now I get your approach. So Note: For your example to work you definitely need a monthly So for your second example,
The number of samples isn't equal but won't differ by more than +/- 1, IMO. Taking I'd still prefer to make a new Would you lead a PR? Entrypoint is climpred/climpred/alignment.py Line 125 in f6e05d1
I am happy to give feedback and test.
Yes exactly - sorry should've made that clearer
Good point. I messed that up, sorry. Now I realise there isn't a single solution to the constraints I've posed. I think there'd be value in an alignment something like what I'm suggesting. But it seems like I still need to work out the best approach for Happy to open a PR where I can flesh this out a little better. But it might take me a little while to get to it, sorry.
alignment="same_verifs"
The "same_verifs" alignment generates a list of times from verif that are present in forecast at any init but all leads. This list will always be empty when the init frequency is lower than the lead frequency. Is there scope to extend "same_verifs" to instead deal appropriately with such cases? I'll try to give a concrete example of what I mean below.

Consider the following hindcasts:
I currently can't use "same_verifs" with this data because there are no common times available at all leads.
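To make the empty result concrete, here is a minimal sketch of the intersection logic described in this issue. It is not climpred's implementation: times are plain month counts since 2000-01 rather than cftime dates, and the helper names `valid_times_per_lead` and `same_verifs_times` are hypothetical. It assumes annual inits with 24 monthly leads.

```python
def valid_times_per_lead(inits, leads):
    """Map each lead to the set of valid times (init + lead), all in months."""
    return {lead: {init + lead for init in inits} for lead in leads}


def same_verifs_times(inits, leads, verif):
    """Verification times that some init reaches at every single lead."""
    common = set(verif)
    for times in valid_times_per_lead(inits, leads).values():
        common &= times  # keep only times also reachable at this lead
    return sorted(common)


leads = range(24)  # monthly leads 0..23
verif = range(60)  # monthly observations, 2000-01 through 2004-12

# Assumed setup: annual inits in January 2000, 2001 and 2002.
annual = same_verifs_times([0, 12, 24], leads, verif)

# For comparison: monthly inits covering 2000-01 through 2002-12.
monthly = same_verifs_times(range(36), leads, verif)

print(annual)  # []
print(monthly[0], monthly[-1])  # 23 35
```

With annual inits, lead 0 only reaches Januaries and lead 1 only reaches Februaries, so the intersection across leads is empty. With monthly inits, the same logic returns the 13 months from 2001-12 through 2002-12.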
But, users may still want to align based on a common verification period. I.e., in this example, "valid_time"s [2001-01-01 and 2002-01-01] are available at all possible leads for which they can occur (leads 0 and 12 months). Similarly,
...
That is, by performing verification over the period 2001-01-01 - 2002-12-01 one includes:
How do folks feel about trying to restructure cftime.utils._same_verifs_alignment() to use the above alignment dates in the above example? We would obviously do this such that the current behaviour is preserved for datasets that have common verification times across all leads.
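One possible reading of this proposal, sketched with simplifying assumptions: times are month counts since 2000-01 instead of dates, the helper `extended_verifs_times` is hypothetical, and inits are assumed to sit on a regular grid with spacing `init_step` anchored at month 0. A verification time is kept only if a forecast exists at every lead that could reach it from that init grid. This illustrates the idea only; it is not an agreed design or climpred code.

```python
def extended_verifs_times(inits, leads, verif, init_step):
    """Valid times that are verified at every lead at which they could occur.

    A lead "could occur" for valid time t when t - lead falls on the regular
    init grid (a multiple of init_step), whether or not that init exists.
    """
    available = set(inits)
    kept = []
    for t in verif:
        possible = [lead for lead in leads if (t - lead) % init_step == 0]
        if possible and all(t - lead in available for lead in possible):
            kept.append(t)
    return kept


def label(month):
    """Format a month count since 2000-01 as YYYY-MM."""
    return f"{2000 + month // 12}-{month % 12 + 1:02d}"


inits = [0, 12, 24]  # assumed annual inits: January 2000, 2001, 2002
leads = range(24)    # monthly leads 0..23
verif = range(60)    # monthly observations, 2000-01 through 2004-12

kept = extended_verifs_times(inits, leads, verif, init_step=12)
print(label(kept[0]), label(kept[-1]))  # 2001-01 2002-12

# With monthly inits every lead is possible for every valid time, so the
# result reduces to the plain intersection across all leads.
unchanged = extended_verifs_times(range(36), leads, verif, init_step=1)
print(label(unchanged[0]), label(unchanged[-1]))  # 2001-12 2002-12
```

Under these assumptions the annual-init case yields the period 2001-01 through 2002-12, and the monthly-init case is unchanged from the all-leads intersection, which is the backwards-compatibility property asked for above.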