Warehouse State-Machine Does Not Accommodate CMIP6 Dataset Versioning #89
Propose converting ALL future E3SM publications to CMIP6-style "vGenDate" versioning and dispensing with auto-increment-style versioning entirely. This would eliminate the need for code that switches on "project", which often leads to mismatched versioning. It should not cause problems for any related processing: ESGF publication and its tables strip off the "v" in any case and treat the remaining digits as an integer, for both E3SM and CMIP6 processing.
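To make the "strip the v, read an integer" behavior concrete, here is a minimal sketch. `version_to_int` is a hypothetical helper that mimics the behavior described above; it is not ESGF's own code.

```python
import re


def version_to_int(version: str) -> int:
    """Hypothetical helper: strip a leading "v" and read the rest as an integer.

    Mimics the ESGF table behavior described in this issue; not ESGF code.
    """
    digits = version[1:] if version.startswith("v") else version
    # Accept ASCII digits only; anything else is rejected.
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"version is not 'v' followed by digits: {version!r}")
    return int(digits)


# Both styles reduce to plain integers on the same scale.
print(version_to_int("v2"))         # 2
print(version_to_int("v20220103"))  # 20220103
```

Because a date such as 20220103 is far larger than any realistic auto-increment counter, switching an existing dataset from "vN" to a date-based version would still yield a larger integer, so "newest version wins" ordering would be preserved under this assumption.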
Can you provide an example of a "vGenDate", and say where that string appears?
Here is an image from the ESGF CMIP6 search service https://esgf-node.llnl.gov/search/CMIP6/
(Guess I can't put images here...)
Also, you can access information on datasets directly through the ESGF REST API, which has tables for datasets and for files. Dataset naming looks something like this:
(E3SM Example)
'title': 'E3SM.1_1_ECA.ssp585-BCRC.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1'
'master_id': 'E3SM.1_1_ECA.ssp585-BCRC.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1'
'instance_id': 'E3SM.1_1_ECA.ssp585-BCRC.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1.v2'
'id': 'E3SM.1_1_ECA.ssp585-BCRC.1deg_atm_60-30km_ocean.land.native.model-output.mon.ens1.v2|esgf-data2.llnl.gov'
(CMIP6 Example)
'title': 'CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.rlut.gr'
'master_id': 'CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.rlut.gr'
'instance_id': 'CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.rlut.gr.v20220103'
'id': 'CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.rlut.gr.v20220103|esgf-data2.llnl.gov'
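In both examples the fields nest the same way: `instance_id` is `master_id` plus ".v<version>", and `id` appends "|<data node>". The sketch below splits an `id` along that layout; `parse_dataset_id` and `DatasetId` are hypothetical names, and the layout is inferred only from these two examples.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    master_id: str   # version-less dataset name (same as 'title' here)
    version: str     # e.g. "v2" (E3SM) or "v20220103" (CMIP6)
    data_node: str   # host serving this replica


def parse_dataset_id(dataset_id: str) -> DatasetId:
    """Split an ESGF-style 'id' into master_id, version and data node.

    Assumes the layout seen in the examples: <master_id>.<version>|<data_node>.
    """
    instance_id, _, data_node = dataset_id.partition("|")
    master_id, _, version = instance_id.rpartition(".")
    if not version.startswith("v"):
        raise ValueError(f"no '.v<version>' suffix in {instance_id!r}")
    return DatasetId(master_id, version, data_node)


e3sm = parse_dataset_id(
    "E3SM.1_1_ECA.ssp585-BCRC.1deg_atm_60-30km_ocean.land.native."
    "model-output.mon.ens1.v2|esgf-data2.llnl.gov"
)
cmip6 = parse_dataset_id(
    "CMIP6.CMIP.E3SM-Project.E3SM-1-0.historical.r1i1p1f1.day.rlut.gr."
    "v20220103|esgf-data2.llnl.gov"
)
print(e3sm.version, cmip6.version)  # v2 v20220103
```

Splitting on the last "." matters because the master_id itself contains many dots.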
When I asked what date is used in the CMIP6 “version”, I was told it was the date on which the Cmorized data was generated, so I refer to the format as “vGenDate” to distinguish it from the E3SM “vN”. I have no idea where either convention came from.
I agree we should use vGenDate for the Cmorized data, and the date should be when it was Cmorized. But what date would we use for the non-Cmorized data we publish?
That is a good question, and something “standard” should be applied.
I would opt for “publication date”. It may not be a true “version”: we often spend weeks “fixing” datasets that were incorrectly published or generated (for example, with the wrong “FillValue”, NaNf versus 1e20), so trying to assign a true “version date” would be problematic.
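The date rule being discussed (CMORization date for Cmorized data, publication date otherwise) fits in a few lines. This is a sketch of that proposal only; `gen_date_version` and its parameters are hypothetical.

```python
from datetime import date
from typing import Optional


def gen_date_version(cmorized_on: Optional[date], published_on: date) -> str:
    """Hypothetical helper: build a "vYYYYMMDD" version string.

    Uses the CMORization date when the data was Cmorized; otherwise falls
    back to the publication date, as proposed in this thread.
    """
    chosen = cmorized_on if cmorized_on is not None else published_on
    return "v" + chosen.strftime("%Y%m%d")


print(gen_date_version(date(2022, 1, 3), date(2022, 1, 10)))  # v20220103
print(gen_date_version(None, date(2022, 1, 10)))              # v20220110
```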
For our E3SM (non-Cmorized) publications, I was (and am) often frustrated that the ESGF publication tables do NOT tell you the date on which publication occurred. You cannot query the ESGF API for date information, so you cannot tell whether a publication is two days, two weeks, or two years old. I have had to institute very thorough timestamped logging of all our processes, with indefinite log retention, just to investigate effectively when things go awry.
In the very rare case that a dataset is published twice on the same day, I believe we could append “_n” to the version, as in “20220105_2”, but I would need to run that by Sasha to ensure it would not break anything. (What happens if two Cmorized dataset versions are created and subsequently published on the same day? I have no idea, but Sasha should know what we must avoid.)
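A sketch of the “_n” suffix idea, assuming numbering starts at 2 for the second same-day publication as in the “20220105_2” example. `same_day_version` is hypothetical, and whether ESGF tolerates such versions is exactly the open question for Sasha.

```python
from datetime import date
from typing import Iterable


def same_day_version(existing: Iterable[str], day: date) -> str:
    """Hypothetical helper: return "vYYYYMMDD", or "vYYYYMMDD_n" if taken.

    The suffix starts at 2, following the "20220105_2" example; ESGF
    compatibility of suffixed versions is unverified.
    """
    taken = set(existing)
    base = "v" + day.strftime("%Y%m%d")
    if base not in taken:
        return base
    n = 2
    while f"{base}_{n}" in taken:
        n += 1
    return f"{base}_{n}"


second = same_day_version({"v20220105"}, date(2022, 1, 5))
print(second)                # v20220105_2
# The suffixed form is no longer all digits after the "v".
print(second[1:].isdigit())  # False
```

The last line shows the risk: any consumer that expects only digits after the "v" would reject the suffixed version.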
The existing warehouse state-machine code enforces "automatic publication versioning" (the initial publication is "v1", and subsequent publications are automatically incremented), which is inappropriate for CMIP6-style "version dates" and likewise cannot accommodate forcing a specific version for a publication. CMIP6 datasets are "born with" author-supplied versions, whereas E3SM datasets are not.
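One way both behaviors could coexist is a single version-selection step that honors an author-supplied version when present and otherwise auto-increments. This is a hypothetical sketch; `choose_version` is not part of the warehouse code.

```python
from typing import Optional, Sequence


def choose_version(existing: Sequence[str], supplied: Optional[str] = None) -> str:
    """Hypothetical version-selection step for a publication workflow.

    An author-supplied version (CMIP6 style) is used verbatim, refusing to
    reuse one already published. Otherwise auto-increment (E3SM style):
    "v1" first, then one more than the highest existing version.
    Assumes every existing version is "v" followed by digits.
    """
    if supplied is not None:
        if supplied in existing:
            raise ValueError(f"version {supplied} already published")
        return supplied
    if not existing:
        return "v1"
    highest = max(int(v[1:]) for v in existing)
    return f"v{highest + 1}"


print(choose_version([]))                        # v1
print(choose_version(["v1", "v2"]))              # v3
print(choose_version(["v1"], supplied="v20220103"))  # v20220103
```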
Presently, almost all CMIP6 publication actions must be handled "Out Of Warehouse", that is, outside the warehouse state machine.