Feature/region support #326
Open
blychs wants to merge 20 commits into NOAA-CSL:develop from blychs:feature/region_support
This commit has advantages and disadvantages:
Advantages
----------
- It does not require pooch, and only uses the standard library.
- It can deal with URLs not ending with the filename.
Disadvantages
-------------
- It downloads the files locally instead of into the cache (which may actually be good).
- It does not add any checksum to the name, risking overwriting files.
It hangs if latitude and longitude are not coordinates
- The code was originally looking for data["domain_type"].cf == "domain_name" instead of data["domain_type"] == "domain_name".
- auto-region:custom_box was written as auto-region:custom.
Better tests are provided in the PR.
The test that is failing is the Copyright notice, which I wrote according to #321
zmoon reviewed Jan 13, 2025
Adds new region support in the form of a specific utility (`util/region_select.py`). The docs are updated accordingly.
This includes:
- Old capabilities, but using xarray instead of pandas (and the `.where` method instead of a query). The query is still an option in the utility, but given how I changed the driver, I don't see how the code would really require it. xarray has a "query" method, but I couldn't make it work, and I believe it requires a dimension to query along, which I don't think we want.
- New, advanced region support. `auto-region` now includes `auto-region:custom`, allowing the user to provide a lonlat box. It is currently somewhat limited, though, and it cannot cross the antimeridian. The box has to be provided in the new keyword `domain_info`.
- New, advanced region support with `regionmask`. These require the use of the new keyword `domain_info`. The new capabilities include:
  - `custom:auto_polygon` in `domain_type`. Holes in the polygon are permitted by regionmask, but I have not added that capability yet (I am not sure about the best way to do it, since I believe it would require turning things into another dictionary).
  - `regionmask`'s `defined_regions` method with `custom:defined_regions`. These need to be defined, once again, in `domain_info`.
  - `custom:custom_file`. The path or URL of the file needs to be defined in `domain_info`. There are a few undesirable things about how I did the automatic download, though, and if you have better suggestions I'd be happy to change the code:
    - I am not using `pooch`, but downloading the files manually. The reason is that I was not able to find a way to tell `pooch` to use the content-disposition keyword of a URL when downloading, which led to errors when the URL did not end with the name of the file. I'd rather not make the user provide the file name, since it's not always known. I'm sure that there must be a way, but I could not find it.
    - The code does not check whether the file exists, and just overwrites it. If you ask for multiple tasks using the same domain, it downloads the file again every time. This is silly, and could be avoided by using `pooch`, which would also test the checksum. Once again, I could not find an easy way to avoid this when the URL does not end with the appropriate file name, which happens quite often (for example, in my tests).
If you have any solutions for this, I would appreciate them. Otherwise, I'd suggest moving forward with this.
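One possible direction for the download issue, as a minimal sketch using only the standard library: read the file name from the `Content-Disposition` response header, fall back to the last component of the URL path, and skip writing when the target file already exists. The names `filename_from_response`, `download_once`, and the fallback name `downloaded_region_file` are hypothetical and not part of this PR. The sketch still opens the connection to read the headers, and it does not verify any checksum.

```python
from email.message import Message
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_response(url, content_disposition=None):
    """Pick a local file name: Content-Disposition first, then the URL path."""
    if content_disposition:
        # email.message.Message knows how to parse header parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            return Path(name).name  # drop any directory part
    name = Path(unquote(urlparse(url).path)).name
    # Hypothetical fallback when neither source yields a name.
    return name or "downloaded_region_file"


def download_once(url, target_dir="."):
    """Download url into target_dir unless the resolved file already exists."""
    with urlopen(url) as response:
        header = response.headers.get("Content-Disposition")
        target = Path(target_dir) / filename_from_response(url, header)
        if not target.exists():
            # Only read the body when the file is not there yet.
            target.write_bytes(response.read())
    return target
```

With something like this, repeated tasks that use the same domain would reuse the existing file instead of overwriting it, at the cost of one request per task to resolve the name.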
Please also check the changes in the docs. My English is far from perfect, and some proofreading (and corrections!) by people whose English is better than mine would be great.
I am uploading the YAML file I used to test all of this here, so that it can be tested again. I tested it against surface data. Although I didn't do a complete test against TEMPO (I didn't want to add changes to the driver until I fix the merge conflicts, hopefully tomorrow), I did test the individual functions and plotted the results. Testing against AEROMMA data for aircraft would be great. Otherwise, I can try it with ASIA-AQ. Since GitHub does not allow me
I also provide here my functions for testing individual options. I did not build proper unit tests, since just asserting whether the types of data are correct or not completely NaN does not seem as useful as looking at the plots.
You will see that I did not plot the multiboxplot nor the scorecards with the regions defined with regionmask. There is probably a way to do it (by adding the mask to the dataset and not only selecting the region, possibly in a copy to avoid changing it), but it seemed a little bit confusing.
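As a rough illustration of the "add the mask to a copy" idea, here is a minimal xarray sketch. The dataset, the variable name `o3`, the name `region_mask`, and the hand-built boolean mask are all hypothetical stand-ins; in practice the mask would come from regionmask. `Dataset.assign` returns a new dataset, so the original is left unchanged.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset on a small lat/lon grid.
ds = xr.Dataset(
    {"o3": (("lat", "lon"), np.arange(12.0).reshape(3, 4))},
    coords={
        "lat": [30.0, 35.0, 40.0],
        "lon": [-110.0, -105.0, -100.0, -95.0],
    },
)

# Stand-in for a region mask: True inside the region, False outside.
mask = (ds["lat"] >= 35.0) & (ds["lon"] <= -105.0)

# assign returns a new Dataset carrying the mask; ds itself is not modified.
ds_with_mask = ds.assign(region_mask=mask)

# Plotting code could then select on the stored mask when it needs to.
selected = ds_with_mask["o3"].where(ds_with_mask["region_mask"])
```

Keeping the mask as a variable means the full data stays available to plot types that need it, while region-specific plots can apply `.where` on demand.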
test_regionmask_tool.zip
Cheers,
Pablo
Edit: I requested some reviewers that seemed reasonable, but feel free to change that.