Feature/region support #326

Open
wants to merge 20 commits into
base: develop

Conversation

blychs
Collaborator

@blychs blychs commented Dec 31, 2024

Adds new region support in the form of a specific utility (util/region_select.py). The docs are updated accordingly.
This includes:
Old capabilities, but using xarray instead of pandas (and the .where method instead of a query). The query is still an option in the utility, but given how I changed the driver, I don't see how the code would actually need it. xarray has a "query" method, but I couldn't make it work, and I believe it requires a dimension to query along, which I don't think we want.
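
For reviewers who want to see the idea in isolation, here is a minimal sketch of a `.where`-based selection. It is not the code in util/region_select.py: the function name `select_region`, the toy dataset, and the `epa_region` variable are all made up for illustration.

```python
import numpy as np
import xarray as xr

# Toy dataset (hypothetical): four sites, each tagged with a region name.
ds = xr.Dataset(
    {"o3": ("x", np.array([30.0, 42.0, 55.0, 38.0]))},
    coords={
        "x": np.arange(4),
        "epa_region": ("x", np.array(["R1", "R2", "R1", "R3"])),
    },
)


def select_region(data, domain_type, domain_name):
    """Keep only the points whose `domain_type` variable equals `domain_name`."""
    # Boolean mask built by comparing the variable itself, not its .cf accessor.
    mask = data[domain_type] == domain_name
    # drop=True removes the points outside the region instead of leaving NaN.
    return data.where(mask, drop=True)


subset = select_region(ds, "epa_region", "R1")
```

Unlike a pandas-style query string, the mask here is an ordinary boolean DataArray, so no dimension has to be named to query along.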

New, advanced region support.
auto-region now includes auto-region:custom, allowing the user to provide a lonlat box. It is currently somewhat limited, though, and it cannot cross the antimeridian. The box has to be provided in the new keyword domain_info.
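
A rough sketch of the box test described above, in plain Python. The bound order `(lon_min, lat_min, lon_max, lat_max)` and the function name `in_lonlat_box` are assumptions made for this example; the actual layout expected in domain_info is the one given in the updated docs.

```python
def in_lonlat_box(lon, lat, box):
    """Return True if (lon, lat) falls inside the box.

    `box` is assumed to be (lon_min, lat_min, lon_max, lat_max); this
    ordering is hypothetical and chosen only for the example.
    """
    lon_min, lat_min, lon_max, lat_max = box
    # A box crossing the antimeridian would have lon_min > lon_max
    # (for example 170 to -170); that case is rejected here.
    if lon_min > lon_max:
        raise ValueError("boxes crossing the antimeridian are not supported")
    if lat_min > lat_max:
        raise ValueError("lat_min must not exceed lat_max")
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
```

Rejecting `lon_min > lon_max` outright keeps the limitation explicit rather than silently returning an empty selection.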

New, advanced region support with regionmask. These require the use of the new keyword domain_info. The new capabilities include:

  • Defining one or more polygons with custom:auto_polygon in domain_type. Holes in the polygon are permitted by regionmask, but I have not added that capability yet (I am not sure about the best way to do it, since it would require turning things into another dictionary, I believe).
  • Using regionmask's defined_regions method with custom:defined_regions. These need to be defined, once again, in domain_info.
  • Using a shapefile/geojson with custom:custom_file. The path or URL of the file needs to be defined in domain_info. There are a few undesirable things about how I did the automatic download, though, and if you have better suggestions I'd be happy to change the code:
      - I am not using pooch, but downloading the files manually. The reason for this is that I was not able to find a way to tell pooch to use the content-disposition keyword of a URL when downloading, leading to errors when the URL did not end with the name of the file. I'd rather not make the user provide that, since it's not always certain. I'm sure that there must be a way, but I could not find it.
      - The code does not check if the file exists, and just overwrites it. If you ask for multiple tasks using the same domain, it downloads it again every time. This is silly, and could be avoided by using pooch, which would also test the checksum. Once again, I could not find an easy way to avoid this when the URL does not end with the appropriate file name, which happens quite often (for example, in my tests).
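
One possible direction for the two download issues above, sketched with the standard library only. This is not the code in the PR: `filename_from_headers`, `download_once`, and the fallback name are hypothetical. The sketch reads the file name from the Content-Disposition header when the server sends one, and skips writing when a file with that name already exists.

```python
import os
import urllib.request
from email.message import Message
from urllib.parse import urlparse


def filename_from_headers(headers, url):
    """Prefer the Content-Disposition file name; fall back to the URL path."""
    # Response headers are email.message.Message objects, whose
    # get_filename() parses the Content-Disposition "filename" parameter.
    name = headers.get_filename()
    if not name:
        name = os.path.basename(urlparse(url).path)
    # basename() also drops any directory part a server might include.
    return os.path.basename(name) or "downloaded_region_file"


def download_once(url, directory="."):
    """Download `url` into `directory` unless the target file already exists."""
    with urllib.request.urlopen(url) as response:
        target = os.path.join(directory, filename_from_headers(response.headers, url))
        if not os.path.exists(target):
            with open(target, "wb") as out:
                out.write(response.read())
    return target
```

Note that this still opens a connection on every call, because the name is only known once the headers arrive; it only avoids reading and rewriting the body. It also does no checksum verification, which pooch would provide.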

If you have any solutions for this, I would appreciate them. Otherwise, I'd suggest moving forward with this.
Please also check the changes in the docs. My English is far from perfect, and some proofreading (and corrections!) by people whose English is better than mine would be great.

I am uploading here the yaml file I used to test all of this, so that it can get tested again. I tested it against surface data. Although I didn't do a complete test against TEMPO (I didn't want to add changes to the driver until I fix the merge conflicts, hopefully tomorrow) I did test the individual functions and plotted the results. Testing against AEROMMA data for aircraft would be great. Otherwise, I can try it with ASIA-AQ. Since GitHub does not allow me
I also provide here my functions for testing individual options. I did not build proper unit tests, since just asserting that the data types are correct, or that the data are not entirely NaN, does not seem as useful as looking at the plots.

You will see that I did not plot the multiboxplot nor the scorecards with the regions defined with regionmask. There is probably a way to do it (by adding the mask to the dataset and not only selecting the region, possibly in a copy to avoid changing it), but it seemed a little bit confusing.
test_regionmask_tool.zip

Cheers,
Pablo

Edit: I requested some reviewers that seemed reasonable, but feel free to change that.

        This commit has advantages and disadvantages:

        Advantages
        ----------
        - It does not require pooch, and only uses the standard library.
        - It can deal with URLs not ending with the filename.

        Disadvantages
        -------------
        - It downloads the files locally instead of into the cache (may be
          actually good).
        - It does not add any checksum to the name, risking overwriting files.
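
A small sketch of one way to reduce the overwriting risk mentioned in the last point, using only the standard library. The helper `cache_path` is hypothetical and not part of this commit. It prefixes the file name with a short hash of the URL, which is not a checksum of the file contents, so it separates files coming from different URLs but does not detect corrupted downloads.

```python
import hashlib
from pathlib import Path


def cache_path(url, filename, cache_dir="."):
    """Build a local path whose name is tied to the source URL."""
    # Short SHA-256 digest of the URL (not of the file contents).
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return Path(cache_dir) / f"{digest}-{filename}"
```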
        It hangs if latitude and longitude are not coordinates.
        - The code was originally looking for
          data["domain_type"].cf == "domain_name"
          instead of data["domain_type"] == "domain_name".
        - auto-region:custom_box was written as auto-region:custom.
        Better tests are provided in the PR.
@blychs
Collaborator Author

blychs commented Dec 31, 2024

The test that is failing is the Copyright notice, which I wrote according to #321

docs/appendix/yaml.rst Outdated Show resolved Hide resolved
docs/appendix/yaml.rst Outdated Show resolved Hide resolved
2 participants