Regridder Partitioning #427

Open
stephenworsley opened this issue Nov 4, 2024 · 0 comments
Labels
New: Issue Highlight a new community raised "generic" issue

stephenworsley commented Nov 4, 2024

📰 Custom Issue

There are currently problems with handling data that is too large for memory (#310, #246). One way to work around this is, instead of building a single regridder for the whole source and target grid/mesh, to build many smaller regridders, each responsible for some section of the source and target grid/mesh. A Partition object (name to be decided) could handle the building, saving and application of such regridders.
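As a rough illustration of the idea, the toy sketch below compares one regridder over a whole 1D target with two smaller regridders over halves of it. Everything here is an assumption made for illustration: `build_regridder` and `apply_regridder` are hypothetical helpers, and a plain dict of weights stands in for a real regridder's sparse weight matrix.

```python
def build_regridder(tgt_indices):
    """Hypothetical toy regridder: target cell t averages source cells 2t and 2t+1.

    Returned as {target index: [(source index, weight), ...]}.
    """
    return {t: [(2 * t, 0.5), (2 * t + 1, 0.5)] for t in tgt_indices}


def apply_regridder(regridder, src_data):
    """Compute the weighted sum of source values for each target cell."""
    return {
        t: sum(w * src_data[s] for s, w in pairs)
        for t, pairs in regridder.items()
    }


src_data = [float(i) for i in range(8)]

# One regridder for the whole target (4 cells).
full = apply_regridder(build_regridder(range(4)), src_data)

# Two smaller regridders, each covering half of the target.
partitioned = {}
needed_src_per_block = []
for tgt_block in ([0, 1], [2, 3]):
    small = build_regridder(tgt_block)
    # Record which source cells this block's regridder actually reads.
    needed_src_per_block.append({s for pairs in small.values() for s, _ in pairs})
    partitioned.update(apply_regridder(small, src_data))

print(partitioned == full)  # True
```

Each small regridder only reads the part of the source that overlaps its target block, which is what would allow the blocks to be built and held in memory one at a time.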

The Partition object would have the following functionality:

  • Can be initialised by passing a source grid/mesh, a target grid/mesh and a collection of indices describing the source and target subsets.
    • It may be possible in future to automatically determine an appropriate collection of subsets, however this shouldn't be necessary for a minimum viable solution.
    • It may also be necessary to pass in explicit information about the dask chunking of the input object (and desired chunking for the output object).
  • There should be some level of error checking to ensure that the partition makes sense, i.e. that the entire source/target is covered and that, for each pair of source/target indices, the source points cover the target points.
    • It may also be worth checking that the calculation would be manageable for the given chunking strategy, and at least raising a warning when it would not be.
  • There should be a method for generating regridders (a regridder from the 1st source indices to the 1st target indices, the same for the 2nd, etc.) and saving these regridders to a user-supplied series of paths. This means that only one regridder needs to be realised in memory at a time. The Partition object should keep track of the paths to the appropriate regridders.
    • It should also be possible to give a Partition object access to the paths of previously saved regridders, via initialisation or some other method.
  • When this object has access to a full set of saved regridders, it should be able to apply them to lazily regrid data from the source grid/mesh as if it were a regular regridder. When a chunk of data is realised, the object will load the appropriate regridders, perform regridding on that chunk and then delete the regridders. In this way, only a limited number of regridders will need to be loaded into memory at any one time.
    • One problem we may have to solve is how to realise multiple chunks in the same vertical stack, which all use the same regridder, without having to load that regridder multiple times.
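The functionality listed above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a proposed implementation: the class layout, the method names `generate_files` and `regrid`, and the helper `build_block_regridder` are all hypothetical, a pickled dict of weights stands in for a saved regridder, and an eager loop stands in for the lazy, per-chunk evaluation described in the issue.

```python
import pickle
import tempfile
from pathlib import Path


def build_block_regridder(src_indices, tgt_indices):
    """Hypothetical stand-in for building a real regridder on one block.

    Target cell t averages source cells 2t and 2t+1, so the source block
    must contain both of those cells.
    """
    src_set = set(src_indices)
    weights = {}
    for t in tgt_indices:
        needed = (2 * t, 2 * t + 1)
        if not src_set.issuperset(needed):
            raise ValueError(f"source block does not cover target cell {t}")
        weights[t] = [(s, 0.5) for s in needed]
    return weights


class Partition:
    """Sketch of the proposed object; the name and methods are assumptions."""

    def __init__(self, n_src, n_tgt, src_blocks, tgt_blocks, paths=None):
        if len(src_blocks) != len(tgt_blocks):
            raise ValueError("need one source block per target block")
        # Target blocks must tile the target: every cell exactly once.
        if sorted(t for block in tgt_blocks for t in block) != list(range(n_tgt)):
            raise ValueError("target blocks must cover each target cell once")
        # Source blocks may overlap, but together must cover the source.
        if {s for block in src_blocks for s in block} != set(range(n_src)):
            raise ValueError("source blocks must cover the whole source")
        self.n_tgt = n_tgt
        self.src_blocks = src_blocks
        self.tgt_blocks = tgt_blocks
        # Paths to previously saved regridders can be supplied up front.
        self.paths = list(paths) if paths is not None else None

    def generate_files(self, paths):
        """Build and save one regridder at a time, then remember the paths."""
        paths = [Path(p) for p in paths]
        if len(paths) != len(self.tgt_blocks):
            raise ValueError("need one path per block")
        for path, src, tgt in zip(paths, self.src_blocks, self.tgt_blocks):
            regridder = build_block_regridder(src, tgt)
            with open(path, "wb") as f:
                pickle.dump(regridder, f)
            del regridder  # only one regridder is held in memory at a time
        self.paths = paths

    def regrid(self, src_data):
        """Apply the saved regridders one by one (eagerly, in this sketch)."""
        if self.paths is None:
            raise RuntimeError("no saved regridders available")
        result = [None] * self.n_tgt
        for path in self.paths:
            with open(path, "rb") as f:
                regridder = pickle.load(f)
            for t, pairs in regridder.items():
                result[t] = sum(w * src_data[s] for s, w in pairs)
            del regridder
        return result


with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / f"regridder_{i}.pkl" for i in range(2)]
    src_blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
    tgt_blocks = [[0, 1], [2, 3]]
    part = Partition(8, 4, src_blocks, tgt_blocks)
    part.generate_files(paths)
    # A second object reuses the saved files without rebuilding anything.
    reloaded = Partition(8, 4, src_blocks, tgt_blocks, paths=paths)
    result = reloaded.regrid([float(i) for i in range(8)])
```

The sketch loads each saved file once per `regrid` call. It does not address the chunking inputs or the question raised in the last bullet of avoiding repeated loads when several chunks in the same vertical stack share a regridder.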