support composing input variables
This PR adds functionality to `DataHandler`, `TimeVaryingInputs`, and `SpaceVaryingInputs`
to enable composing multiple input variables into one data variable. To do this, the user
must specify a composing function, multiple variable names, and the file paths where they
can be read from.

Most of the changes have been made at the `DataHandler` level. Each input variable has its
own unique `FileReader` object, and each composed data variable has one `Time/SpaceVaryingInput`
and one `DataHandler`. The composing function itself is applied in the `regridded_snapshot`
function, just before regridding. The user will interact with this feature at the
`Time/SpaceVaryingInput` level.

This feature is only available when using `InterpolationsRegridder`, not `TempestRegridder`.

Design decisions made include:
- If a pre-processing function is provided, it is applied to each input variable before
  they are composed.
- Variables are composed before regridding, to preserve higher-resolution information.
- We assume that all input variables have the same temporal and spatial dimensions.
  This is explicitly checked in the `DataHandler` constructor, and will raise an
  informative error message if it is not true.
- Multiple input variables can come from one file, or each from their own unique file.
  We don't currently support arbitrary numbers of input variables and files, since this
  would require more work to implement and is not an expected use case in the near term.
juliasloan25 committed Sep 9, 2024
1 parent 2f14f11 commit c9b7c0c
Showing 8 changed files with 427 additions and 61 deletions.
27 changes: 27 additions & 0 deletions NEWS.md
@@ -1,6 +1,33 @@
ClimaUtilities.jl Release Notes
===============================

main
-------
- [Feature] Add support for composing input variables.
PR [#105](https://github.com/CliMA/ClimaUtilities.jl/pull/105/)

This allows a list of `varnames` (and possibly `file_paths`) to be
passed to `TimeVaryingInput` or `SpaceVaryingInput`, along with a
`compose_function` to compose them, like so:

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data

data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
["u", "v"],
target_space,
reference_date = Dates.DateTime(2000, 1, 1),
regridder_type = :InterpolationsRegridder,
file_reader_kwargs = (; preprocess_func = unit_conversion_func),
compose_function = compose_function)
```

See the `TimeVaryingInput` or `DataHandler` docs "NetCDF file input"
sections for more details.

v0.1.14
-------

44 changes: 41 additions & 3 deletions docs/src/datahandling.md
@@ -17,7 +17,7 @@ interface). For instance, if need arises, the `DataHandler` can be used (almost)
directly to process files with a different format from NetCDF.

The key struct in `DataHandling` is the `DataHandler`. The `DataHandler`
-contains a `FileReader`, a `Regridder`, and other metadata necessary to perform
+contains one or more `FileReader`(s), a `Regridder`, and other metadata necessary to perform
its operations (e.g., target `ClimaCore.Space`). The `DataHandler` can be used
for static or temporal data, and exposes the following key functions:
- `regridded_snapshot(time)`: to obtain the regridded field at the given `time`.
@@ -52,7 +52,21 @@ It is possible to pass down keyword arguments to underlying constructors in
`DataHandler` with the `regridder_kwargs` and `file_reader_kwargs`. These have
to be a named tuple or a dictionary that maps `Symbol`s to values.

-## Example
A `DataHandler` can contain information about a variable that we read directly from
an input file, or about a variable that is produced by composing data from multiple
input variables. In the latter case, the input variables may either all come from
the same input file, or may each come from a separate input file. The user must
provide the composing function, which operates pointwise on each of the inputs,
as well as an ordered list of the variable names to be passed to the function.
Additionally, input variables that are composed together must have the same
spatial and temporal dimensions.
Note that, if a non-identity pre-processing function is provided as part of
`file_reader_kwargs`, it will be applied to each input variable before they
are composed.
Composing multiple input variables is currently only supported with the
`InterpolationsRegridder`, not with `TempestRegridder`.
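
The order of operations described above (pre-process each input, check that
the dimensions agree, then compose pointwise) can be sketched in plain Julia.
This is only an illustration under stated assumptions: `compose_inputs` is a
hypothetical helper, not a `ClimaUtilities` function, and small matrices stand
in for data read from NetCDF files.

```julia
# Hypothetical helper (not part of ClimaUtilities): pre-process each input,
# check that all shapes agree, then apply the composing function pointwise.
function compose_inputs(compose_function, preprocess_func, inputs...)
    # The pre-processing function is applied to every input individually
    preprocessed = map(preprocess_func, inputs)
    first_size = size(first(preprocessed))
    all(x -> size(x) == first_size, preprocessed) ||
        error("All input variables must have the same dimensions, got $(map(size, preprocessed))")
    # Broadcasting applies compose_function element by element
    return compose_function.(preprocessed...)
end

unit_conversion_func = (data) -> 1000 * data
compose_function = (x, y) -> x + y

# Made-up data standing in for two variables read from file
u = [1.0 2.0; 3.0 4.0]
v = [10.0 20.0; 30.0 40.0]

composed = compose_inputs(compose_function, unit_conversion_func, u, v)
```

Passing inputs of different sizes to this helper raises an error before any
composition takes place, which mirrors the check described for the
`DataHandler` constructor.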

+## Example: Linear interpolation of a single data variable

As an example, let us implement a simple linear interpolation for a variable `u`
defined in the `era5_example.nc` NetCDF file. The file contains monthly averages
@@ -68,7 +82,8 @@ import Interpolations

import Dates

-unit_conversion_func = (data) -> 1000. * data
+# Define pre-processing function to convert units of input
+unit_conversion_func = (data) -> 1000 * data

data_handler = DataHandling.DataHandler("era5_example.nc",
"u",
@@ -93,6 +108,29 @@ function linear_interpolation(data_handler, time)
end
```

### Example appendix: Using multiple input data variables

Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
and `v`, and we care about their sum `u + v` but not their individual values.
We can provide a pointwise composing function to perform the sum, along with
the `InterpolationsRegridder` to produce the data we want, `u + v`.
The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
and to `v` individually, before the composing function is applied. The regridding
is applied after the composing function. `u` and `v` could also come from separate
NetCDF files, but they must still have the same spatial and temporal dimensions.

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
data_handler = DataHandling.DataHandler("era5_example.nc",
["u", "v"],
target_space,
reference_date = Dates.DateTime(2000, 1, 1),
regridder_type = :InterpolationsRegridder,
file_reader_kwargs = (; preprocess_func = unit_conversion_func),
compose_function = compose_function)
```
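
For a sum, composing before or after regridding gives the same answer when
the regridding is linear. The order starts to matter once the composing
function is non-linear. The sketch below illustrates this with a product
instead of a sum. It rests on assumptions: `coarsen` is a hypothetical
stand-in that averages neighbouring pairs, not the `InterpolationsRegridder`,
and the data are made up.

```julia
# Stand-in "regridder" (an assumption for illustration only): coarsen a 1D
# field by averaging neighbouring pairs of values.
coarsen(field) = [(field[i] + field[i + 1]) / 2 for i in 1:2:(length(field) - 1)]

# A composing function that is non-linear in its inputs
compose_function = (x, y) -> x * y

# Made-up source-grid data
u = [1.0, 3.0, 2.0, 2.0]
v = [1.0, 3.0, 0.0, 4.0]

# Order described in this section: compose on the source grid, then regrid
compose_then_regrid = coarsen(compose_function.(u, v))

# Alternative order: regrid each input first, then compose
regrid_then_compose = compose_function.(coarsen(u), coarsen(v))

println(compose_then_regrid)   # [5.0, 4.0]
println(regrid_then_compose)   # [4.0, 4.0]
```

The first coarse cell differs between the two orders because the product of
the averages is not the average of the products. Composing first keeps the
fine-scale covariation of the inputs in the result.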

## API

```@docs
66 changes: 61 additions & 5 deletions docs/src/inputs.md
@@ -28,14 +28,66 @@ regridding onto the computational domains (using [`Regridders`](@ref) and
`TimeVaryingInputs` support:
- analytic functions of time;
- pairs of 1D arrays (for `PointSpaces`);
-- 2/3D NetCDF files;
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable);
- linear interpolation in time (default), nearest neighbors, and "period filling";
- boundary conditions and repeating periodic data.

It is possible to pass down keyword arguments to underlying constructors in the
`Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
be a named tuple or a dictionary that maps `Symbol`s to values.

### NetCDF file inputs
2D or 3D NetCDF files can be provided as inputs using `TimeVaryingInputs`. This
could be a single variable provided in a single file, multiple variables provided
in a single file, or multiple variables each coming from a unique file.
When using multiple variables, a composing function must be provided as well,
which will be used to combine the input variables into one data variable that
is ultimately stored in the `TimeVaryingInput`. In this case, the order of
variables provided in `varnames` determines the order of the arguments
passed to the composing function.
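
The effect of the ordering is easiest to see with a composing function that is
not commutative. The following is a minimal sketch, assuming a dictionary of
made-up arrays in place of file data. `compose_by_name` is a hypothetical
helper written for this illustration, not part of the package.

```julia
# Stand-in for data read from file: variable name => values (made-up numbers)
data = Dict("u" => [5.0, 7.0], "v" => [1.0, 2.0])

# A non-commutative composing function makes the argument order visible
compose_function = (x, y) -> x - y

# Hypothetical helper: pass the variables to the function in `varnames` order
compose_by_name(f, data, varnames) =
    broadcast(f, [data[name] for name in varnames]...)

u_minus_v = compose_by_name(compose_function, data, ["u", "v"])  # x = u, y = v
v_minus_u = compose_by_name(compose_function, data, ["v", "u"])  # x = v, y = u

println(u_minus_v)  # [4.0, 5.0]
println(v_minus_u)  # [-4.0, -5.0]
```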

Note that if a non-identity pre-processing function is provided as part of
`file_reader_kwargs`, it will be applied to each input variable before they
are composed.
All input variables to be composed together must have the same spatial and
temporal dimensions.

Composing multiple input variables is currently only supported with the
`InterpolationsRegridder`, not with `TempestRegridder`. The regridding
is applied after the pre-processing and composing.

Composing multiple input variables in one `Input` is also possible with
a `SpaceVaryingInput`, and everything mentioned here applies in that case.

#### Example: NetCDF file input with multiple input variables

Suppose that the input NetCDF file `era5_example.nc` contains two variables `u`
and `v`, and we care about their sum `u + v` but not their individual values.
We can provide a pointwise composing function to perform the sum, along with
the `InterpolationsRegridder` to produce the data we want, `u + v`.
The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u`
and to `v` individually, before the composing function is applied. The regridding
is applied after the composing function. `u` and `v` could also come from separate
NetCDF files, but they must still have the same spatial and temporal dimensions.

```julia
# Define the pointwise composing function we want, a simple sum in this case
compose_function = (x, y) -> x + y
# Define pre-processing function to convert units of input
unit_conversion_func = (data) -> 1000 * data

data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc",
["u", "v"],
target_space,
reference_date = Dates.DateTime(2000, 1, 1),
regridder_type = :InterpolationsRegridder,
file_reader_kwargs = (; preprocess_func = unit_conversion_func),
compose_function = compose_function)
```

The same arguments (excluding `reference_date`) could be passed to a
`SpaceVaryingInput` to compose multiple input variables with that type.

### Extrapolation boundary conditions

`TimeVaryingInput`s can have multiple boundary conditions for extrapolation. By
@@ -131,7 +183,7 @@ by a factor of 100, we would change `albedo_tv` with
```julia
albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_space;
reference_date, regridder_kwargs = (; regrid_dir = "/tmp"),
-file_reader_kwargs = (; preprocess_func = (x) -> 100x)
+file_reader_kwargs = (; preprocess_func = (x) -> 100x))
```

!!! note In this example we used the [`TempestRegridder`](@ref). This is not the
@@ -153,10 +205,10 @@ albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_
(chiefly the [`DataHandling`](@ref) module) to construct a `Field` from
different sources.

-`TimeVaryingInputs` support:
+`SpaceVaryingInputs` support:
- analytic functions of coordinates;
- pairs of 1D arrays (for columns);
-- 2/3D NetCDF files.
+- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable).

In some ways, a `SpaceVaryingInput` can be thought of as an alternative constructor
for a `ClimaCore` `Field`.
@@ -165,6 +217,11 @@ It is possible to pass down keyword arguments to underlying constructors in the
`Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to
be a named tuple or a dictionary that maps `Symbol`s to values.

`SpaceVaryingInputs` support reading individual input variables from NetCDF files,
as well as composing multiple input variables into one `SpaceVaryingInput`.
See the [`TimeVaryingInput`](@ref) "NetCDF file inputs" section for more
information about this feature.

### Example

Let `target_space` be a `ClimaCore` `Space` where we want the `Field` to be
@@ -202,4 +259,3 @@ ClimaUtilities.TimeVaryingInputs.extrapolation_bc
Base.in
Base.close
```