diff --git a/NEWS.md b/NEWS.md index 7444e6a7..d921ae11 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,6 +1,33 @@ ClimaUtilities.jl Release Notes =============================== +main +------- +- [Feature] Add support for composing input variables. + PR [#105](https://github.com/CliMA/ClimaUtilities.jl/pull/105/) + + This allows a list of `varnames` (and possibly `file_paths`) to be + passed to `TimeVaryingInput` or `SpaceVaryingInput`, along with a + `compose_function` to compose them, like so: + + ```julia + # Define the pointwise composing function we want, a simple sum in this case + compose_function = (x, y) -> x + y + # Define pre-processing function to convert units of input + unit_conversion_func = (data) -> 1000 * data + + data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc", + ["u", "v"], + target_space; + reference_date = Dates.DateTime(2000, 1, 1), + regridder_type = :InterpolationsRegridder, + file_reader_kwargs = (; preprocess_func = unit_conversion_func), + compose_function) + ``` + + See the `TimeVaryingInput` or `DataHandler` docs "NetCDF file input" + sections for more details. + v0.1.14 ------- diff --git a/docs/src/datahandling.md b/docs/src/datahandling.md index ca6beec5..6b59936a 100644 --- a/docs/src/datahandling.md +++ b/docs/src/datahandling.md @@ -17,7 +17,7 @@ interface). For instance, if need arises, the `DataHandler` can be used (almost) directly to process files with a different format from NetCDF. The key struct in `DataHandling` is the `DataHandler`. The `DataHandler` -contains a `FileReader`, a `Regridder`, and other metadata necessary to perform +contains one or more `FileReader`(s), a `Regridder`, and other metadata necessary to perform its operations (e.g., target `ClimaCore.Space`). The `DataHandler` can be used for static or temporal data, and exposes the following key functions: - `regridded_snapshot(time)`: to obtain the regridded field at the given `time`. 
@@ -52,7 +52,21 @@ It is possible to pass down keyword arguments to underlying constructors in `DataHandler` with the `regridder_kwargs` and `file_reader_kwargs`. These have to be a named tuple or a dictionary that maps `Symbol`s to values. -## Example +A `DataHandler` can contain information about a variable that we read directly from +an input file, or about a variable that is produced by composing data from multiple +input variables. In the latter case, the input variables may either all come from +the same input file, or may each come from a separate input file. The user must +provide the composing function, which operates pointwise on each of the inputs, +as well as an ordered list of the variable names to be passed to the function. +Additionally, input variables that are composed together must have the same +spatial and temporal dimensions. +Note that, if a non-identity pre-processing function is provided as part of +`file_reader_kwargs`, it will be applied to each input variable before they +are composed. +Composing multiple input variables is currently only supported with the +`InterpolationsRegridder`, not with `TempestRegridder`. + +## Example: Linear interpolation of a single data variable As an example, let us implement a simple linear interpolation for a variable `u` defined in the `era5_example.nc` NetCDF file. The file contains monthly averages @@ -68,7 +82,8 @@ import Interpolations import Dates -unit_conversion_func = (data) -> 1000. * data +# Define pre-processing function to convert units of input +unit_conversion_func = (data) -> 1000 * data data_handler = DataHandling.DataHandler("era5_example.nc", "u", @@ -93,6 +108,29 @@ function linear_interpolation(data_handler, time) end ``` +### Example appendix: Using multiple input data variables + +Suppose that the input NetCDF file `era5_example.nc` contains two variables `u` +and `v`, and we care about their sum `u + v` but not their individual values. 
+We can provide a pointwise composing function to perform the sum, along with +the `InterpolationsRegridder` to produce the data we want, `u + v`. +The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u` +and to `v` individually, before the composing function is applied. The regridding +is applied after the composing function. `u` and `v` could also come from separate +NetCDF files, but they must still have the same spatial and temporal dimensions. + +```julia +# Define the pointwise composing function we want, a simple sum in this case +compose_function = (x, y) -> x + y +data_handler = DataHandling.DataHandler("era5_example.nc", + ["u", "v"], + target_space; + reference_date = Dates.DateTime(2000, 1, 1), + regridder_type = :InterpolationsRegridder, + file_reader_kwargs = (; preprocess_func = unit_conversion_func), + compose_function) +``` + ## API ```@docs diff --git a/docs/src/inputs.md b/docs/src/inputs.md index 51f736d6..94428982 100644 --- a/docs/src/inputs.md +++ b/docs/src/inputs.md @@ -28,7 +28,7 @@ regridding onto the computational domains (using [`Regridders`](@ref) and `TimeVaryingInputs` support: - analytic functions of time; - pairs of 1D arrays (for `PointSpaces`); -- 2/3D NetCDF files; +- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable); - linear interpolation in time (default), nearest neighbors, and "period filling"; - boundary conditions and repeating periodic data. @@ -36,6 +36,58 @@ It is possible to pass down keyword arguments to underlying constructors in the `Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. These have to be a named tuple or a dictionary that maps `Symbol`s to values. +### NetCDF file inputs +2D or 3D NetCDF files can be provided as inputs using `TimeVaryingInputs`. This +could be a single variable provided in a single file, multiple variables provided +in a single file, or multiple variables each coming from a separate file. 
+When using multiple variables, a composing function must be provided as well, +which will be used to combine the input variables into one data variable that +is ultimately stored in the `TimeVaryingInput`. In this case, the order of +variables provided in `varnames` determines the order of the arguments +passed to the composing function. + +Note that if a non-identity pre-processing function is provided as part of +`file_reader_kwargs`, it will be applied to each input variable before they +are composed. +All input variables to be composed together must have the same spatial and +temporal dimensions. + +Composing multiple input variables is currently only supported with the +`InterpolationsRegridder`, not with `TempestRegridder`. The regridding +is applied after the pre-processing and composing. + +Composing multiple input variables in one `Input` is also possible with +a `SpaceVaryingInput`, and everything mentioned here applies in that case. + +#### Example: NetCDF file input with multiple input variables + +Suppose that the input NetCDF file `era5_example.nc` contains two variables `u` +and `v`, and we care about their sum `u + v` but not their individual values. +We can provide a pointwise composing function to perform the sum, along with +the `InterpolationsRegridder` to produce the data we want, `u + v`. +The `preprocess_func` passed in `file_reader_kwargs` will be applied to `u` +and to `v` individually, before the composing function is applied. The regridding +is applied after the composing function. `u` and `v` could also come from separate +NetCDF files, but they must still have the same spatial and temporal dimensions. 
+ +```julia +# Define the pointwise composing function we want, a simple sum in this case +compose_function = (x, y) -> x + y +# Define pre-processing function to convert units of input +unit_conversion_func = (data) -> 1000 * data + +data_handler = TimeVaryingInputs.TimeVaryingInput("era5_example.nc", + ["u", "v"], + target_space; + reference_date = Dates.DateTime(2000, 1, 1), + regridder_type = :InterpolationsRegridder, + file_reader_kwargs = (; preprocess_func = unit_conversion_func), + compose_function) +``` + +The same arguments (excluding `reference_date`) could be passed to a +`SpaceVaryingInput` to compose multiple input variables with that type. + ### Extrapolation boundary conditions `TimeVaryingInput`s can have multiple boundary conditions for extrapolation. By @@ -131,7 +183,7 @@ by a factor of 100, we would change `albedo_tv` with ```julia albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_space; reference_date, regridder_kwargs = (; regrid_dir = "/tmp"), - file_reader_kwargs = (; preprocess_func = (x) -> 100x) + file_reader_kwargs = (; preprocess_func = (x) -> 100x)) ``` !!! note In this example we used the [`TempestRegridder`](@ref). This is not the @@ -153,10 +205,10 @@ albedo_tv = TimeVaryingInputs.TimeVaryingInput("cesem_albedo.nc", "alb", target_ (chiefly the [`DataHandling`](@ref) module) to construct a `Field` from different sources. -`TimeVaryingInputs` support: +`SpaceVaryingInputs` support: - analytic functions of coordinates; - pairs of 1D arrays (for columns); -- 2/3D NetCDF files. +- 2/3D NetCDF files (including composing multiple variables from one or more files into one variable). In some ways, a `SpaceVaryingInput` can be thought as an alternative constructor for a `ClimaCore` `Field`. @@ -165,6 +217,11 @@ It is possible to pass down keyword arguments to underlying constructors in the `Regridder` with the `regridder_kwargs` and `file_reader_kwargs`. 
These have to be a named tuple or a dictionary that maps `Symbol`s to values. +`SpaceVaryingInputs` support reading individual input variables from NetCDF files, +as well as composing multiple input variables into one `SpaceVaryingInput`. +See the [`TimeVaryingInput`](@ref) "NetCDF file inputs" section for more +information about this feature. + ### Example Let `target_space` be a `ClimaCore` `Space` where we want the `Field` to be @@ -202,4 +259,3 @@ ClimaUtilities.TimeVaryingInputs.extrapolation_bc Base.in Base.close ``` - diff --git a/ext/DataHandlingExt.jl b/ext/DataHandlingExt.jl index 70552098..bce3af12 100644 --- a/ext/DataHandlingExt.jl +++ b/ext/DataHandlingExt.jl @@ -17,7 +17,7 @@ import ClimaUtilities.DataHandling """ DataHandler{ - FR <: AbstractFileReader, + FR <: AbstractDict{<:AbstractString, <:AbstractFileReader}, REG <: AbstractRegridder, SPACE <: ClimaCore.Spaces.AbstractSpace, TSTART <: AbstractFloat, @@ -25,22 +25,31 @@ import ClimaUtilities.DataHandling DIMS, TIMES <: AbstractArray{<:AbstractFloat}, CACHE <: DataStructures.LRUCache{Dates.DateTime, ClimaCore.Fields.Field}, + FUNC <: Function, } -Currently, the `DataHandler` works with one variable at the time. This might not be the most +Currently, the `DataHandler` remaps one variable at a time. This might not be the most efficiently way to tackle the problem: we should be able to reuse the interpolation weights if multiple variables are defined on the same grid and have to be remapped to the same grid. +Multiple input variables may be stored in the `DataHandler` by providing a dictionary +`file_readers` of variable names mapped to `FileReader` objects, along with a `compose_function` +that combines them into a single data variable. This is useful when we require variables +that are not directly available in the input data, but can be computed from the available variables. +The order of the variables in `varnames` must match the argument order of `compose_function`. 
+ Assumptions: -- There is only one file with the entire time development of the given variable +- For each input variable, there is only one file with the entire time development of the given variable - The file has well-defined physical dimensions (e.g., lat/lon) - Currently, the time dimension has to be either "time" or "date", the spatial dimensions have to be lat and lon (restriction from TempestRegridder) +- If using multiple input variables, the input variables must have the same time development + and spatial resolution DataHandler is meant to live on the CPU, but the Fields can be on the GPU as well. """ struct DataHandler{ - FR <: AbstractFileReader, + FR <: AbstractDict{<:AbstractString, <:AbstractFileReader}, REG <: AbstractRegridder, SPACE <: ClimaCore.Spaces.AbstractSpace, TSTART <: AbstractFloat, @@ -48,9 +57,11 @@ struct DataHandler{ DIMS, TIMES <: AbstractArray{<:AbstractFloat}, CACHE <: DataStructures.LRUCache{Dates.DateTime, ClimaCore.Fields.Field}, + FUNC <: Function, + NAMES <: AbstractArray{<:AbstractString}, } - """Object responsible for getting the data from disk to memory""" - file_reader::FR + """Dictionary of variable names and objects responsible for getting the input data from disk to memory""" + file_readers::FR """Object responsible for resampling a rectangular grid to the simulation grid""" regridder::REG @@ -76,11 +87,17 @@ struct DataHandler{ """Private field where cached data is stored""" _cached_regridded_fields::CACHE + + """Function to combine multiple input variables into a single data variable""" + compose_function::FUNC + + """Names of the datasets in the NetCDF that have to be read and processed""" + varnames::NAMES end """ - DataHandler(file_path::AbstractString, - varname::AbstractString, + DataHandler(file_paths::Union{AbstractString, AbstractArray{<:AbstractString}}, + varnames::Union{AbstractString, AbstractArray{<:AbstractString}}, target_space::ClimaCore.Spaces.AbstractSpace; reference_date::Dates.DateTime = 
Dates.DateTime(1979, 1, 1), t_start::AbstractFloat = 0.0, @@ -89,15 +106,17 @@ end regridder_kwargs = (), file_reader_kwargs = ()) -Create a `DataHandler` to read `varname` from `file_path` and remap it to `target_space`. +Create a `DataHandler` to read `varnames` from `file_paths` and remap them to `target_space`. +`file_paths` may contain either one path for all variables or one path for each variable. +In the latter case, the entries of `file_paths` and `varnames` are expected to match based on position. The DataHandler maintains an LRU cache of Fields that were previously computed. Positional arguments ===================== -- `file_path`: Path of the NetCDF file that contains the data. -- `varname`: Name of the dataset in the NetCDF that has to be read and processed. +- `file_paths`: Paths of the NetCDF file(s) that contain the input data. +- `varnames`: Names of the datasets in the NetCDF that have to be read and processed. - `target_space`: Space where the simulation is run, where the data has to be regridded to. Keyword arguments @@ -122,10 +141,13 @@ everything more type stable.) It can be a NamedTuple, or a Dictionary that maps Symbols to values. - `file_reader_kwargs`: Additional keywords to be passed to the constructor of the file reader. It can be a NamedTuple, or a Dictionary that maps Symbols to values. +- `compose_function`: Function to combine multiple input variables into a single data variable. + The default, to be used in the case of one input variable, is the identity. + Note that the order of `varnames` must match the argument order of `compose_function`. 
""" function DataHandling.DataHandler( - file_path::AbstractString, - varname::AbstractString, + file_paths::Union{AbstractString, AbstractArray{<:AbstractString}}, + varnames::Union{AbstractString, AbstractArray{<:AbstractString}}, target_space::ClimaCore.Spaces.AbstractSpace; reference_date::Union{Dates.DateTime, Dates.Date} = Dates.DateTime( 1979, @@ -137,15 +159,64 @@ function DataHandling.DataHandler( cache_max_size::Int = 128, regridder_kwargs = (), file_reader_kwargs = (), + compose_function = identity, ) + # Convert `file_paths` and `varnames` to arrays if they are not already + # After this point, we assume that `file_paths` and `varnames` are arrays, possibly with only one element + if file_paths isa AbstractString + file_paths = [file_paths] + end + if varnames isa AbstractString + varnames = [varnames] + end + + # Verify that the number of file paths and variable names match + (length(file_paths) > 1 && length(file_paths) != length(varnames)) && error( + "Number of file paths ($(length(file_paths))) and variable names ($(length(varnames))) do not match.", + ) + + # Verify that `compose_function` is specified when using multiple input variables + (length(varnames) > 1 && compose_function == identity) && error( + "`compose_function` must be specified when using multiple input variables", + ) + + # Verify that `compose_function` is identity when using a single input variable + (length(varnames) == 1 && compose_function != identity) && error( + "`compose_function` must be identity when using a single input variable", + ) + + # TempestRegridder does not support multiple input variables + (length(varnames) > 1 && regridder_type == :TempestRegridder) && + error("TempestRegridder does not support multiple input variables") # Determine which regridder to use if not already specified regridder_type = isnothing(regridder_type) ? 
Regridders.default_regridder_type() : regridder_type - # File reader, deals with ingesting data, possibly buffered/cached - file_reader = NCFileReader(file_path, varname; file_reader_kwargs...) + # Construct a file reader, which deals with ingesting data and is possibly buffered/cached, for each input file + file_readers = Dict{String, AbstractFileReader}() + all_vars_in_same_file = length(file_paths) == 1 + for (i, varname) in enumerate(varnames) + file_path = all_vars_in_same_file ? first(file_paths) : file_paths[i] + file_readers[varname] = + NCFileReader(file_path, varname; file_reader_kwargs...) + end + + # Verify that the spatial dimensions are the same for each variable + @assert length( + Set(file_reader.dimensions for file_reader in values(file_readers)), + ) == 1 + + # Verify that the time information is the same for all variables + @assert length( + Set( + file_reader.available_dates for file_reader in values(file_readers) + ), + ) == 1 + @assert length( + Set(file_reader.time_index for file_reader in values(file_readers)), + ) == 1 regridder_args = () @@ -165,7 +236,9 @@ function DataHandling.DataHandler( regridder_kwargs = merge((; regrid_dir), regridder_kwargs) end - regridder_args = (target_space, varname, file_path) + # Note: using one arbitrary element of `varnames` and of `file_paths` assumes + # that all input variables will use the same regridding + regridder_args = (target_space, first(varnames), first(file_paths)) elseif regridder_type == :InterpolationsRegridder regridder_args = (target_space,) end @@ -179,30 +252,35 @@ function DataHandling.DataHandler( max_size = cache_max_size, ) - available_dates = file_reader.available_dates + # Note: using one arbitrary element of `file_readers` assumes + # that all input variables have the same time development + available_dates = first(values(file_readers)).available_dates times_s = period_to_seconds_float.(available_dates .- reference_date) available_times = times_s .- t_start + dimensions = 
first(values(file_readers)).dimensions return DataHandler( - file_reader, + file_readers, regridder, target_space, - file_reader.dimensions, + dimensions, available_dates, t_start, Dates.DateTime(reference_date), available_times, _cached_regridded_fields, + compose_function, + varnames, ) end """ close(data_handler::DataHandler) -Close any file associated to the given `data_handler`. +Close all files associated to the given `data_handler`. """ function Base.close(data_handler::DataHandler) - close(data_handler.file_reader) + foreach(close, values(data_handler.file_readers)) return nothing end @@ -345,6 +423,9 @@ Return the regridded snapshot from `data_handler` associated to the given `time` The `time` has to be available in the `data_handler`. +When using multiple input variables, the `varnames` argument determines the order of arguments +to the `compose_function` function used to produce the data variable. + `regridded_snapshot` potentially modifies the internal state of `data_handler` and it might be a very expensive operation. 
""" @@ -352,24 +433,41 @@ function DataHandling.regridded_snapshot( data_handler::DataHandler, date::Dates.DateTime, ) + varnames = data_handler.varnames + compose_function = data_handler.compose_function + # Dates.DateTime(0) is the cache key for static maps if date != Dates.DateTime(0) - date in data_handler.available_dates || error( - "Date $date not available in file $(data_handler.file_reader.file_path)", - ) + file_path = data_handler.file_readers[first(varnames)].file_path + date in data_handler.available_dates || + error("Date $date not available in file $(file_path)") end regridder_type = nameof(typeof(data_handler.regridder)) regrid_args = () + # Check if the regridded field at this date is already in the cache return get!(data_handler._cached_regridded_fields, date) do if regridder_type == :TempestRegridder - regrid_args = (date,) + if length(data_handler.file_readers) > 1 + error( + "TempestRegridder does not support multiple input variables. Please use InterpolationsRegridder.", + ) + else + regrid_args = (date,) + end elseif regridder_type == :InterpolationsRegridder - regrid_args = - (read(data_handler.file_reader, date), data_handler.dimensions) + # Read input data from each file, maintaining order, and apply composing function + # In the case of a single input variable, it will remain unchanged + data_composed = compose_function( + ( + read(data_handler.file_readers[varname], date) for + varname in varnames + )..., + ) + regrid_args = (data_composed, data_handler.dimensions) else - error("Uncaught case") + error("Invalid regridder type") end regrid(data_handler.regridder, regrid_args...) 
end @@ -393,9 +491,7 @@ function DataHandling.regridded_snapshot(data_handler::DataHandler) end """ - regridded_snapshot(dest::ClimaCore.Fields.Field, data_handler::DataHandler, date::Dates.DateTime) - regridded_snapshot(data_handler::DataHandler, time::AbstractFloat) - regridded_snapshot(data_handler::DataHandler) + regridded_snapshot!(dest::ClimaCore.Fields.Field, data_handler::DataHandler, date::Dates.DateTime) Write to `dest` the regridded snapshot from `data_handler` associated to the given `time`. diff --git a/ext/SpaceVaryingInputsExt.jl b/ext/SpaceVaryingInputsExt.jl index 3b7f80ea..a60fa74c 100644 --- a/ext/SpaceVaryingInputsExt.jl +++ b/ext/SpaceVaryingInputsExt.jl @@ -126,8 +126,8 @@ end """ SpaceVaryingInput(data_handler::DataHandler) - SpaceVaryingInput(file_path::AbstractString, - varname::AbstractString, + SpaceVaryingInput(file_paths::Union{AbstractString, AbstractArray{String}}, + varnames::Union{AbstractString, AbstractArray{String}}, target_space::Spaces.AbstractSpace; regridder_type::Symbol, regridder_kwargs = (), @@ -144,21 +144,23 @@ function SpaceVaryingInputs.SpaceVaryingInput(data_handler) end function SpaceVaryingInputs.SpaceVaryingInput( - file_path, - varname, + file_paths, + varnames, target_space; regridder_type = nothing, regridder_kwargs = (), file_reader_kwargs = (), + compose_function = identity, ) return SpaceVaryingInputs.SpaceVaryingInput( DataHandler( - file_path, - varname, + file_paths, + varnames, target_space; regridder_type, regridder_kwargs, file_reader_kwargs, + compose_function, ), ) end diff --git a/ext/TimeVaryingInputsExt.jl b/ext/TimeVaryingInputsExt.jl index c9144e02..f7215139 100644 --- a/ext/TimeVaryingInputsExt.jl +++ b/ext/TimeVaryingInputsExt.jl @@ -140,8 +140,8 @@ function TimeVaryingInputs.TimeVaryingInput( end function TimeVaryingInputs.TimeVaryingInput( - file_path::AbstractString, - varname::AbstractString, + file_paths::Union{AbstractString, AbstractArray{<:AbstractString}}, + 
varnames::Union{AbstractString, AbstractArray{<:AbstractString}}, target_space::ClimaCore.Spaces.AbstractSpace; method = LinearInterpolation(), reference_date::Dates.DateTime = Dates.DateTime(1979, 1, 1), @@ -149,16 +149,18 @@ function TimeVaryingInputs.TimeVaryingInput( regridder_type = nothing, regridder_kwargs = (), file_reader_kwargs = (), + compose_function = identity, ) data_handler = DataHandling.DataHandler( - file_path, - varname, + file_paths, + varnames, target_space; reference_date, t_start, regridder_type, regridder_kwargs, file_reader_kwargs, + compose_function, ) if extrapolation_bc(method) isa PeriodicCalendar{Nothing} && !isequispaced(DataHandling.available_times(data_handler)) diff --git a/test/data_handling.jl b/test/data_handling.jl index 212100c7..4fd2587d 100644 --- a/test/data_handling.jl +++ b/test/data_handling.jl @@ -7,7 +7,6 @@ import ClimaUtilities.DataHandling import ClimaCore import ClimaComms -import ClimaComms @static pkgversion(ClimaComms) >= v"0.6" && ClimaComms.@import_required_backends import Interpolations as Intp import ClimaCoreTempestRemap @@ -21,12 +20,19 @@ ClimaComms.init(context) artifact"era5_static_example", "era5_t2m_sp_u10n_20210101_static.nc", ) - varname = "sp" for regridder_type in (:InterpolationsRegridder, :TempestRegridder) if regridder_type == :TempestRegridder && Sys.iswindows() continue end - for FT in (Float32, Float64), use_spacefillingcurve in (true, false) + for FT in (Float32, Float64), + use_spacefillingcurve in (true, false), + varnames in (["sp"], ["sp", "t2m"]) + + # TempestRegridder does not support multiple input variables + if regridder_type == :TempestRegridder && length(varnames) > 1 + continue + end + radius = FT(6731e3) helem = 40 Nq = 4 @@ -48,11 +54,14 @@ ClimaComms.init(context) target_space = ClimaCore.Spaces.SpectralElementSpace2D(horztopology, quad) + compose_function = + length(varnames) == 1 ? 
identity : (x, y) -> x + y data_handler = DataHandling.DataHandler( PATH, - varname, + varnames, target_space; regridder_type, + compose_function, ) @test DataHandling.available_dates(data_handler) == DateTime[] @@ -62,15 +71,64 @@ ClimaComms.init(context) @test field isa ClimaCore.Fields.Field @test axes(field) == target_space @test eltype(field) == FT - @test minimum(field) >= - minimum(data_handler.file_reader.dataset[varname][:, :]) - @test maximum(field) <= - maximum(data_handler.file_reader.dataset[varname][:, :]) + + # For one variable, check that the regridded data stays within the bounds of the original data + if length(varnames) == 1 + @test minimum(field) >= minimum( + data_handler.file_readers[first(varnames)].dataset[first( + varnames, + )][ + :, + :, + ], + ) + @test maximum(field) <= maximum( + data_handler.file_readers[first(varnames)].dataset[first( + varnames, + )][ + :, + :, + ], + ) + else + # For more than one variable, check that both point to the same file path + @test data_handler.file_readers[varnames[1]].file_path == + data_handler.file_readers[varnames[2]].file_path + # For more than one variable, check that the compose function was stored and that the regridded data stays within the bounds of the composed input data + @test data_handler.compose_function == compose_function + + @test minimum(field) >= minimum( + compose_function.( + data_handler.file_readers[varnames[1]].dataset[varnames[1]][ + :, + :, + ], + data_handler.file_readers[varnames[2]].dataset[varnames[2]][ + :, + :, + ], + ), + ) + @test maximum(field) <= maximum( + compose_function.( + data_handler.file_readers[varnames[1]].dataset[varnames[1]][ + :, + :, + ], + data_handler.file_readers[varnames[2]].dataset[varnames[2]][ + :, + :, + ], + ), + ) + end close(data_handler) end end + # Test passing arguments down + varnames = ["sp"] radius = 6731e3 helem = 40 Nq = 4 @@ -83,7 +141,7 @@ ClimaComms.init(context) data_handler = DataHandling.DataHandler( PATH, - varname, + varnames, target_space; regridder_type = :InterpolationsRegridder, file_reader_kwargs = 
(; preprocess_func = (data) -> 0.0 * data), @@ -98,6 +156,71 @@ ClimaComms.init(context) @test extrema(field) == (0.0, 0.0) end +@testset "DataHandler errors" begin + # Create dummy file paths and variable names + path1 = "abc" + path2 = "def" + + var1 = "var1" + var2 = "var2" + var3 = "var3" + + # Construct space + FT = Float32 + radius = FT(6731e3) + helem = 40 + Nq = 4 + + horzdomain = ClimaCore.Domains.SphereDomain(radius) + horzmesh = ClimaCore.Meshes.EquiangularCubedSphere(horzdomain, helem) + horztopology = ClimaCore.Topologies.Topology2D( + context, + horzmesh, + ClimaCore.Topologies.spacefillingcurve(horzmesh), + ) + quad = ClimaCore.Spaces.Quadratures.GLL{Nq}() + target_space = ClimaCore.Spaces.SpectralElementSpace2D(horztopology, quad) + + regridder_type = :InterpolationsRegridder + compose_function = (x, y) -> x + y + + # Providing multiple file paths whose number differs from the number of variable names is not allowed + @test_throws ErrorException DataHandling.DataHandler( + [path1, path2], + [var1, var2, var3], + target_space; + regridder_type, + compose_function, + ) + + # Providing multiple variables without a compose function is not allowed + @test_throws ErrorException DataHandling.DataHandler( + path1, + [var1, var2], + target_space; + regridder_type, + ) + + # Providing a compose function with a single variable is not allowed + @test_throws ErrorException DataHandling.DataHandler( + path1, + var1, + target_space; + regridder_type, + compose_function, + ) + + # TempestRegridder does not support multiple input variables + regridder_type_tr = :TempestRegridder + @test_throws ErrorException DataHandling.DataHandler( + path1, + [var1, var2], + target_space; + regridder_type = regridder_type_tr, + compose_function, + ) +end + @testset "DataHandler, TempestRegridder, time data" begin if !Sys.iswindows() PATH = joinpath(artifact"era5_example", "era5_t2m_sp_u10n_20210101.nc") @@ -127,7 +250,7 @@ end ) @test 
DataHandling.available_dates(data_handler) == - data_handler.file_reader.available_dates + data_handler.file_readers[varname].available_dates @test data_handler.reference_date .+ Second.(DataHandling.available_times(data_handler)) == DataHandling.available_dates(data_handler) @@ -252,10 +375,18 @@ end @test axes(field) == target_space @test eltype(field) == FT @test minimum(field) >= minimum( - data_handler.file_reader.dataset[varname][:, :, 10], + data_handler.file_readers[varname].dataset[varname][ + :, + :, + 10, + ], ) @test maximum(field) <= maximum( - data_handler.file_reader.dataset[varname][:, :, 10], + data_handler.file_readers[varname].dataset[varname][ + :, + :, + 10, + ], ) close(data_handler) diff --git a/test/time_varying_inputs23.jl b/test/time_varying_inputs23.jl index 791db764..f72076dc 100644 --- a/test/time_varying_inputs23.jl +++ b/test/time_varying_inputs23.jl @@ -51,6 +51,19 @@ include("TestTools.jl") regridder_type, ) + # Test constructor with multiple variables + regridder_type_interp = :InterpolationsRegridder + compose_function = (x, y) -> x + y + input_multiple_vars = TimeVaryingInputs.TimeVaryingInput( + PATH, + ["u10n", "t2m"], + target_space; + reference_date = Dates.DateTime(2021, 1, 1, 1), + t_start = 0.0, + regridder_type = regridder_type_interp, + compose_function, + ) + input_nearest = TimeVaryingInputs.TimeVaryingInput( PATH, "u10n", @@ -373,6 +386,7 @@ include("TestTools.jl") @test parent(dest) ≈ parent(expected) + close(input_multiple_vars) close(input_nearest) close(input_linear) close(input_nearest_flat)