Implement dataless cubes #6253

Open
wants to merge 57 commits into base: main
Changes from 38 commits
96a3f2d
Init commit, part way through ensuring Nones aren't wrapped in np.array
ESadek-MO Dec 11, 2024
a2e0da9
Merge branch 'main' into dataless
ESadek-MO Dec 11, 2024
7114f0f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
84ea908
None types are no longer wrapped
ESadek-MO Dec 11, 2024
44afb25
merge conflicts
ESadek-MO Dec 11, 2024
cdc51d5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
7b402e0
clarified axiom check
ESadek-MO Dec 11, 2024
bc1ee6f
moved shape order in Cube
ESadek-MO Dec 11, 2024
17e0b30
merge conflicts
ESadek-MO Dec 11, 2024
16779aa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
f42c19d
replace call to self.data with self.core_data()
ESadek-MO Dec 11, 2024
b4ffc31
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 11, 2024
6d62c7d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
cc13e6d
fixed test regex
ESadek-MO Dec 11, 2024
71c7ae8
precommit
ESadek-MO Dec 11, 2024
e59d5c9
written tests, and refactored redundant checks
ESadek-MO Dec 12, 2024
95c09e1
refactored tests
ESadek-MO Dec 12, 2024
b2fb2f8
removed shape asserts within data tests
ESadek-MO Dec 12, 2024
c6510a5
copy cube now has FUTURE behaviour
ESadek-MO Dec 12, 2024
af7d727
pre-commit
ESadek-MO Dec 12, 2024
144e164
written tests for cube.copy
ESadek-MO Dec 16, 2024
ba84553
fixed cube.copy failure in dim coords
ESadek-MO Dec 16, 2024
ea3c150
experimenting with 4d cube with everything; doesn't run locally
ESadek-MO Dec 16, 2024
0cec4ca
pre-c
ESadek-MO Dec 16, 2024
6ed270d
edited Coord.copy
ESadek-MO Dec 16, 2024
d79ad27
Revert "edited Coord.copy"
ESadek-MO Dec 16, 2024
2bb6e61
tried tearing down FUTUREFLAG
ESadek-MO Dec 16, 2024
063e45b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
79d7d67
made dataless copy opt-in behaviour
ESadek-MO Dec 17, 2024
9b52067
cube data is optional
ESadek-MO Dec 17, 2024
2f2b56a
added is_dataless, and corrected data manager exceptions to not be cu…
ESadek-MO Dec 17, 2024
c8d2c07
pre-c
ESadek-MO Dec 17, 2024
67fb787
fixed failing tests
ESadek-MO Dec 17, 2024
379b273
Merge branch 'main' into dataless
ESadek-MO Dec 17, 2024
a042451
fixed copying from dataless
ESadek-MO Dec 17, 2024
088bb94
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 17, 2024
21489d5
added in exceptions, not tested
ESadek-MO Dec 18, 2024
a059c5e
renamed DATALESS_COPY to DATALESS, and added comment
ESadek-MO Dec 18, 2024
19e956d
fixed merge conflicts
ESadek-MO Dec 18, 2024
8178809
made DATALESS docstring a docstring, rather than a comment
ESadek-MO Dec 18, 2024
2123b8f
added whatsnew
ESadek-MO Dec 18, 2024
703bbb6
added cubelist errors
ESadek-MO Dec 18, 2024
33649c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2024
021bd31
review comments
ESadek-MO Dec 18, 2024
f1bfe7b
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 18, 2024
abb5a57
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2024
94d02ea
fixed broken review suggestion
ESadek-MO Dec 18, 2024
5be6a1f
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 18, 2024
5c5bbf8
fixed some problems, and ensured self._shape is always set
ESadek-MO Dec 20, 2024
de348f9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 20, 2024
7c74460
fixed __eq__ and __repr__, and maybe dtype of dataless equal None
ESadek-MO Dec 20, 2024
0ceed69
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 20, 2024
8bddec7
included self.core_data().shape when shape is none, for Coord cases
ESadek-MO Dec 23, 2024
2a67050
made sure _shape is set in case of preexisting _shape of ()
ESadek-MO Dec 24, 2024
f8c2dad
added eq and repr tests, and fixed shape of () not being valid
ESadek-MO Dec 24, 2024
ed75485
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 24, 2024
9221917
fixed test now that self._shape is always set
ESadek-MO Dec 24, 2024
17 changes: 15 additions & 2 deletions lib/iris/__init__.py
@@ -143,7 +143,12 @@ def callback(cube, field, filename):
class Future(threading.local):
"""Run-time configuration controller."""

def __init__(self, datum_support=False, pandas_ndim=False, save_split_attrs=False):
def __init__(
self,
datum_support=False,
pandas_ndim=False,
save_split_attrs=False,
):
"""Container for run-time options controls.

To adjust the values simply update the relevant attribute from
@@ -189,7 +194,11 @@ def __repr__(self):
# msg = ('Future(example_future_flag={})')
# return msg.format(self.example_future_flag)
msg = "Future(datum_support={}, pandas_ndim={}, save_split_attrs={})"
return msg.format(self.datum_support, self.pandas_ndim, self.save_split_attrs)
return msg.format(
self.datum_support,
self.pandas_ndim,
self.save_split_attrs,
)

# deprecated_options = {'example_future_flag': 'warning',}
deprecated_options: dict[str, Literal["error", "warning"]] = {}
@@ -832,3 +841,7 @@ def use_plugin(plugin_name):
significance of the import statement and warn that it is an unused import.
"""
importlib.import_module(f"iris.plugins.{plugin_name}")


# To be used when copying a cube to make the new cube dataless.
DATALESS = "NONE"
Member

@ESadek-MO Please move this definition to the top of the module and include it in __all__.

77 changes: 55 additions & 22 deletions lib/iris/_data_manager.py
@@ -10,32 +10,42 @@
import numpy.ma as ma

from iris._lazy_data import as_concrete_data, as_lazy_data, is_lazy_data
import iris.exceptions
import iris.warnings


class DataManager:
"""Provides a well defined API for management of real or lazy data."""

def __init__(self, data):
def __init__(self, data, shape=None):
"""Create a data manager for the specified data.

Parameters
----------
data :
The :class:`~numpy.ndarray` or :class:`~numpy.ma.core.MaskedArray`
real data, or :class:`~dask.array.core.Array` lazy data to be
managed.
managed. If a value of None is given, the data manager will be
considered dataless.

shape :
Member

@ESadek-MO We've been adopting the following numpydoc standard for specifying parameter types, i.e., shape : tuple, optional

The same standard applies to the data parameter 👍

A tuple, representing the shape of the data manager. This can only
be used in the case of `data=None`, and will render the data manager
Member

Use double-ticks, see here

i.e., ``data=None``

dataless.

"""
if (shape is not None) and (data is not None):
msg = "`shape` should only be provided if `data is None`"
Member

Suggested change
msg = "`shape` should only be provided if `data is None`"
msg = '"shape" should only be provided if "data" is None'

raise ValueError(msg)

# Initialise the instance.
self._lazy_array = None
self._real_array = None

# Assign the data payload to be managed.
self._shape = shape
Member

@ESadek-MO The comment on line+45 applies to self.data = data on line+47.

Could you move self._shape = shape to the above # Initialise the instance. block 👍

self.data = data

# Enforce the manager contract.
self._assert_axioms()
bjlittle marked this conversation as resolved.
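
The constructor contract discussed in this thread can be sketched with a toy class. MiniDataManager is a hypothetical stand-in, not the real DataManager: it only shows that data and shape are mutually exclusive, and that self._shape is assigned in the initialisation block before the payload, as requested above. The second error message is an assumption made for illustration.

```python
class MiniDataManager:
    """Toy stand-in for DataManager (hypothetical, for illustration only)."""

    def __init__(self, data=None, shape=None):
        if (shape is not None) and (data is not None):
            raise ValueError('"shape" should only be provided if "data" is None')

        # Initialise the instance.
        self._real_array = None
        self._shape = shape

        # Assign the data payload to be managed.
        self._real_array = data

        # Enforce a simplified contract: either real data or a shape.
        if self._real_array is None and self._shape is None:
            raise ValueError("got no data and no shape")

    def is_dataless(self) -> bool:
        """Return True when only a shape is held."""
        return self._real_array is None
```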

def __copy__(self):
"""Forbid :class:`~iris._data_manager.DataManager` instance shallow-copy support."""
name = type(self).__name__
@@ -126,11 +136,16 @@ def __repr__(self):
def _assert_axioms(self):
"""Definition of the manager state, that should never be violated."""
# Ensure there is a valid data state.
is_lazy = self._lazy_array is not None
is_real = self._real_array is not None
emsg = "Unexpected data state, got {}lazy and {}real data."
state = is_lazy ^ is_real
assert state, emsg.format("" if is_lazy else "no ", "" if is_real else "no ")
empty = self._lazy_array is None and self._real_array is None
overfilled = self._lazy_array is not None and self._real_array is not None
if overfilled:
msg = "Unexpected data state, got both lazy and real data."
raise ValueError(msg)
elif (
empty and self._shape is None
): # if I remove the second check, allows empty arrays, like old behaviour
msg = "Unexpected data state, got no lazy or real data, and no shape."
raise ValueError(msg)
Member

Suggested change
empty = self._lazy_array is None and self._real_array is None
overfilled = self._lazy_array is not None and self._real_array is not None
if overfilled:
msg = "Unexpected data state, got both lazy and real data."
raise ValueError(msg)
elif (
empty and self._shape is None
): # if I remove the second check, allows empty arrays, like old behaviour
msg = "Unexpected data state, got no lazy or real data, and no shape."
raise ValueError(msg)
is_lazy = self._lazy_array is not None
is_real = self._real_array is not None
if not (is_lazy ^ is_real):
if is_lazy and is_real:
msg = "Unexpected data state, got both lazy and real data."
raise ValueError(msg)
if self.is_dataless() and not (is_lazy or is_real):
msg = "Unexpected data state, got no lazy or real data, and no shape."
raise ValueError(msg)

Contributor Author

I'm so glad of this correction, I couldn't get my head around the correct way to do this and was pulling out what precious hair I have left.
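
For readers following the thread, the states the axiom has to separate can be written out as a standalone function. This is a sketch only: check_state is a hypothetical helper, and it follows the conditions of the original diff lines (both arrays set, or nothing set and no shape), which may not match the suggested change line for line.

```python
def check_state(lazy, real, shape):
    """Raise ValueError for the two invalid manager states (hypothetical helper)."""
    is_lazy = lazy is not None
    is_real = real is not None
    if is_lazy and is_real:
        # Overfilled: a manager holds lazy data or real data, never both.
        raise ValueError("Unexpected data state, got both lazy and real data.")
    if not (is_lazy or is_real) and shape is None:
        # Empty with no shape: there is nothing that defines the manager.
        raise ValueError(
            "Unexpected data state, got no lazy or real data, and no shape."
        )
```

With this reading, a dataless manager (no arrays, but a shape) passes the check, while a manager with nothing at all does not.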


def _deepcopy(self, memo, data=None):
"""Perform a deepcopy of the :class:`~iris._data_manager.DataManager` instance.
@@ -148,25 +163,30 @@ def _deepcopy(self, memo, data=None):
:class:`~iris._data_manager.DataManager` instance.

"""
shape = None
try:
if data is None:
# Copy the managed data.
if self.has_lazy_data():
data = copy.deepcopy(self._lazy_array, memo)
else:
elif self._real_array is not None:
data = self._real_array.copy()
else:
shape = self.shape
Member

Suggested change
shape = self.shape
shape = self._shape

Contributor Author

I'll come back to this one

elif type(data) is str and data == iris.DATALESS:
shape = self.shape
data = None
else:
# Check that the replacement data is valid relative to
# the currently managed data.
dm_check = DataManager(self.core_data())
dm_check.data = data
# If the replacement data is valid, then use it but
# without copying it.
result = DataManager(data)
result = DataManager(data=data, shape=shape)
except ValueError as error:
emsg = "Cannot copy {!r} - {}"
raise ValueError(emsg.format(type(self).__name__, error))

return result
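
The branches of this hunk can be summarised in a small standalone function. copy_payload is a hypothetical helper, not part of iris; it returns the (data, shape) pair that would be handed to the new manager, and it omits the shape validation that the real method performs on replacement data.

```python
import copy

DATALESS = "NONE"  # sentinel value, as defined in this diff


def copy_payload(current, shape, data=None):
    """Return the ``(data, shape)`` pair for the copy (hypothetical helper).

    ``current`` is the payload held by the source manager (or None), and
    ``shape`` is the shape reported by the source manager.
    """
    if data is None:
        if current is not None:
            # Copy the managed data; the shape then comes from the data.
            return copy.deepcopy(current), None
        # The source is already dataless, so only the shape carries over.
        return None, shape
    if type(data) is str and data == DATALESS:
        # Explicit request for a dataless copy of a data-holding source.
        return None, shape
    # Replacement data is used as given, without copying it.
    return data, None
```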

@property
Member

@ESadek-MO The data getter doc-string needs updating to state that None will be returned in the dataless case.

As it happens, you're getting that behaviour for free i.e., when we're dataless then self._real_array will be None.

@@ -219,15 +239,20 @@ def data(self, data):
managed.

"""
# If data is None, ensure previous shape is maintained, and that it is
Member

@ESadek-MO The doc-string needs updating for the dataless None case.

# not wrapped in an np.array
dataless = data is None
if dataless:
Member

Suggested change
dataless = data is None
if dataless:
if (dataless := data is None):

self._shape = self.shape
Member

@ESadek-MO In the dataless case, don't we just need to set self._real_array = None and self._lazy_array = None and then check the axiom?

i.e., do that and nothing further.
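
A toy sketch of the dataless path this comment describes: when None is assigned, keep the current shape, clear the payload and return early, so none of the array coercion further down is reached. Mini1D is hypothetical and deliberately simplified (a 1-D list payload instead of numpy or dask arrays, and only a _real_array slot). Clearing self._shape when real data is assigned follows the axiom stated later in this review, not necessarily the final PR.

```python
class Mini1D:
    """Toy 1-D manager: the payload is a list, the shape is ``(len(list),)``."""

    def __init__(self, data=None, shape=None):
        self._real_array = None
        self._shape = shape
        self.data = data

    @property
    def shape(self):
        if self._real_array is None:
            return self._shape
        return (len(self._real_array),)

    @property
    def data(self):
        # In the dataless case this is simply None.
        return self._real_array

    @data.setter
    def data(self, data):
        if data is None:
            # Dataless path: remember the current shape, drop the payload, stop.
            self._shape = self.shape
            self._real_array = None
            if self._shape is None:
                raise ValueError("got no data and no shape")
            return
        data = list(data)
        if self.shape is not None and self.shape != (len(data),):
            raise ValueError("shape mismatch")
        self._real_array = data
        self._shape = None
```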


# Ensure we have numpy-like data.
if not (hasattr(data, "shape") and hasattr(data, "dtype")):
elif not (hasattr(data, "shape") and hasattr(data, "dtype")):
data = np.asanyarray(data)

# Determine whether the class instance has been created,
# as this method is called from within the __init__.
init_done = self._lazy_array is not None or self._real_array is not None

if init_done and self.shape != data.shape:
# Determine whether the class already has a defined shape,
# as this method is called from __init__.
has_shape = self.shape is not None
if has_shape and not dataless and self.shape != data.shape:
Member

@ESadek-MO In the dataless case, we shouldn't get here.

# The _ONLY_ data reshape permitted is converting a 0-dimensional
# array i.e. self.shape == () into a 1-dimensional array of length
# one i.e. data.shape == (1,)
@@ -242,7 +267,8 @@ def data(self, data):
else:
if not ma.isMaskedArray(data):
# Coerce input data to ndarray (including ndarray subclasses).
data = np.asarray(data)
if not dataless:
data = np.asarray(data)
Member

@ESadek-MO We shouldn't need this defensive code for the dataless case i.e., we shouldn't get here.

if isinstance(data, ma.core.MaskedConstant):
# Promote to a masked array so that the fill-value is
# writeable to the data owner.
@@ -261,12 +287,19 @@ def dtype(self):
@property
def ndim(self):
"""The number of dimensions covered by the data being managed."""
return self.core_data().ndim
return len(self.shape)

@property
def shape(self):
"""The shape of the data being managed."""
return self.core_data().shape
if self.core_data() is None:
result = self._shape
else:
result = self.core_data().shape
return result
Member

Suggested change
if self.core_data() is None:
result = self._shape
else:
result = self.core_data().shape
return result
return self._shape if self._shape else self.core_data().shape
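
One thing that may be worth checking in the one-line form: an empty tuple is falsy in Python, so a scalar shape of () does not pass a plain truthiness test. The sketch below compares a truthiness check with an explicit None check; both function names are hypothetical, and payload_shape stands in for self.core_data().shape (None here means there is no payload to ask). The commit list above mentions fixing "shape of () not being valid", which may be related, though that link is an assumption.

```python
def shape_by_truthiness(stored_shape, payload_shape):
    """Pick the stored shape only when it is truthy."""
    return stored_shape if stored_shape else payload_shape


def shape_by_none_check(stored_shape, payload_shape):
    """Pick the stored shape whenever it is not None."""
    return stored_shape if stored_shape is not None else payload_shape
```

For a dataless scalar (stored shape (), no payload), the first form falls through to the payload while the second returns (). For non-empty shapes the two agree.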


def is_dataless(self):
Member

Suggested change
def is_dataless(self):
def is_dataless(self) -> bool:

return (self.core_data() is None) and (self.shape is not None)
Member

@ESadek-MO Given our axiom (and if this isn't true then something is wrong) it must always be the case that:

  • self._shape is None and (self._lazy_array is not None or self._real_array is not None)
  • self._shape is not None and (self._lazy_array is None and self._real_array is None)

Therefore, is_dataless should be defined simply as return self._shape is not None, right?


def copy(self, data=None):
"""Return a deep copy of this :class:`~iris._data_manager.DataManager` instance.
43 changes: 40 additions & 3 deletions lib/iris/cube.py
@@ -174,6 +174,8 @@ def _assert_is_cube(obj):
if not hasattr(obj, "add_aux_coord"):
msg = r"Object {obj} cannot be put in a cubelist, as it is not a Cube."
raise ValueError(msg)
elif obj.is_dataless():
raise iris.exceptions.DatalessError("CubeList")

def _repr_html_(self):
from iris.experimental.representation import CubeListRepresentation
@@ -1190,7 +1192,7 @@ def _walk_nodes(node):

def __init__(
self,
data: np.typing.ArrayLike,
data: np.typing.ArrayLike | None = None,
standard_name: str | None = None,
long_name: str | None = None,
var_name: str | None = None,
@@ -1204,6 +1206,7 @@ def __init__(
cell_measures_and_dims: Iterable[tuple[CellMeasure, int]] | None = None,
ancillary_variables_and_dims: Iterable[tuple[AncillaryVariable, int]]
| None = None,
shape: tuple | None = None,
):
"""Create a cube with data and optional metadata.

@@ -1250,6 +1253,9 @@ def __init__(
A list of CellMeasures with dimension mappings.
ancillary_variables_and_dims :
A list of AncillaryVariables with dimension mappings.
shape :
An alternative to providing data, this defines the shape of the
cube, but initialises the cube as dataless.

Examples
--------
@@ -1276,7 +1282,7 @@ def __init__(
self._metadata_manager = metadata_manager_factory(CubeMetadata)

# Initialise the cube data manager.
self._data_manager = DataManager(data)
self._data_manager = DataManager(data, shape)

#: The "standard name" for the Cube's phenomenon.
self.standard_name = standard_name
@@ -1475,6 +1481,8 @@ def convert_units(self, unit: str | Unit) -> None:

"""
# If the cube has units convert the data.
if self.is_dataless():
raise iris.exceptions.DatalessError("convert_units")
if self.units.is_unknown():
raise iris.exceptions.UnitConversionError(
"Cannot convert from unknown units. "
@@ -2883,6 +2891,16 @@ def has_lazy_data(self) -> bool:
"""
return self._data_manager.has_lazy_data()

def is_dataless(self) -> bool:
"""Detail whether this :class:`~iris.cube.Cube` is dataless.

Returns
-------
bool

"""
return self._data_manager.is_dataless()

@property
def dim_coords(self) -> tuple[DimCoord, ...]:
"""Return a tuple of all the dimension coordinates, ordered by dimension.
@@ -3091,6 +3109,8 @@ def subset(self, coord: AuxCoord | DimCoord) -> Cube | None:
whole cube is returned. As such, the operation is not strict.

"""
if self.is_dataless():
raise iris.exceptions.DatalessError("subset")
if not isinstance(coord, iris.coords.Coord):
raise ValueError("coord_to_extract must be a valid Coord.")

@@ -3212,6 +3232,8 @@ def intersection(self, *args, **kwargs) -> Cube:
which intersects with the requested coordinate intervals.

"""
if self.is_dataless():
raise iris.exceptions.DatalessError("intersection")
result = self
ignore_bounds = kwargs.pop("ignore_bounds", False)
threshold = kwargs.pop("threshold", 0)
@@ -3736,6 +3758,9 @@ def slices(
dimension index.

""" # noqa: D214, D406, D407, D410, D411
if self.is_dataless():
raise iris.exceptions.DatalessError("slices")

if not isinstance(ordered, bool):
raise TypeError("'ordered' argument to slices must be boolean.")

@@ -3823,7 +3848,8 @@ def transpose(self, new_order: list[int] | None = None) -> None:

# Transpose the data payload.
dm = self._data_manager
data = dm.core_data().transpose(new_order)
if not self.is_dataless():
data = dm.core_data().transpose(new_order)
self._data_manager = DataManager(data)

dim_mapping = {src: dest for dest, src in enumerate(new_order)}
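
As displayed in this hunk, data is only assigned when the cube has data, so the DataManager(data) line that follows has no value to use in the dataless case. One possible way to cover that case, offered as a sketch rather than as what the PR does, is to permute the stored shape directly. transposed_shape is a hypothetical helper; it follows the numpy transpose convention that axis i of the result is axis new_order[i] of the input, and the error message is an assumption.

```python
def transposed_shape(shape, new_order=None):
    """Return ``shape`` permuted by ``new_order`` (hypothetical helper)."""
    ndim = len(shape)
    if new_order is None:
        # Default transpose behaviour: reverse the dimension order.
        new_order = list(range(ndim))[::-1]
    if sorted(new_order) != list(range(ndim)):
        raise ValueError("new_order must be a permutation of the dimensions")
    return tuple(shape[axis] for axis in new_order)
```

The result could then be passed on as the shape of a new dataless manager, for example DataManager(None, shape=...), assuming the constructor signature shown earlier in this diff.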
@@ -4083,6 +4109,7 @@ def _deepcopy(self, memo, data=None):
aux_coords_and_dims=new_aux_coords_and_dims,
cell_measures_and_dims=new_cell_measures_and_dims,
ancillary_variables_and_dims=new_ancillary_variables_and_dims,
shape=(dm.shape if dm.core_data() is None else None),
)

new_cube.metadata = deepcopy(self.metadata, memo)
@@ -4310,6 +4337,8 @@ def collapsed(
cube.collapsed(['latitude', 'longitude'],
iris.analysis.VARIANCE)
"""
if self.is_dataless():
raise iris.exceptions.DatalessError("collapsed")
# Update weights kwargs (if necessary) to handle different types of
# weights
weights_info = None
@@ -4530,6 +4559,8 @@ def aggregated_by(
STASH m01s00i024

"""
if self.is_dataless():
raise iris.exceptions.DatalessError("aggregated_by")
# Update weights kwargs (if necessary) to handle different types of
# weights
weights_info = None
@@ -4829,6 +4860,8 @@ def rolling_window(
""" # noqa: D214, D406, D407, D410, D411
# Update weights kwargs (if necessary) to handle different types of
# weights
if self.is_dataless():
raise iris.exceptions.DatalessError("rolling_window")
weights_info = None
if kwargs.get("weights") is not None:
weights_info = _Weights(kwargs["weights"], self)
@@ -5034,6 +5067,8 @@ def interpolate(
True

"""
if self.is_dataless():
raise iris.exceptions.DatalessError("interpolate")
coords, points = zip(*sample_points)
interp = scheme.interpolator(self, coords) # type: ignore[arg-type]
return interp(points, collapse_scalar=collapse_scalar)
@@ -5079,6 +5114,8 @@ def regrid(self, grid: Cube, scheme: iris.analysis.RegriddingScheme) -> Cube:
this function is not applicable.

"""
if self.is_dataless():
raise iris.exceptions.DatalessError("regrid")
regridder = scheme.regridder(self, grid)
return regridder(self)

11 changes: 11 additions & 0 deletions lib/iris/exceptions.py
@@ -161,3 +161,14 @@ class CannotAddError(ValueError):
"""Raised when an object (e.g. coord) cannot be added to a :class:`~iris.cube.Cube`."""

pass


class DatalessError(ValueError):
"""Raised when a method cannot be performed on a dataless :class:`~iris.cube.Cube`."""

def __str__(self):
msg = (
"Dataless cubes are still early in implementation, and dataless {} "
"operations are not currently supported."
)
return msg.format(super().__str__())
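
A self-contained sketch of how the message is assembled. The DatalessError class is copied from the diff above (docstring shortened); ShapeOnlyCube is a hypothetical stand-in for a dataless cube, used only to show the guard pattern that the cube.py hunks repeat.

```python
class DatalessError(ValueError):
    """Raised when a method cannot be performed on a dataless cube."""

    def __str__(self):
        msg = (
            "Dataless cubes are still early in implementation, and dataless {} "
            "operations are not currently supported."
        )
        # The operation name passed to the constructor fills the placeholder.
        return msg.format(super().__str__())


class ShapeOnlyCube:
    """Hypothetical stand-in for a dataless cube."""

    def is_dataless(self) -> bool:
        return True

    def regrid(self):
        # Guard pattern used by the cube methods in this diff.
        if self.is_dataless():
            raise DatalessError("regrid")


try:
    ShapeOnlyCube().regrid()
except DatalessError as error:
    message = str(error)
```

Because DatalessError subclasses ValueError, existing code that catches ValueError would also catch it.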