Implement dataless cubes #6253

Open
wants to merge 57 commits into
base: main
57 commits
96a3f2d
Init commit, part way through ensuring Nones aren't wrapped in np.array
ESadek-MO Dec 11, 2024
a2e0da9
Merge branch 'main' into dataless
ESadek-MO Dec 11, 2024
7114f0f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
84ea908
None types are no longer wrapped
ESadek-MO Dec 11, 2024
44afb25
merge conflicts
ESadek-MO Dec 11, 2024
cdc51d5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
7b402e0
clarified axiom check
ESadek-MO Dec 11, 2024
bc1ee6f
moved shape order in Cube
ESadek-MO Dec 11, 2024
17e0b30
merge conflicts
ESadek-MO Dec 11, 2024
16779aa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
f42c19d
replace call to self.data with self.core_data()
ESadek-MO Dec 11, 2024
b4ffc31
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 11, 2024
6d62c7d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
cc13e6d
fixed test regex
ESadek-MO Dec 11, 2024
71c7ae8
precommit
ESadek-MO Dec 11, 2024
e59d5c9
written tests, and refactored redundant checks
ESadek-MO Dec 12, 2024
95c09e1
refactored tests
ESadek-MO Dec 12, 2024
b2fb2f8
removed shape asserts within data tests
ESadek-MO Dec 12, 2024
c6510a5
copy cube now has FUTURE behaviour
ESadek-MO Dec 12, 2024
af7d727
pre-commit
ESadek-MO Dec 12, 2024
144e164
written tests for cube.copy
ESadek-MO Dec 16, 2024
ba84553
fixed cbe.copy failure in dim coords
ESadek-MO Dec 16, 2024
ea3c150
experimenting with 4d cube with everything; doesn't run locally
ESadek-MO Dec 16, 2024
0cec4ca
pre-c
ESadek-MO Dec 16, 2024
6ed270d
edited Coord.copy
ESadek-MO Dec 16, 2024
d79ad27
Revert "edited Coord.copy"
ESadek-MO Dec 16, 2024
2bb6e61
tried tearingdown FUTUREFLAG
ESadek-MO Dec 16, 2024
063e45b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 16, 2024
79d7d67
made dataless copy opt-in behaviour
ESadek-MO Dec 17, 2024
9b52067
cube data is optional
ESadek-MO Dec 17, 2024
2f2b56a
added is_dataless, and corrected data manager exceptions to not be cu…
ESadek-MO Dec 17, 2024
c8d2c07
pre-c
ESadek-MO Dec 17, 2024
67fb787
fixed failing tests
ESadek-MO Dec 17, 2024
379b273
Merge branch 'main' into dataless
ESadek-MO Dec 17, 2024
a042451
fixed copying from dataless
ESadek-MO Dec 17, 2024
088bb94
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 17, 2024
21489d5
added in exceptions, not tested
ESadek-MO Dec 18, 2024
a059c5e
renamed DATALESS_COPY to DATALESS, and added comment
ESadek-MO Dec 18, 2024
19e956d
fixed merge conflicts
ESadek-MO Dec 18, 2024
8178809
made DATALESS docstring a docstring, rather than a comment
ESadek-MO Dec 18, 2024
2123b8f
added whatsnew
ESadek-MO Dec 18, 2024
703bbb6
added cubelist errors
ESadek-MO Dec 18, 2024
33649c1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2024
021bd31
review comments
ESadek-MO Dec 18, 2024
f1bfe7b
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 18, 2024
abb5a57
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 18, 2024
94d02ea
fixed broken review suggestion
ESadek-MO Dec 18, 2024
5be6a1f
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 18, 2024
5c5bbf8
fixed some problems, and ensured self._shape is always set
ESadek-MO Dec 20, 2024
de348f9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 20, 2024
7c74460
fixed __eq__ and __repr__, and maybe dtype of dataless equal None
ESadek-MO Dec 20, 2024
0ceed69
Merge branch 'dataless' of github.com:ESadek-MO/iris into dataless
ESadek-MO Dec 20, 2024
8bddec7
included self.core_data().shape when shape is none, for Coord cases
ESadek-MO Dec 23, 2024
2a67050
made sure _shape is set in case of prexisting _shape of ()
ESadek-MO Dec 24, 2024
f8c2dad
added eq and repr tests, and fixed shape of () not being valid
ESadek-MO Dec 24, 2024
ed75485
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 24, 2024
9221917
fixed test now that self._shape is always set
ESadek-MO Dec 24, 2024
11 changes: 11 additions & 0 deletions docs/src/whatsnew/latest.rst
@@ -38,6 +38,17 @@ This document explains the changes made to Iris for this release
    your code for new floating point problems if activating this (e.g. when
    using the :class:`~iris.Constraint` API). (:pull:`6260`)

+#. `@ESadek-MO`_ made :attr:`~iris.cube.Cube.data` optional in a
+   :class:`~iris.cube.Cube`, when :attr:`~iris.cube.Cube.shape` is provided
+   instead. `dataless cubes` can currently be used as targets in regridding, or
+   for templates to add data to at a later time.
+
+   This is the first step in making `dataless cubes`. Currently, most cube methods
+   don't work on `dataless cubes`, and will raise an error if attempted.
+   :meth:`~iris.cube.Cube.transpose` will work, as will :meth:`~iris.cube.Cube.copy`.
+   `my_cube.copy(data = iris.DATALESS)` will copy the cube and remove data in
+   the process.
+   (:issue:`4447`, :pull:`6253`)

 🐛 Bugs Fixed
 =============
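
To make the whatsnew entry above concrete, here is a minimal, self-contained sketch of the semantics it describes. It does not use Iris: `ToyCube` is a hypothetical stand-in whose data is a flat list, and only the sentinel value `DATALESS = "NONE"` is taken from this diff.

```python
# Toy model of the behaviour described in the whatsnew entry.
# Nothing here is Iris code; ToyCube is a hypothetical stand-in.
DATALESS = "NONE"  # same sentinel value as iris.DATALESS in this PR


class ToyCube:
    """Holds either a flat list of values or just a shape."""

    def __init__(self, data=None, shape=None):
        if data is not None and shape is not None:
            raise ValueError('"shape" should only be provided if "data" is None')
        if data is None and shape is None:
            raise ValueError("either data or shape is required")
        self.data = None if data is None else list(data)
        # Assumption: data is one-dimensional, so its shape is (len(data),).
        self.shape = tuple(shape) if data is None else (len(self.data),)

    def is_dataless(self):
        return self.data is None

    def copy(self, data=None):
        if isinstance(data, str) and data == DATALESS:
            # Sentinel: keep the shape, drop the payload.
            return ToyCube(shape=self.shape)
        if data is None:
            # Plain copy: preserve whichever state the cube is in.
            if self.is_dataless():
                return ToyCube(shape=self.shape)
            return ToyCube(data=self.data)
        # Replacement data (the toy does no shape validation).
        return ToyCube(data=data)


cube = ToyCube(data=[1.0, 2.0, 3.0])
template = cube.copy(data=DATALESS)  # same shape, no data
filled = template.copy(data=[4.0, 5.0, 6.0])  # data added later
```

The sketch mirrors the two uses named in the entry: a template that carries only a shape, and one that has data added to it later.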
4 changes: 4 additions & 0 deletions lib/iris/__init__.py
@@ -118,6 +118,7 @@ def callback(cube, field, filename):
 __all__ = [
     "AttributeConstraint",
     "Constraint",
+    "DATALESS",
     "FUTURE",
     "Future",
     "IrisDeprecation",
@@ -139,6 +140,9 @@ def callback(cube, field, filename):
 AttributeConstraint = iris._constraints.AttributeConstraint
 NameConstraint = iris._constraints.NameConstraint

+#: To be used when copying a cube to make the new cube dataless.
+DATALESS = "NONE"
+

 class Future(threading.local):
     """Run-time configuration controller."""
6 changes: 5 additions & 1 deletion lib/iris/_concatenate.py
@@ -572,7 +572,11 @@ def concatenate(
         A :class:`iris.cube.CubeList` of concatenated :class:`iris.cube.Cube` instances.

     """
-    cube_signatures = [_CubeSignature(cube) for cube in cubes]
+    cube_signatures = []
+    for cube in cubes:
+        if cube.is_dataless():
+            raise iris.exceptions.DatalessError("concatenate")
+        cube_signatures.append(_CubeSignature(cube))

     proto_cubes: list[_ProtoCube] = []
     # Initialise the nominated axis (dimension) of concatenation
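
The guard added to `concatenate` can be sketched in isolation as below. This is an illustration under stated assumptions, not the Iris implementation: `DatalessError` here is a hypothetical stand-in for `iris.exceptions.DatalessError` (whose definition is not part of this excerpt, so the message wording is invented), and `StubCube` and `build_signatures` are hypothetical helpers.

```python
class DatalessError(Exception):
    """Hypothetical stand-in for iris.exceptions.DatalessError."""

    def __init__(self, operation):
        # The message wording is an assumption, not taken from the PR.
        super().__init__(f"Operation {operation!r} is not supported on dataless cubes.")


class StubCube:
    """Minimal cube-like object exposing only is_dataless()."""

    def __init__(self, name, dataless=False):
        self.name = name
        self._dataless = dataless

    def is_dataless(self):
        return self._dataless


def build_signatures(cubes):
    """Reject a dataless input before building its signature."""
    signatures = []
    for cube in cubes:
        if cube.is_dataless():
            raise DatalessError("concatenate")
        signatures.append(("signature", cube.name))
    return signatures


ok = build_signatures([StubCube("a"), StubCube("b")])
try:
    build_signatures([StubCube("a"), StubCube("b", dataless=True)])
    raised = False
except DatalessError:
    raised = True
```

Replacing the list comprehension with an explicit loop is what makes room for the per-cube check; the first dataless cube aborts the whole call.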
152 changes: 99 additions & 53 deletions lib/iris/_data_manager.py
@@ -10,32 +10,43 @@
 import numpy.ma as ma

 from iris._lazy_data import as_concrete_data, as_lazy_data, is_lazy_data
+import iris.exceptions
 import iris.warnings


 class DataManager:
     """Provides a well defined API for management of real or lazy data."""

-    def __init__(self, data):
+    def __init__(self, data, shape=None):
         """Create a data manager for the specified data.

         Parameters
         ----------
-        data :
+        data : np.typing.ArrayLike, optional
             The :class:`~numpy.ndarray` or :class:`~numpy.ma.core.MaskedArray`
             real data, or :class:`~dask.array.core.Array` lazy data to be
-            managed.
+            managed. If a value of None is given, the data manager will be
+            considered dataless.
+
+        shape : tuple, optional
+            A tuple, representing the shape of the data manager. This can only
+            be used in the case of ``data=None``, and will render the data manager
+            dataless.

         """
+        if (shape is not None) and (data is not None):
+            msg = '"shape" should only be provided if "data" is None'
+            raise ValueError(msg)
+
+        self._shape = shape
+
         # Initialise the instance.
         self._lazy_array = None
         self._real_array = None

         # Assign the data payload to be managed.
         self.data = data

-        # Enforce the manager contract.
-        self._assert_axioms()
-
     def __copy__(self):
         """Forbid :class:`~iris._data_manager.DataManager` instance shallow-copy support."""
         name = type(self).__name__
@@ -83,12 +94,14 @@ def __eq__(self, other):
         result = NotImplemented

         if isinstance(other, type(self)):
-            result = False
-            same_lazy = self.has_lazy_data() == other.has_lazy_data()
-            same_dtype = self.dtype == other.dtype
-            if same_lazy and same_dtype:
-                result = array_equal(self.core_data(), other.core_data())
-
+            if self.is_dataless() and other.is_dataless():
+                result = self.shape == other.shape
+            else:
+                result = False
+                same_lazy = self.has_lazy_data() == other.has_lazy_data()
+                same_dtype = self.dtype == other.dtype
+                if same_lazy and same_dtype:
+                    result = array_equal(self.core_data(), other.core_data())
         return result

     def __ne__(self, other):
@@ -120,6 +133,8 @@ def __repr__(self):
         """Return an string representation of the instance."""
         fmt = "{cls}({data!r})"
         result = fmt.format(data=self.core_data(), cls=type(self).__name__)
+        if self.is_dataless():
+            result = f"{result}, shape={self.shape}"

         return result

@@ -128,9 +143,14 @@ def _assert_axioms(self):
         # Ensure there is a valid data state.
         is_lazy = self._lazy_array is not None
         is_real = self._real_array is not None
-        emsg = "Unexpected data state, got {}lazy and {}real data."
-        state = is_lazy ^ is_real
-        assert state, emsg.format("" if is_lazy else "no ", "" if is_real else "no ")
+
+        if not (is_lazy ^ is_real):

Review comment (Contributor):

This might be clearer as if is_lazy == is_real.
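
The suggestion holds because both flags are plain bools, for which `not (a ^ b)` and `a == b` agree in all four cases. A quick check, written for this note and not part of the PR:

```python
from itertools import product

# Enumerate every combination of the two flags and compare both spellings
# of the "not exactly one of lazy/real" condition.
rows = [
    (is_lazy, is_real, not (is_lazy ^ is_real), is_lazy == is_real)
    for is_lazy, is_real in product([False, True], repeat=2)
]
all_agree = all(xor_form == eq_form for _, _, xor_form, eq_form in rows)
```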

+            if is_lazy and is_real:
+                msg = "Unexpected data state, got both lazy and real data."
+                raise ValueError(msg)
+            elif self._shape is None:
+                msg = "Unexpected data state, got no lazy or real data, and no shape."
+                raise ValueError(msg)

     def _deepcopy(self, memo, data=None):
         """Perform a deepcopy of the :class:`~iris._data_manager.DataManager` instance.
@@ -148,25 +168,30 @@ def _deepcopy(self, memo, data=None):
         :class:`~iris._data_manager.DataManager` instance.

         """
+        shape = None
         try:
             if data is None:
                 # Copy the managed data.
                 if self.has_lazy_data():
                     data = copy.deepcopy(self._lazy_array, memo)
-                else:
+                elif self._real_array is not None:
                     data = self._real_array.copy()
+                else:
+                    shape = self._shape
+            elif type(data) is str and data == iris.DATALESS:
+                shape = self.shape
+                data = None
             else:
                 # Check that the replacement data is valid relative to
                 # the currently managed data.
                 dm_check = DataManager(self.core_data())
                 dm_check.data = data
                 # If the replacement data is valid, then use it but
                 # without copying it.
-            result = DataManager(data)
+            result = DataManager(data=data, shape=shape)
         except ValueError as error:
             emsg = "Cannot copy {!r} - {}"
             raise ValueError(emsg.format(type(self).__name__, error))

         return result

     @property

Review comment (Member):

@ESadek-MO We need to update the data getter docstring to state that None will be returned in the dataless case.

As it happens, you're getting that behaviour for free i.e., when we're dataless then self._real_array will be None.

@@ -175,7 +200,7 @@ def data(self):

         Returns
         -------
-        :class:`~numpy.ndarray` or :class:`numpy.ma.core.MaskedArray`.
+        :class:`~numpy.ndarray` or :class:`numpy.ma.core.MaskedArray` or None.

         """
         if self.has_lazy_data():
@@ -216,57 +241,76 @@ def data(self, data):
         data :
             The :class:`~numpy.ndarray` or :class:`~numpy.ma.core.MaskedArray`
             real data, or :class:`~dask.array.core.Array` lazy data to be
-            managed.
+            managed. If data is None, the current shape will be maintained.

         """
-        # Ensure we have numpy-like data.
-        if not (hasattr(data, "shape") and hasattr(data, "dtype")):
-            data = np.asanyarray(data)
-
-        # Determine whether the class instance has been created,
-        # as this method is called from within the __init__.
-        init_done = self._lazy_array is not None or self._real_array is not None
-
-        if init_done and self.shape != data.shape:
-            # The _ONLY_ data reshape permitted is converting a 0-dimensional
-            # array i.e. self.shape == () into a 1-dimensional array of length
-            # one i.e. data.shape == (1,)
-            if self.shape or data.shape != (1,):
-                emsg = "Require data with shape {!r}, got {!r}."
-                raise ValueError(emsg.format(self.shape, data.shape))
-
-        # Set lazy or real data, and reset the other.
-        if is_lazy_data(data):
-            self._lazy_array = data
+        if data is None:
+            self._shape = self.shape

Review comment (Member):

@ESadek-MO In the dataless case, don't we just need to set self._real_array = None and self._lazy_array = None and then check the axiom?

i.e., do that and nothing further.
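
One way to read this suggestion, as a sketch rather than the PR's code: in the `None` branch the setter clears both payloads, leaves the shape alone, and falls straight through to the axiom check. `ToyManager` below is a hypothetical, list-backed stand-in for `DataManager` that only handles one-dimensional data and never holds lazy data.

```python
class ToyManager:
    """Hypothetical, list-backed stand-in for DataManager (not Iris code)."""

    def __init__(self, data=None, shape=None):
        if data is not None and shape is not None:
            raise ValueError('"shape" should only be provided if "data" is None')
        self._shape = shape
        self._lazy_array = None  # the toy never holds lazy data
        self._real_array = None
        self.data = data  # routed through the setter below

    @property
    def shape(self):
        return self._shape

    @property
    def data(self):
        return self._real_array

    @data.setter
    def data(self, data):
        if data is None:
            # Dataless branch: drop both payloads, keep the shape, nothing further.
            self._lazy_array = None
            self._real_array = None
        else:
            new_shape = (len(data),)  # assumption: 1-D list payloads only
            if self._shape is not None and self._shape != new_shape:
                emsg = "Require data with shape {!r}, got {!r}."
                raise ValueError(emsg.format(self._shape, new_shape))
            self._real_array = list(data)
            self._shape = new_shape
        # The contract is checked on every assignment, dataless or not.
        self._assert_axioms()

    def _assert_axioms(self):
        no_payload = self._real_array is None and self._lazy_array is None
        if no_payload and self._shape is None:
            raise ValueError(
                "Unexpected data state, got no lazy or real data, and no shape."
            )


manager = ToyManager(data=[1, 2, 3])
manager.data = None  # becomes dataless, shape survives
shape_after_clear = manager.shape
manager.data = [4, 5, 6]  # same shape, so this is accepted
```

Because the shape is stored separately, clearing the data does not lose the constraint that later assignments are validated against.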

+            self._lazy_array = None
             self._real_array = None
+
+        # Ensure we have numpy-like data.
         else:
-            if not ma.isMaskedArray(data):
-                # Coerce input data to ndarray (including ndarray subclasses).
-                data = np.asarray(data)
-            if isinstance(data, ma.core.MaskedConstant):
-                # Promote to a masked array so that the fill-value is
-                # writeable to the data owner.
-                data = ma.array(data.data, mask=data.mask, dtype=data.dtype)
-            self._lazy_array = None
-            self._real_array = data
+            if not (hasattr(data, "shape") and hasattr(data, "dtype")):
+                data = np.asanyarray(data)
+
+            # Determine whether the class already has a defined shape,
+            # as this method is called from __init__.
+            has_shape = self._shape is not None
+            if has_shape and self.shape != data.shape:
+                # The _ONLY_ data reshape permitted is converting a 0-dimensional
+                # array i.e. self.shape == () into a 1-dimensional array of length
+                # one i.e. data.shape == (1,)
+                if self.shape or data.shape != (1,):
+                    emsg = "Require data with shape {!r}, got {!r}."
+                    raise ValueError(emsg.format(self.shape, data.shape))
+
+            # Set lazy or real data, and reset the other.
+            if is_lazy_data(data):
+                self._lazy_array = data
+                self._real_array = None
+            else:
+                if not ma.isMaskedArray(data):
+                    # Coerce input data to ndarray (including ndarray subclasses).
+                    data = np.asarray(data)
+                if isinstance(data, ma.core.MaskedConstant):
+                    # Promote to a masked array so that the fill-value is
+                    # writeable to the data owner.
+                    data = ma.array(data.data, mask=data.mask, dtype=data.dtype)
+                self._lazy_array = None
+                self._real_array = data
+            # sets ``self._shape`` if it is None, or if it is being converted from
+            # ( ) to (1, )
+            if not has_shape or (self._shape == () and data.shape == (1,)):
+                self._shape = self.core_data().shape

-        # Check the manager contract, as the managed data has changed.
+            # Check the manager contract, as the managed data has changed.
         self._assert_axioms()

     @property
     def dtype(self):
         """The dtype of the realised lazy data or the dtype of the real data."""
-        return self.core_data().dtype
+        return self.core_data().dtype if not self.is_dataless() else None

     @property
     def ndim(self):
         """The number of dimensions covered by the data being managed."""
-        return self.core_data().ndim
+        return len(self.shape)

     @property
     def shape(self):
         """The shape of the data being managed."""
-        return self.core_data().shape
+        return self._shape
+
+    def is_dataless(self) -> bool:
+        """Determine whether the cube has no data.
+
+        Returns
+        -------
+        bool
+
+        """
+        return self.core_data() is None

     def copy(self, data=None):
         """Return a deep copy of this :class:`~iris._data_manager.DataManager` instance.
@@ -327,7 +371,9 @@ def lazy_data(self):
             This method will never realise any lazy data.

         """
-        if self.has_lazy_data():
+        if self.is_dataless():
+            result = None
+        elif self.has_lazy_data():
             result = self._lazy_array
         else:
             result = as_lazy_data(self._real_array)
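
Taken together, the `_data_manager.py` hunks make the derived properties depend on the stored shape rather than on the payload. A minimal sketch of that contract, assuming a hypothetical list-backed `ToyManager` (not Iris code) in which `dtype` is approximated by the Python type of the first element:

```python
class ToyManager:
    """Hypothetical stand-in for DataManager; the payload is a flat, non-empty list."""

    def __init__(self, data=None, shape=None):
        # Assumption: exactly one of data/shape is supplied by the caller.
        self._data = data
        self._shape = tuple(shape) if data is None else (len(data),)

    def is_dataless(self):
        return self._data is None

    @property
    def shape(self):
        return self._shape

    @property
    def ndim(self):
        # Derived from the shape, so it still works without a payload.
        return len(self.shape)

    @property
    def dtype(self):
        # Approximation: the Python type of the first element stands in for a dtype.
        return None if self.is_dataless() else type(self._data[0])

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        if self.is_dataless() and other.is_dataless():
            # Two dataless managers are equal when their shapes match.
            return self.shape == other.shape
        return self.dtype == other.dtype and self._data == other._data


template = ToyManager(shape=(2, 3))
```

In the sketch, as in the diff, a dataless manager never compares equal to one that holds data, because their dtypes differ (`None` against a real type).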
7 changes: 6 additions & 1 deletion lib/iris/_lazy_data.py
@@ -19,6 +19,8 @@
 import numpy as np
 import numpy.ma as ma

+import iris.exceptions
+

 def non_lazy(func):
     """Turn a lazy function into a function that returns a result immediately."""
@@ -317,7 +319,10 @@ def _co_realise_lazy_arrays(arrays):
         # Note : in some cases dask (and numpy) will return a scalar
         # numpy.int/numpy.float object rather than an ndarray.
         # Recorded in https://github.com/dask/dask/issues/2111.
-        real_out = np.asanyarray(real_out)
+        if real_out is not None:
+            real_out = np.asanyarray(real_out)
+        else:
+            raise iris.exceptions.DatalessError("realising")
         if isinstance(real_out, ma.core.MaskedConstant):
             # Convert any masked constants into NumPy masked arrays.
             # NOTE: in this case, also apply the original lazy-array dtype, as
4 changes: 4 additions & 0 deletions lib/iris/_merge.py
@@ -1109,6 +1109,8 @@ def __init__(self, cube):
             source-cube.

         """
+        if cube.is_dataless():
+            raise iris.exceptions.DatalessError("merge")
         # Default hint ordering for candidate dimension coordinates.
         self._hints = [
             "time",
@@ -1289,6 +1291,8 @@ def register(self, cube, error_on_mismatch=False):
             this :class:`ProtoCube`.

         """
+        if cube.is_dataless():
+            raise iris.exceptions.DatalessError("merge")
         cube_signature = self._cube_signature
         other = self._build_signature(cube)
         match = cube_signature.match(other, error_on_mismatch)