
Rationalise CML Testing #6244

Open
trexfeathers opened this issue Dec 9, 2024 · 5 comments
Comments

@trexfeathers
Contributor

assert_CML() and assert_CML_approx_data() are great conveniences for capturing Known Good Output - "we are happy with the way this Cube looks right now, and don't want it to change".

But there are some problems to address:

  • Why do we have both functions? When is one more appropriate than the other? All developers should have clarity on this, I guess via docstrings. Is there any reason we can't just use approx_data for everything, which would reduce our complexity?
    (I believe the current mix is not by design, but mostly due to us copying patterns from existing tests)
  • Can the routine be changed or replaced with something that isn't so vulnerable to changes in NumPy's array printing? We occasionally have to create large pull requests that simply 'reset' the Known Good Output to align with the latest behaviour - these are a waste of developer time and leave large marks on the commit history. Past examples:
@trexfeathers
Contributor Author

Hashing arrays may be a solution, so long as the hashing is tolerant of minor changes.
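As a rough sketch of what "tolerant" hashing could mean (hypothetical helper, not Iris code): round values to a fixed number of decimals before hashing, so noise well below the rounding threshold does not change the digest. Note the known weakness - a value sitting exactly on a rounding boundary can still flip the hash.

```python
import hashlib

import numpy as np


def tolerant_hash(array, decimals=6):
    """Hash array values after rounding, so that changes smaller than the
    rounding threshold do not alter the digest (hypothetical sketch)."""
    rounded = np.round(np.asarray(array, dtype=np.float64), decimals)
    # Normalise -0.0 to +0.0 so the byte representation is stable.
    rounded = rounded + 0.0
    return hashlib.sha256(rounded.tobytes()).hexdigest()
```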

@trexfeathers
Contributor Author

From discussion with @HGWright, @ESadek-MO and @trexfeathers: the string representation of NumPy arrays is not expected to be stable - there is no 'contract' for this; so long as a human can read the string, it's fine. That is not a sensible basis for consistent testing. The .npy file format IS designed to be stable and backward compatible - the array is stored in its native binary format.

So: we could keep the CML format, but each CML file should be stored in a directory, together with .npy files for each NumPy array in the Cube. No arrays are stored in the CML file, only references to the relevant .npy files. We would need to write a basic CML parser for this to work.

As well as removing vulnerability to breaking changes in string format, storing NumPy arrays natively would allow use of tests._shared_utils.assert_array_almost_equal(), which offers tolerance options, and further standardises our testing.

@pp-mo
Member

pp-mo commented Dec 12, 2024

So: we could keep the CML format, but each CML file should be stored in a directory, together with .npy files for each NumPy array in the Cube. No arrays are stored in the CML file, only references to the relevant .npy files. We would need to write a basic CML parser for this to work.

TBH I don't think this preserves the basic utility of what CML offers.
I think what we really want is a human-readable output that enables you to see what changed when something changed, but allows some floating-point equality tolerance.
So I'd prefer a programmed summary of floating-point content, more like what we now have in cube printouts. Unfortunately, I do have to admit that this still relies on NumPy formatting, but to be fair this one-line summary approach has proved much more stable than the solutions we had previously.
For choice, I would record + compare the first + last 'N' points (? N=3 ?) and the sum; that would probably be good enough.
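Something like the following could implement that one-line summary (a sketch of the idea only, not Iris code): formatting is done explicitly per value, so the output does not depend on NumPy's default array printing.

```python
import numpy as np


def summarise(array, n=3, precision=6):
    """One-line summary: first/last n points plus the sum, with
    explicit per-value formatting (hypothetical sketch)."""
    flat = np.asarray(array).ravel()

    def fmt(value):
        return format(float(value), f".{precision}g")

    head = ", ".join(fmt(v) for v in flat[:n])
    tail = ", ".join(fmt(v) for v in flat[-n:])
    return f"first=[{head}] last=[{tail}] sum={fmt(flat.sum())}"
```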

@pp-mo
Member

pp-mo commented Dec 12, 2024

TBH I don't think this preserves the basic utility of what CML offers. I think what we ...
I would record + compare the first + last 'N' points (? N=3 ?), and the sum

I realise I'm not sure if we're comparing really big arrays in this way, or if it's "just" coordinates etc. Or maybe we just avoid anything too big, in which case comparing all points might make good sense.

@trexfeathers
Contributor Author

@pp-mo to provide tolerance while still providing human readability, I think we would need to write our own parser, perhaps based on numpy.fromstring() but supporting N dimensions. To provide the stability we need, we would also need to write our own string representation so that we are not vulnerable to changes in NumPy, again supporting N dimensions.
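For scale, such a parser could be quite small if we controlled both sides of the format - something like this round-trip sketch (our own text form with a shape header, parsed with plain Python rather than numpy.fromstring; all names hypothetical, and scalars/empty shapes not handled):

```python
import numpy as np


def stable_repr(array, precision=9):
    """Human-readable, version-independent text form: a shape header
    plus flat values with explicit formatting (hypothetical sketch)."""
    arr = np.asarray(array, dtype=np.float64)
    values = " ".join(format(v, f".{precision}g") for v in arr.ravel())
    return f"shape={','.join(map(str, arr.shape))}; {values}"


def parse_stable_repr(text):
    """Inverse of stable_repr: recover the N-dimensional array."""
    header, values = text.split("; ", 1)
    shape = tuple(int(s) for s in header[len("shape="):].split(","))
    flat = np.array(values.split(), dtype=np.float64)
    return flat.reshape(shape)
```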

I'm sceptical about whether this would be worth the extra effort. In the graphics tests we have a well-established precedent for tests that need 5 minutes' extra work to see what has changed, and that workflow is easy enough to cope with. It would be similarly easy to work out changes in NumPy arrays by running the offending test locally.

I'd also argue that our current printing of arrays in CML files actually harms readability in some cases. Once an array gets beyond 2 dimensions it is tricky to understand the differences I am looking at. Maybe I need more practice.
