
Rationalise CML Testing #6244

Open
trexfeathers opened this issue Dec 9, 2024 · 5 comments
Comments

@trexfeathers
Contributor

assert_CML() and assert_CML_approx_data() are great conveniences for capturing Known Good Output - "we are happy with the way this Cube looks right now, and don't want it to change".

But there are some problems to address:

  • Why do we have both functions? When is one more appropriate than the other? All developers should have clarity on this, I guess via docstrings. Is there any reason we can't just use approx_data for everything, which would reduce our complexity?
    (I believe the current mix is not by design, but mostly due to us copying patterns from existing tests)
  • Can the routine be changed or replaced with something that isn't so vulnerable to changes in NumPy's array printing? We occasionally have to create large pull requests that simply 'reset' the Known Good Output to align with the latest behaviour - these are a waste of developer time and leave large marks on the commit history. Past examples:
@trexfeathers
Contributor Author

Hashing arrays may be a solution, so long as the hashing is tolerant of minor changes.
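As a rough sketch of what "tolerant" hashing could mean (hypothetical helper, not Iris code): round values to a fixed number of decimals before hashing, so noise well below the rounding threshold does not change the digest. Note the known weakness - a value sitting exactly on a rounding boundary can still flip the hash.

```python
import hashlib

import numpy as np


def tolerant_hash(array, decimals=6):
    """Hash array values after rounding, so that changes smaller than the
    rounding threshold do not alter the digest (hypothetical sketch)."""
    rounded = np.round(np.asarray(array, dtype=np.float64), decimals)
    # Normalise -0.0 to +0.0 so the byte representation is stable.
    rounded = rounded + 0.0
    return hashlib.sha256(rounded.tobytes()).hexdigest()
```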

@trexfeathers
Contributor Author

From discussion with @HGWright, @ESadek-MO and @trexfeathers: the string representation of NumPy arrays is not expected to be stable - there is no 'contract' for this; so long as a human can read the string, it's fine. That is not a sensible basis for consistent testing. The .npy file format IS designed to be stable and backward compatible - the array is stored in its native binary format.

So: we could keep the CML format, but each CML file should be stored in a directory, together with .npy files for each NumPy array in the Cube. No arrays are stored in the CML file, only references to the relevant .npy files. We would need to write a basic CML parser for this to work.

As well as removing vulnerability to breaking changes in string format, storing NumPy arrays natively would allow use of tests._shared_utils.assert_array_almost_equal(), which offers tolerance options, and further standardises our testing.

@pp-mo
Member

pp-mo commented Dec 12, 2024

So: we could keep the CML format, but each CML file should be stored in a directory, together with .npy files for each NumPy array in the Cube. No arrays are stored in the CML file, only references to the relevant .npy files. We would need to write a basic CML parser for this to work.

TBH I don't think this preserves the basic utility of what CML offers.
I think what we really want is a human-readable output that enables you to see what changed when something changed, but allows some floating-point equality tolerance.
So I'd prefer a programmed summary of floating-point content, more like what we now have in cube printouts. Unfortunately, I do have to admit that this still relies on NumPy formatting, but to be fair this one-line summary approach has proved much more stable than the solutions we had previously.
For choice, I would record + compare the first + last 'N' points (? N=3 ?) and the sum; that would probably be good enough.
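Something like the following could implement that one-line summary (a sketch of the idea only, not Iris code): formatting is done explicitly per value, so the output does not depend on NumPy's default array printing.

```python
import numpy as np


def summarise(array, n=3, precision=6):
    """One-line summary: first/last n points plus the sum, with
    explicit per-value formatting (hypothetical sketch)."""
    flat = np.asarray(array).ravel()

    def fmt(value):
        return format(float(value), f".{precision}g")

    head = ", ".join(fmt(v) for v in flat[:n])
    tail = ", ".join(fmt(v) for v in flat[-n:])
    return f"first=[{head}] last=[{tail}] sum={fmt(flat.sum())}"
```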

@pp-mo
Member

pp-mo commented Dec 12, 2024

TBH I don't think this preserves the basic utility of what CML offers. I think what we ...
I would record + compare the first + last 'N' points (? N=3 ?), and the sum

I realise I'm not sure if we're comparing really big arrays in this way, or if it's "just" coordinates etc. Or maybe we just avoid anything too big, in which case comparing all points might make good sense.

@trexfeathers
Contributor Author

@pp-mo to provide tolerance while still providing human readability, I think we would need to write our own parser, perhaps based on numpy.fromstring() but supporting N dimensions. To provide the stability we need, we would also need to write our own string representation so that we are not vulnerable to changes in NumPy, again supporting N dimensions.
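For scale, such a parser could be quite small if we controlled both sides of the format - something like this round-trip sketch (our own text form with a shape header, parsed with plain Python rather than numpy.fromstring; all names hypothetical, and scalars/empty shapes not handled):

```python
import numpy as np


def stable_repr(array, precision=9):
    """Human-readable, version-independent text form: a shape header
    plus flat values with explicit formatting (hypothetical sketch)."""
    arr = np.asarray(array, dtype=np.float64)
    values = " ".join(format(v, f".{precision}g") for v in arr.ravel())
    return f"shape={','.join(map(str, arr.shape))}; {values}"


def parse_stable_repr(text):
    """Inverse of stable_repr: recover the N-dimensional array."""
    header, values = text.split("; ", 1)
    shape = tuple(int(s) for s in header[len("shape="):].split(","))
    flat = np.array(values.split(), dtype=np.float64)
    return flat.reshape(shape)
```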

I'm sceptical about whether this would be worth the extra effort. In the graphics tests we have a well-established precedent for tests that need 5 minutes' extra work to see what has changed, and that workflow is easy enough to cope with. It would be similarly easy to work out changes in NumPy arrays by running the offending test locally.

I'd also argue that our current printing of arrays in CML files actually harms readability in some cases. Once an array gets beyond 2 dimensions it is tricky to understand the differences I am looking at. Maybe I need more practice.
