Breaking change of `dataclasses.dataclass` comparison semantics in 3.13+ #128294

daskol · 2024-12-27T15:21:05Z

Bug report

Bug description:

Brief Description

The optimization in #109870 changed the semantics of dataclass comparison.

Description

The pseudocode below illustrates the change made in #109870. The semantics differ largely because of the identity shortcut in PyObject_RichCompareBool (see Objects/object.c), which sequence containers such as tuple use to compare their elements: it effectively checks self[i] is other[i] before calling __eq__. Consequently, in 3.12 the __eq__ method of self[i] is not called for identical objects during dataclasses.dataclass comparison.

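from dataclasses import fields

def _field_tuple(obj):
    # Hypothetical helper: shallow tuple of the compared fields.
    # (astuple() would deep-copy non-atomic values and lose identity.)
    return tuple(getattr(obj, f.name) for f in fields(obj) if f.compare)

def __eq__312(self, other):
    # Tuple comparison: identical elements count as equal without __eq__.
    return _field_tuple(self) == _field_tuple(other)

def __eq__313(self, other):
    # Field by field: each field's __eq__ runs and its result is used as a bool.
    for lhs, rhs in zip(_field_tuple(self), _field_tuple(other)):
        if not lhs == rhs:
            return False
    return True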
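To see the difference without NumPy, here is a minimal sketch that applies both comparison strategies to a hypothetical `Vector` class whose `__eq__` is elementwise and whose result has an ambiguous truth value, mimicking `numpy.ndarray`. `Vector`, `AmbiguousResult`, `field_tuple`, `eq_as_tuple` and `eq_field_by_field` are all illustrative names, not part of `dataclasses`. The field tuple is built with `dataclasses.fields()` and `getattr()` rather than `astuple()`, because `astuple()` deep-copies non-atomic field values and would lose the object identity that the shortcut relies on.

```python
from dataclasses import dataclass, fields


class AmbiguousResult(list):
    """Hypothetical elementwise comparison result whose truth value is
    undefined, mimicking what numpy.ndarray.__eq__ returns."""

    def __bool__(self):
        raise ValueError("truth value of an elementwise result is ambiguous")


class Vector:
    """Hypothetical toy array type with elementwise (vector) __eq__."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return AmbiguousResult(x == y for x, y in zip(self.items, other.items))


@dataclass
class A:
    xs: Vector


def field_tuple(obj):
    # Shallow tuple of the compared fields; keeps the original objects.
    return tuple(getattr(obj, f.name) for f in fields(obj) if f.compare)


def eq_as_tuple(self, other):
    # 3.12-style: tuple comparison skips __eq__ for identical elements.
    return field_tuple(self) == field_tuple(other)


def eq_field_by_field(self, other):
    # 3.13-style: every field's __eq__ is called and its result
    # is converted to bool by `not`.
    for lhs, rhs in zip(field_tuple(self), field_tuple(other)):
        if not lhs == rhs:
            return False
    return True


a = A(Vector(1.0, 1.0, 1.0))
b = A(a.xs)  # shares the very same Vector object

print(eq_as_tuple(a, b))  # True: identity shortcut, Vector.__eq__ never runs
try:
    eq_field_by_field(a, b)
except ValueError as exc:
    # prints "ValueError: truth value of an elementwise result is ambiguous"
    print("ValueError:", exc)
```

The sketch deliberately calls the two helpers directly instead of `a == b`, so its output does not depend on which Python version generated `A.__eq__`.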

According to the Python docs (quoted below), 3.13 introduces a breaking change, since it no longer compares the fields as a tuple.

eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type.

Test Case

import numpy as np
from dataclasses import dataclass

@dataclass
class A:
    xs: np.ndarray

a = A(np.ones(3))
b = A(a.xs)

print(a == b)  # FAIL (3.13); OK (3.12).
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

CPython versions tested on:

3.13 (3.12)

Operating systems tested on:

Linux


sobolevn · 2024-12-27T15:45:00Z

In 3.12 we used to do something like return (self.x, self.y) == (other.x, other.y); see:

cpython/Lib/dataclasses.py

Lines 1085 to 1094 in 3a726be

if eq:
    # Create __eq__ method.  There's no need for a __ne__ method,
    # since python will call __eq__ and negate it.
    flds = [f for f in field_list if f.compare]
    self_tuple = _tuple_str('self', flds)
    other_tuple = _tuple_str('other', flds)
    _set_new_attribute(cls, '__eq__',
                       _cmp_fn('__eq__', '==',
                               self_tuple, other_tuple,
                               globals=globals))

Now we instead do self.x == other.x and self.y == other.y, which calls each field's __eq__ explicitly.

Problems:

This is a breaking change
Reverting this would be a breaking change for 3.13 users 😢

daskol · 2024-12-27T15:53:18Z

Indeed, we are caught between two stools here. From my perspective, the right decision would be to revert as soon as possible, before major Linux distros adopt 3.13 widely.

For context, the domain where the issue arises first is machine learning, where dataclasses are used heavily to represent neural networks and to maintain collections of weights. Comparison breaks there because the overridden __eq__ of array types has vector rather than scalar semantics.

picnixz · 2024-12-27T17:52:42Z

method __eq__ of self[i] is not evaluated for identical objects in 3.12

Considering that we largely short-circuit equality in general, I think we should revert it. In addition, we are now out of sync with the documentation: "This method compares the class as if it were a tuple of its fields, in order" (emphasis mine). While reverting could also be a breaking change, we haven't updated the documentation, so maybe the change hasn't been noticed yet.

Most of the time, two objects that are identical (in terms of pointers) should also compare equal, regardless of whether there is a custom __eq__ or not, though this is my own opinion. If we're worried about the cost of constructing the tuple, then instead of doing x[i] == y[i] we could do x[i] is y[i] or x[i] == y[i] to emulate this behaviour, but I wonder how performance would be affected.
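The `x[i] is y[i] or x[i] == y[i]` idea can be sketched as a hand-written field-by-field `__eq__` that checks identity before calling each field's `__eq__`. This is a hypothetical illustration (`eq_identity_first` is not a `dataclasses` API, and this is not necessarily the fix that was merged); it attaches the method to a dataclass declared with `eq=False` so no `__eq__` is generated for it.

```python
from dataclasses import dataclass, fields


def eq_identity_first(self, other):
    # Hypothetical field-by-field __eq__ that checks identity first
    # (lhs is rhs or lhs == rhs), emulating the tuple comparison shortcut.
    if self is other:
        return True
    if other.__class__ is not self.__class__:
        return NotImplemented
    for f in fields(self):
        if not f.compare:
            continue
        lhs, rhs = getattr(self, f.name), getattr(other, f.name)
        if not (lhs is rhs or lhs == rhs):
            return False
    return True


@dataclass(eq=False)  # eq=False: do not generate __eq__, use ours instead
class Point:
    x: float
    y: float

    __eq__ = eq_identity_first


nan = float("nan")
print(Point(0.0, nan) == Point(0.0, nan))                    # True: same NaN object
print(Point(0.0, float("nan")) == Point(0.0, float("nan")))  # False: distinct NaN objects
```

Unlike a plain `and` chain, this sketch always returns a strict bool (or `NotImplemented`); the per-field identity check is the extra work whose performance cost the comment above wonders about.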

cc @Yhg1s as the 3.13 RM.

sobolevn · 2024-12-27T17:59:48Z

I will send a fix ASAP.

ericvsmith · 2024-12-27T19:39:53Z

See also #120645. I don't think that we could back out the change at this point.

daskol · 2024-12-28T16:30:04Z

From the perspective of PEP 557, data classes should be compared as tuples, since it explicitly states that Data Classes can be thought of as “mutable namedtuples with defaults”. I guess that making data classes as close as possible to named tuples was a primary design goal. For this reason, the breaking change introduced by #120645 was not the wisest decision.

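from collections import namedtuple
from dataclasses import dataclass

# namedtuple: compared as a tuple, so the identity shortcut applies to the
# shared NaN object even though nan == nan is False.
Point = namedtuple('Point', ['x', 'y'])
p1 = Point(0.0, float('nan'))
p2 = Point(0.0, p1.y)
assert p1 == p2  # OK


# dataclass: in 3.13 fields are compared one by one, so nan == nan decides.
@dataclass
class Point:
    x: float
    y: float

p1 = Point(0.0, float('nan'))
p2 = Point(0.0, p1.y)
assert p1 == p2  # OK (3.12); FAIL (3.13).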

daskol added the type-bug An unexpected behavior, bug, or error label Dec 27, 2024

Eclips4 added topic-dataclasses stdlib Python modules in the Lib dir labels Dec 27, 2024

daskol mentioned this issue Dec 27, 2024: Failed in the test process in Python 3.13 google-deepmind/chex#371 (Open)

daskol changed the title ~~Breaking change of dataclass.dataclass comparison semtatics~~ Breaking change of dataclasses.dataclass comparison semtatics Dec 27, 2024

picnixz changed the title ~~Breaking change of dataclasses.dataclass comparison semtatics~~ Breaking change of dataclasses.dataclass comparison semantics Dec 27, 2024

picnixz added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Dec 27, 2024

picnixz changed the title ~~Breaking change of dataclasses.dataclass comparison semantics~~ Breaking change of dataclasses.dataclass comparison semantics in 3.13+ Dec 27, 2024

sobolevn self-assigned this Dec 27, 2024

