Support for diffing and patching data holder objects #159

Open
mikaelho opened this issue Jul 8, 2021 · 0 comments


I need my diffing to go beyond dict/set/list data structures and cover other objects
as well. The main target is support for data holder objects:
types.SimpleNamespace, dataclasses, pydantic models.

This would be an incremental change, not changing anything about how
diffing and patching works for dicts/lists/sets today.

If you want to jump into code, I have a first tested implementation available
here.

Key parts of the proposed change:

1. diff and patch objects

Treat an object as if it were the dict stored in its __dict__ attribute:

from types import SimpleNamespace
from dictdiffer import diff, patch

first = SimpleNamespace(a=1)
second = SimpleNamespace(a=2)
delta = diff(first, second)
assert patch(delta, first).a == 2

Here I decided to include the '__dict__' key in the path of each change:

first = SimpleNamespace(a=1)
second = SimpleNamespace(a=2)
changes = list(diff(first, second))
assert changes == [('change', ['__dict__', 'a'], (1, 2))]

The alternative would be to leave the special key out of the path, essentially treating
the object transparently as a dict, but I opted for being explicit here.
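To make the two path conventions concrete, here is a minimal standard-library sketch. The helper diff_attrs and its explicit flag are hypothetical names used only for illustration; this is not the dictdiffer implementation, and it only reports attributes present on both objects whose values differ.

```python
from types import SimpleNamespace


def diff_attrs(first, second, explicit=True):
    """Yield ('change', path, (old, new)) for shared attributes that differ.

    Hypothetical helper: with explicit=True the path starts with '__dict__',
    with explicit=False the object is treated transparently as a dict.
    """
    prefix = ['__dict__'] if explicit else []
    a, b = vars(first), vars(second)
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            yield ('change', prefix + [key], (a[key], b[key]))


first = SimpleNamespace(a=1, b='same')
second = SimpleNamespace(a=2, b='same')

explicit_changes = list(diff_attrs(first, second))
transparent_changes = list(diff_attrs(first, second, explicit=False))
```

With the explicit form, a consumer of the diff can tell that 'a' is an attribute of an object rather than a key of a dict, at the cost of a longer path.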

2. Represent objects in diffs

Today, an object like decimal.Decimal is included in the diff information as the object itself.
This does not work if we want to store the diffs in a file or a database, and get them back later.

Looking at two types of objects:

Specific Python objects

E.g. converting Decimal('1.23') to '1.23' when diffing and back to Decimal('1.23') when patching.

Here I implemented support for:

  • datetime.datetime
  • datetime.date
  • datetime.time
  • datetime.timedelta
  • decimal.Decimal
  • enum.Enum
  • pathlib.Path
  • uuid.UUID

(For now, enums are only converted one way, to their value, e.g. Color.RED to 2; they are
not converted back when patching.)

I added a method for a dev to add rules for other roundtrip transformations as required.
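As a rough illustration of what such roundtrip rules could look like, here is a standard-library sketch. The names RULES, add_rule, encode and decode are assumptions made for this example, not the API of the linked implementation, and looking types up by bare class name is a simplification.

```python
import datetime
import decimal
import uuid

# Hypothetical registry: exact type -> (to_plain, from_plain)
RULES = {
    decimal.Decimal: (str, decimal.Decimal),
    uuid.UUID: (str, uuid.UUID),
    datetime.date: (datetime.date.isoformat, datetime.date.fromisoformat),
}


def add_rule(cls, to_plain, from_plain):
    """Register a roundtrip rule for another type."""
    RULES[cls] = (to_plain, from_plain)


def encode(value):
    """Return (type_name, plain_value) if a rule exists, else the value unchanged."""
    rule = RULES.get(type(value))  # exact type match, subclasses are not covered
    if rule is None:
        return value
    to_plain, _ = rule
    return (type(value).__name__, to_plain(value))


def decode(type_name, plain):
    """Rebuild the original object from its type name and plain value."""
    for cls, (_, from_plain) in RULES.items():
        if cls.__name__ == type_name:
            return from_plain(plain)
    raise KeyError(type_name)


# A developer-supplied rule for a type not covered by default in this sketch
add_rule(
    datetime.timedelta,
    datetime.timedelta.total_seconds,
    lambda seconds: datetime.timedelta(seconds=seconds),
)
```

Values without a rule pass through encode untouched, so plain ints and strings in a diff are unaffected.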

Generic objects

Converting an object with __dict__ to just the __dict__ value when diffing and back to the
original object when patching.

To handle the security concern of creating objects when your diff data might come from an
untrusted source, I am adding an "allow list" which restricts the modules that the class can
be inherited from.

"Serialized" format

Objects are included in diffs as dicts with a special key. For example:

first = []
second = [SimpleNamespace(a=1)]
changes = list(diff(first, second))
assert changes == [('add', '', [(0, {
    VALUE_KEY: {'module': 'types', 'name': 'SimpleNamespace', 'value': {'a': 1}}}
)])]

... where the VALUE_KEY is '_dictdiffer_value_key'.
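The following standard-library sketch shows how the serialized format and the allow list could fit together. The functions serialize and deserialize and the set ALLOWED_MODULES are hypothetical names, and rebuilding the object by filling __dict__ without calling __init__ is an assumption of this sketch rather than a description of the linked implementation.

```python
import importlib
from types import SimpleNamespace

VALUE_KEY = '_dictdiffer_value_key'

# Hypothetical allow list: only classes from these modules may be recreated
ALLOWED_MODULES = {'types'}


def serialize(obj):
    """Represent an object with __dict__ as a plain dict under VALUE_KEY."""
    cls = type(obj)
    return {VALUE_KEY: {
        'module': cls.__module__,
        'name': cls.__qualname__,
        'value': dict(vars(obj)),
    }}


def deserialize(data):
    """Recreate the object, refusing classes from modules not on the allow list."""
    info = data[VALUE_KEY]
    if info['module'] not in ALLOWED_MODULES:
        raise ValueError(f"module {info['module']!r} is not allowed")
    module = importlib.import_module(info['module'])
    cls = getattr(module, info['name'])
    obj = cls.__new__(cls)  # assumption: skip __init__, restore attributes directly
    obj.__dict__.update(info['value'])
    return obj


serialized = serialize(SimpleNamespace(a=1))
restored = deserialize(serialized)
```

Checking the module name before importing it means data from an untrusted source cannot trigger the import of an arbitrary module.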

My questions

  • Would the maintainers be willing to merge something like what is described here?
  • Any improvement ideas on the design?