Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible to diff lists of dicts? #131

Open
netllama opened this issue Nov 26, 2019 · 16 comments
Open

possible to diff lists of dicts? #131

netllama opened this issue Nov 26, 2019 · 16 comments

Comments

@netllama
Copy link

I'm trying to use dictdiffer to get the diff of two lists of dictionaries, but its not working well. It seems to get confused whenever the ordering of the dicts inside the list is not identical between the two lists, or when the number of dictionaries in one list is (slightly) different from the other list.

Is this functionality something that is currently possible and expected to work, or am I trying to accomplish something unsupported?

In case it matters, this is an example of a list of dictionaries that I'm trying to work with:

[{'prefix': '162.245.48.104/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.112/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.120/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.128/27',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.16/28',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.160/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.168/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.176/28',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.192/30',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.200/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.208/30',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.216/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.224/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.232/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.240/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.248/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.32/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.40/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.48/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.56/29',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.64/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.72/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.48.96/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.50.0/24',
  'effective_as_path_length': 1,
  'med': 6,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.112/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.120/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.128/28',
  'effective_as_path_length': 1,
  'med': 15,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']},
 {'prefix': '162.245.51.144/29',
  'effective_as_path_length': 1,
  'med': 0,
  'destination': ['abcd.jln001.norc'],
  'origin_asn': ['393467'],
  'next_hop_asn': ['393467']}]
@jirikuncar
Copy link
Member

Can you provide a smaller example with current and expected results?

There are some limitations to list comparison:

  • it can not detect changes in the order of elements
  • if an element was added in the middle of a list, then they will differ from that possition (remove + add)

@netllama
Copy link
Author

Thanks. The limitations that you noted are definitely present in the data that I'm working with. It that's expected behavior right now, that's fine. I'll need to use something other than dictdiffer for the project.

@mikaelho
Copy link
Contributor

I am confused by the statement: ”it can not detect changes in the order of elements”.

Since:
>>> list(dictdiffer.diff([1,2],[2,1]))
[('change', [0], (1, 2)), ('change', [1], (2, 1))]

Can you expand a bit?

@mikaelho
Copy link
Contributor

Now I explained it to myself. The ideal result, at least for some use cases, would be some kind of a ”swap” operation, instead of the two ”changes”.

@netllama
Copy link
Author

Yes, exactly. Sometimes, the order of dicts inside the list doesn't matter.

You should be able to reproduce my original issue by creating a second list of dicts (based on the one that I provided), and making a few minor changes:

  • Add another dict to the middle of the list with the same structure as the others dicts, with a unique value for the prefix key. Then diff the original and new lists. It should only report the addition of 1 new dict, but in my results, it reports multiple other changes erroneously.
  • Re-arrange the order of the dicts in the list, and diff the original and new lists. It shouldn't report any differences, but seems to get confused, and starts comparing each list dict element by index.

@mikaelho
Copy link
Contributor

I think your first issue is covered by @jirikuncar’s explanation of list diffs - everything is ”different” from the addition onward. Your second issue I think stems from the fact that dictdiffer does not compare dicts by identity, but by content - if you have shuffled the list, changing the first element, dictdiffer detects the difference in specific content within the dict, not in its identity.

If you find a better solution for your needs, please share your findings here as well.

@jirikuncar
Copy link
Member

jirikuncar commented Nov 28, 2019

I have been experimenting with storing JSON structures in Git to do diffs on big nested objects.

You could do something like:

$ pip install git_json_tree
$ ipython
import subprocess
import git_json_tree

repo = git_json_tree.Repo('/tmp/git_json_tree')
sha_original = git_json_tree.encode(repo, ["your example goes here"])
sha_modified = git_json_tree.encode(repo, ["your modified example goes here"])

subprocess.call(['git', 'diff', sha_original, sha_modified])

@danielduhh
Copy link
Contributor

danielduhh commented Dec 11, 2019

@jirikuncar I'm running into a similar issue with comparing both lists and sets...
if I have two lists A,B

list_a = ["changeA", "changeB"]
list_b = ["changeA", "changeC", "changeB"]

Dictdiff correctly finds the ADD "changeC", but is also reporting a change in index 1 from changeB to changeC. Is there a way to ignore this?

I tried converting the list into sets (since they are unordered and unindexed):

set1 = set(["changeA", "changeB"])
set2 = set(["changeA", "changeC", "changeB"])

But dictdiff is still reporting a CHANGE as

<class 'list'>: [
('add', '', [(0, {'changeC'})]), 
('change', '', ({'changeA', 'changeB'}, {'changeA', 'changeC', 'changeB'}))
]

@jirikuncar
Copy link
Member

@danielduhh I have tried with lastest dictdiffer version.

In [1]: import dictdiffer

In [2]: list_a = ["changeA", "changeB"]
   ...: list_b = ["changeA", "changeC", "changeB"]

In [3]: set1 = set(["changeA", "changeB"])
   ...: set2 = set(["changeA", "changeC", "changeB"])

In [4]: list(dictdiffer.diff(list_a, list_b))
Out[4]: [('change', [1], ('changeB', 'changeC')), ('add', '', [(2, 'changeB')])]

In [5]: list(dictdiffer.diff(set1, set2))
Out[5]:
[('add', '', [(0, {'changeC'})]),
 ('change', '', ({'changeA', 'changeB'}, {'changeA', 'changeB', 'changeC'}))]

I can confirm that the result for sets looks wrong. Can you submit a PR with test case and expected results?

@jirikuncar
Copy link
Member

@danielduhh can you check if #133 is solving your problem?

@Datamance
Copy link

FYI, this will be useful for those of us wishing to implement Strategic Three Way Merge Patch for the Kubernetes API :) for things like deployments, where you can have big lists of environment variables that look like so:

{
  "name": "MEDIA_URL",
  "valueFrom": {
    "configMapKeyRef": {
      "key": "MEDIA_URL",
      "name": "my-secret"
    }
  }
}

AFAIK the kubectl cli tool uses a schema that has pre-specified "patch merge keys" to essentially treat each element of the list like a value that is keyed by a specified property within (in the case above, it'd be "name")

as of dictdiffer 0.8.1, I'm still getting the wrong results for lists of dicts like this. I don't think you could solve this in a truly generalized way, but allowing schema-based diff could be a good approach.

@jirikuncar
Copy link
Member

@Datamance basically you would like to provide a custom diff function for certain keys?

@Datamance
Copy link

That could work, and I think you could use that as a building block for the common case of schema-based (particularly JSONSchema) diffing/patching.

@jirikuncar
Copy link
Member

So basically one shoud provide mapping from dotted_node to a function with a same signature as diff:

def diff(..., key=None):
    # after dotted_node ~ :158
    if dotted_node in key:
        for item in key[dotted_node](first, second, node=node, ignore=ignore, path_limit=path_limit, expand=expand, tolerance=tolerance, dot_notation=dot_notation):
            yield item
        return

@aorumbayev
Copy link

Any updates on this?

@jirikuncar
Copy link
Member

@aorumbayev feel free to send a PR with a fix discussed above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants