-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plots: cleanup data extraction #6355
Conversation
if extension in (".json", ".yaml"): | ||
return DictData(filename, revision, content) | ||
if extension in (".csv", ".tsv"): | ||
return ListData(filename, revision, content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After moving data parsing to collect
in #5984 we can abstract obtained data.
def _apply_path(data, path=None, **kwargs): | ||
if not path or not isinstance(data, dict): | ||
return data | ||
|
||
import jsonpath_ng | ||
|
||
found = jsonpath_ng.parse(path).find(data) | ||
first_datum = first(found) | ||
if ( | ||
len(found) == 1 | ||
and isinstance(first_datum.value, list) | ||
and isinstance(first(first_datum.value), dict) | ||
): | ||
data_points = first_datum.value | ||
elif len(first_datum.path.fields) == 1: | ||
field_name = first(first_datum.path.fields) | ||
data_points = [{field_name: datum.value} for datum in found] | ||
else: | ||
raise PlotDataStructureError() | ||
|
||
if not isinstance(data_points, list) or not ( | ||
isinstance(first(data_points), dict) | ||
): | ||
raise PlotDataStructureError() | ||
|
||
return data_points | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not supported from #3840 I recall that we have been talking about keeping it in another form, but anyway, it can always be brought back from git history.
@pytest.mark.parametrize("data_class", [JSONPlotData, YAMLPlotData]) | ||
def test_find_data_in_dict(tmp_dir, data_class): | ||
metric = [{"accuracy": 1, "loss": 2}, {"accuracy": 3, "loss": 4}] | ||
dmetric = {"train": metric} | ||
|
||
plot_data = data_class("-", "revision", dmetric) | ||
|
||
expected = metric | ||
for d in expected: | ||
d["rev"] = "revision" | ||
|
||
assert list(map(dict, plot_data.to_datapoints())) == expected | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to test_data
from dvc.repo.plots.data import _apply_path, _find_data, _lists | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"path,expected_result", | ||
[ | ||
("$.some.path[*].a", [{"a": 1}, {"a": 4}]), | ||
("$.some.path", [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]), | ||
], | ||
) | ||
def test_parse_json(path, expected_result): | ||
value = { | ||
"some": {"path": [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]} | ||
} | ||
|
||
result = _apply_path(value, path=path) | ||
|
||
assert result == expected_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before, no need for this test for now.
def test_finding_data(fields): | ||
data = {"a": {"b": [{"x": 2, "y": 3}, {"x": 1, "y": 5}]}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded scope of the test to test to_datapoints
and remove redundancy in tests (test_finding_data
was checking basically the same as test_find_data_in_dict
).
@@ -57,7 +57,6 @@ def run(self): | |||
"nanotime>=0.5.2", | |||
"pyasn1>=0.4.1", | |||
"voluptuous>=0.11.7", | |||
"jsonpath-ng>=1.5.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. π