Provide more details in error message around concat_rows #1023

Open
LostKobrakai opened this issue Nov 20, 2024 · 1 comment

Comments

@LostKobrakai

Given the following dataframes:

[
  #Explorer.DataFrame<
    Polars[1134 x 4]
    gtin string […]
    series string […]
    program string […]
    program_color string […]
  >,
  #Explorer.DataFrame<
    Polars[1520 x 4]
    gtin string […]
    series string […]
    program null […]
    series_color string […]
  >
]

I got

[error] ** (ArgumentError) dataframes must have the same columns
    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

This led me to believe that the null vs. string column type was the issue, when it was actually the differing *_color columns.

The error message could be better, and the concat_rows docs could call out that typecasting works between null and other column types.

@billylanchantin
Contributor

Small clarification (we chatted on Slack):

The error message could be improved by calling out which columns specifically didn't match. Something like:

** (ArgumentError) dataframes must have the same columns

  * Left DataFrame has these columns not present in the right DataFrame:

      ["program_color"]

  * Right DataFrame has these columns not present in the left DataFrame:

      ["series_color"]

    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

where internally we'd do something like:

left_cols = left_df |> names() |> MapSet.new()
right_cols = right_df |> names() |> MapSet.new()

mismatched_cols = MapSet.symmetric_difference(left_cols, right_cols)

in_left_only = left_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
in_right_only = right_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
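
For anyone who wants to try the idea outside of Explorer, here is a minimal sketch that works on plain lists of column names (what names/1 would return). The module name ColumnMismatch and its functions diff/2 and message/2 are hypothetical and not part of Explorer. It filters the original lists rather than converting sets back to lists, on the assumption that keeping column order in the message is desirable.

```elixir
defmodule ColumnMismatch do
  # Hypothetical helper, not part of Explorer.
  # Takes two lists of column names and returns the names unique to each
  # side, preserving the order in which they appear in the input lists.
  def diff(left_names, right_names) do
    left = MapSet.new(left_names)
    right = MapSet.new(right_names)

    in_left_only = Enum.reject(left_names, &MapSet.member?(right, &1))
    in_right_only = Enum.reject(right_names, &MapSet.member?(left, &1))

    {in_left_only, in_right_only}
  end

  # Builds a message in the shape suggested above (assumed wording).
  def message(left_names, right_names) do
    {in_left_only, in_right_only} = diff(left_names, right_names)

    """
    dataframes must have the same columns

      * Left DataFrame has these columns not present in the right DataFrame:

          #{inspect(in_left_only)}

      * Right DataFrame has these columns not present in the left DataFrame:

          #{inspect(in_right_only)}
    """
  end
end

# Column names taken from the two dataframes in the report.
left = ["gtin", "series", "program", "program_color"]
right = ["gtin", "series", "program", "series_color"]

IO.inspect(ColumnMismatch.diff(left, right))
IO.puts(ColumnMismatch.message(left, right))
```

Inside concat_rows the message would presumably be passed to `raise ArgumentError, ...` at the point where the column check currently fails, though where exactly that belongs is up to the maintainers.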
