Data corruption when replacing values in DataFrame #296

Open
hypsakata opened this issue Jun 12, 2024 · 1 comment

@hypsakata

I've encountered a problem where replacing values in a DataFrame corrupts the original column data.

irb> df = RedAmber::DataFrame.new(val: [352, 256, 4, 0]);
irb> df.assign(val: df[:val].replace(df[:val] == 0, 1))
=>
#<RedAmber::DataFrame : 4 x 1 Vector, 0x000000000003bf60>
      val
  <uint8>
0      96
1       0
2       4
3       1

This happens because the whole column is cast to the replacement value's data type (uint8 here), so 352 and 256 no longer fit and wrap around to 96 and 0. While this behavior is consistent, it is not intuitive; the result should keep the original column's data type instead. How about changing this behavior?
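To make the numbers in the output above concrete, here is a minimal plain-Ruby sketch of the wraparound. It does not call RedAmber at all; it only assumes that narrowing to uint8 keeps the low 8 bits of each value and that the mask is computed on the original values, which is what the output suggests.

```ruby
# Plain-Ruby model of the observed result; no RedAmber calls are used.
values = [352, 256, 4, 0]

# Mask computed on the original values, like df[:val] == 0.
mask = values.map { |v| v == 0 }

# Assumption: casting to uint8 keeps only the low 8 bits (value mod 256).
wrapped = values.map { |v| v & 0xFF }

# Replace the masked positions with 1, like replace(mask, 1).
result = wrapped.zip(mask).map { |v, m| m ? 1 : v }

p result
```

Under these assumptions the sketch reproduces the four values shown in the uint8 column above.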

However, that change has a downside: if the column type is uint8 and the replacement value is a double or a large integer, the replacement value itself would be corrupted. It would be useful to have a method for easier data type casting, or to allow specifying the data type as a keyword argument in the replace method, e.g., .replace(…, data_type: :double).
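The concern about replacement values can be sketched in plain Ruby as well. The helpers `fits_uint8?` and `wrap_uint8` below are hypothetical names made up for illustration and are not part of RedAmber; the wraparound again assumes that narrowing an integer to uint8 keeps the low 8 bits.

```ruby
# Hypothetical helpers for illustration only; not part of RedAmber.

# True when the value can be stored in a uint8 column without loss.
def fits_uint8?(value)
  value.is_a?(Integer) && value.between?(0, 255)
end

# Assumption: narrowing an integer to uint8 keeps the low 8 bits.
def wrap_uint8(value)
  value & 0xFF
end

[1, 300, 1.5].each do |replacement|
  if fits_uint8?(replacement)
    puts "#{replacement}: safe to store as uint8"
  else
    puts "#{replacement}: needs a wider or different data type"
  end
end

# A large integer forced into uint8 would be stored as a different number.
p wrap_uint8(300)
```

A check of this kind is one way a replace method could decide whether to keep the column's type or to require an explicit data type from the caller.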

I'd appreciate any comments or ideas on this matter.
Thanks.

@kou
Member

kou commented Oct 6, 2024

@heronshoes What do you think about this?
