Data corruption when replacing values in DataFrame #296

hypsakata · 2024-06-12T08:04:39Z

I've encountered a problem where replacing values in a DataFrame corrupts the original column data.

irb> df = RedAmber::DataFrame.new(val: [352, 256, 4, 0]);
irb> df.assign(val: df[:val].replace(df[:val] == 0, 1))
=>
#<RedAmber::DataFrame : 4 x 1 Vector, 0x000000000003bf60>
      val
  <uint8>
0      96
1       0
2       4
3       1

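This happens because the resulting column takes the data type of the replacement value (uint8 here) instead of keeping the original column's type, so 352 and 256 wrap around to 96 and 0. The behavior is consistent, but it is not intuitive. I think the result should keep the original column's data type. How about changing this behavior?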
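The wrap-around in the output above can be reproduced in plain Ruby without RedAmber. This is only a sketch of the arithmetic, assuming the narrowing to uint8 keeps the low 8 bits of each value; `wrap_unsigned` is a hypothetical helper, not a RedAmber or Arrow method.

```ruby
# Values from the report.
original = [352, 256, 4, 0]

# Keeping only the low `bits` bits models how an unsigned integer wraps around.
# (Hypothetical helper for illustration; not part of RedAmber.)
def wrap_unsigned(value, bits)
  value & ((1 << bits) - 1)
end

# Replace zeros with 1 first, then narrow every element to 8 bits.
replaced = original.map { |v| v.zero? ? 1 : v }
narrowed = replaced.map { |v| wrap_unsigned(v, 8) }

p narrowed # => [96, 0, 4, 1]
```

The result matches the reported column: only the value 0 was meant to change, but 352 and 256 no longer fit once the column becomes uint8.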

However, that change has its own problem: if the column type is uint8 and the replacement value is a double or a large integer, the replacement value itself will be corrupted. It would be useful to have an easier way to cast data types, or to allow specifying the data type as a keyword argument to the replace method, e.g., .replace(…, data_type: :double).
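One way to picture the trade-off is a rule that picks a result type wide enough for both the existing values and the replacement, with an explicit override. The following is a plain-Ruby sketch of that idea only. The type table, `type_for`, and `result_type` are hypothetical names for illustration and do not reflect how RedAmber or Arrow actually resolve types.

```ruby
# Hypothetical type table: name => inclusive integer range. Not part of RedAmber.
def integer_ranges
  {
    uint8: 0..255,
    uint16: 0..65_535,
    uint32: 0..4_294_967_295,
    int64: -(2**63)..(2**63 - 1)
  }
end

# Return the first type that can hold every value; any Float forces :double.
def type_for(values)
  return :double if values.any?(Float)

  found = integer_ranges.find { |_name, range| values.all? { |v| range.cover?(v) } }
  raise RangeError, "no integer type fits #{values.inspect}" unless found

  found.first
end

# An explicit data_type wins; otherwise fit both the existing column
# values and the replacement value.
def result_type(column, replacement, data_type: nil)
  data_type || type_for(column + [replacement])
end

p result_type([352, 256, 4, 0], 1)              # => :uint16
p result_type([1, 2, 3], 0.5)                   # => :double
p result_type([1, 2, 3], 300)                   # => :uint16
p result_type([1, 2, 3], 4, data_type: :double) # => :double
```

Under this rule neither the original data nor the replacement is truncated, and the keyword argument stays available for callers who want a specific type.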

I'd appreciate any comments or ideas on this matter.
Thanks.

kou · 2024-10-06T06:23:29Z

@heronshoes What do you think about this?
