to_numpy() is more memory-intensive than to_pandas() with string or categorical columns
#20765
Open
Labels: bug, needs triage, python
Checks
Reproducible example
Log output
Issue description
I don't know whether this is a real bug or a limitation of the to_numpy() method. I benchmarked to_numpy() against to_pandas() on a dataset containing three different data types (float, string, and categorical). When the dataset consists solely of numerical columns, both methods show similar performance. However, when string or categorical columns are introduced, to_numpy() becomes significantly more memory-intensive.

Expected behavior
I expected similar memory usage, although this may depend on how NumPy and pandas handle string and categorical columns.
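One plausible explanation, offered as an assumption rather than a confirmed diagnosis: a frame with mixed dtypes cannot be represented as one typed NumPy array, so `to_numpy()` presumably returns a single `dtype=object` array. In that layout every cell, including every float, becomes a boxed Python object behind an 8-byte pointer. `to_pandas()` instead keeps columns separate. It is assumed to leave floats as packed `float64`, turn strings into object columns, and turn categoricals into pandas `Categorical` (integer codes plus a short list of categories). The stdlib-only sketch that follows approximates both layouts with `array.array` and Python lists and estimates their size with `sys.getsizeof`. The row count, category labels, and the helper `boxed_size` are hypothetical, and the sketch does not call Polars itself.

```python
import sys
from array import array

N = 100_000  # hypothetical row count
CATEGORIES = ["low", "medium", "high"]  # hypothetical category labels


def boxed_size(cells):
    """Approximate bytes for a list of Python objects: the pointer buffer
    plus each distinct object counted once (shared objects not double-counted)."""
    seen = set()
    total = sys.getsizeof(cells)
    for obj in cells:
        if id(obj) not in seen:
            seen.add(id(obj))
            total += sys.getsizeof(obj)
    return total


# Hypothetical source data for the three column types.
floats = [i * 0.5 for i in range(N)]
strings = [f"item-{i}" for i in range(N)]
codes = array("I", [i % len(CATEGORIES) for i in range(N)])

# Columnar layout (assumed to_pandas() behaviour): packed float64 values,
# boxed strings (object dtype), categorical stored as codes + categories.
columnar = (
    sys.getsizeof(array("d", floats))
    + boxed_size(strings)
    + sys.getsizeof(codes)
    + boxed_size(CATEGORIES)
)

# Single object array (assumed to_numpy() behaviour for mixed dtypes):
# one pointer per cell, every float boxed, categories decoded per cell.
# Category strings are shared here, which is the optimistic case.
cells = []
for f, s, c in zip(floats, strings, codes):
    cells.extend((f, s, CATEGORIES[c]))
object_layout = boxed_size(cells)

print(f"columnar:     {columnar / 1e6:.1f} MB")
print(f"object array: {object_layout / 1e6:.1f} MB")
```

Under this model the float column alone grows from about 8 bytes per value to a pointer plus a boxed float object, and each categorical cell costs a full pointer instead of a 4-byte code. If the real conversion creates a fresh string per row for categorical values, the gap would be larger still. Profiling the actual `to_numpy()` and `to_pandas()` calls on the benchmark data would be needed to confirm this explanation.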
Installed versions