
PanicException when writing to Delta a Null value casted to a list of structs #20734

Open
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments


algorri94 commented Jan 15, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

test = {"fields": [[]]}

test_df = pl.DataFrame(test).select(
    pl.col("fields").cast(
        pl.List(pl.Struct([pl.Field("name", pl.String), pl.Field("value", pl.String), pl.Field("items", pl.Int64)]))
    )
)

test_df.write_delta("/code/test_delta", mode="append", delta_write_options={"schema_mode": "merge"})

Log output

---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[12], line 12
      4 test = {"fields": [[]]}
      6 test_df = pl.DataFrame(test).select(
      7     pl.col("fields").cast(
      8         pl.List(pl.Struct([pl.Field("name", pl.String), pl.Field("value", pl.String), pl.Field("items", pl.Int64)]))
      9     )
     10 )
---> 12 test_df.write_delta("/code/test_delta", mode="append", delta_write_options={"schema_mode": "merge"})

File ~/.cache/pypoetry/virtualenvs/harvester-metrics-MATOk_fk-py3.11/lib/python3.11/site-packages/polars/dataframe/frame.py:4498, in DataFrame.write_delta(self, target, mode, overwrite_schema, storage_options, delta_write_options, delta_merge_options)
   4495     delta_write_options["schema_mode"] = "overwrite"
   4497 schema = delta_write_options.pop("schema", None)
-> 4498 write_deltalake(
   4499     table_or_uri=target,
   4500     data=data,
   4501     schema=schema,
   4502     mode=mode,
   4503     storage_options=storage_options,
   4504     **delta_write_options,
   4505 )
   4506 return None

File ~/.cache/pypoetry/virtualenvs/harvester-metrics-MATOk_fk-py3.11/lib/python3.11/site-packages/deltalake/writer.py:323, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, target_file_size, large_dtypes, engine, writer_properties, custom_metadata, post_commithook_properties, commit_properties)
    317 data, schema = _convert_data_and_schema(
    318     data=data,
    319     schema=schema,
    320     conversion_mode=ArrowSchemaConversionMode.PASSTHROUGH,
    321 )
    322 data = RecordBatchReader.from_batches(schema, (batch for batch in data))
--> 323 write_deltalake_rust(
    324     table_uri=table_uri,
    325     data=data,
    326     partition_by=partition_by,
    327     mode=mode,
    328     table=table._table if table is not None else None,
    329     schema_mode=schema_mode,
    330     predicate=predicate,
    331     target_file_size=target_file_size,
    332     name=name,
    333     description=description,
    334     configuration=configuration,
    335     storage_options=storage_options,
    336     writer_properties=writer_properties,
    337     commit_properties=commit_properties,
    338     post_commithook_properties=post_commithook_properties,
    339 )
    340 if table:
    341     table.update_incremental()

PanicException: Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type. Before importing buffer through FFI, please make sure the allocation is aligned.

Issue description

If a DataFrame has a column with a list-of-structs data type and some rows happen to be null/empty, write_delta breaks with a PanicException about buffer alignment.

As a workaround, I have to split the DataFrame in two, drop the column in the part where its values are null, and write both parts separately.

Expected behavior

write_delta should write the record to Delta Lake with that column set to null.

Installed versions

--------Version info---------
Polars:              1.19.0
Index type:          UInt32
Platform:            Linux-6.11.10-orbstack-00282-g72f45320fe21-x86_64-with-glibc2.36
Python:              3.11.11 (main, Dec 25 2024, 01:32:21) [GCC 12.2.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.35.96
cloudpickle          <not installed>
connectorx           0.4.0
deltalake            0.23.2
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              18.1.0
pydantic             2.10.5
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@algorri94 algorri94 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jan 15, 2025