error reading bigquery table having column of type datetime #20700

Open
2 tasks done
jakejh opened this issue Jan 14, 2025 · 5 comments
Labels
A-io-database (Area: reading/writing to databases), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

jakejh commented Jan 14, 2025

### Checks

- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of Polars.

### Reproducible example

import polars as pl
from google.cloud import bigquery
from google.oauth2.service_account import Credentials

def read_bigquery(query, credentials):
    client = bigquery.Client(credentials=credentials)
    query_job = client.query(query)
    rows = query_job.result().to_arrow()
    df = pl.from_arrow(rows)
    return df

# table in bigquery has a column whose type is "datetime", i.e., no timezone

credentials = get_credentials() # such as from a service account file
df = read_bigquery('select * from bq_dataset.bq_table limit 10', credentials)

### Log output

Traceback (most recent call last):
  File "<python-input-4>", line 3, in <module>
    nurses_old = utils.read_bigquery(
      f"select * from `{params['dataset']}.nurses`", params['credentials'])
  File "/Users/.../src/utils.py", line 62, in read_bigquery
    df = pl.from_arrow(rows)
  File "/Users/.../.venv/lib/python3.13/site-packages/polars/convert/general.py", line 436, in from_arrow
    arrow_to_pydf(
    ~~~~~~~~~~~~~^
        data=data,
        ^^^^^^^^^^
    ...<2 lines>...
        schema_overrides=schema_overrides,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/.../.venv/lib/python3.13/site-packages/polars/_utils/construction/dataframe.py", line 1181, in arrow_to_pydf
    pydf = PyDataFrame.from_arrow_record_batches(batches, data.schema)
polars.exceptions.ComputeError: cannot create series from Extension("google:sqlType:datetime", Timestamp(Microsecond, None), None)


### Issue description

Starting in version 1.18.0, the code above raises the error shown in the log output. I think this issue is related to the changes made in #20248.

### Expected behavior

In version 1.17.0, the code does not give an error and instead returns a dataframe having a column of type "datetime[us]".

### Installed versions

<details>

--------Version info---------
Polars: 1.18.0
Index type: UInt32
Platform: macOS-15.2-x86_64-i386-64bit-Mach-O
Python: 3.13.0 (main, Oct 16 2024, 09:15:13) [Clang 18.1.8 ]
LTS CPU: False

----Optional dependencies----
adbc_driver_manager
altair
azure.identity
boto3
cloudpickle
connectorx
deltalake
fastexcel
fsspec
gevent
google.auth 2.37.0
great_tables
matplotlib
nest_asyncio
numpy
openpyxl
pandas
pyarrow 18.1.0
pydantic
pyiceberg
sqlalchemy
torch
xlsx2csv
xlsxwriter 3.2.0


</details>
jakejh added the bug, needs triage, and python labels Jan 14, 2025
alexander-beedie (Collaborator) commented Jan 14, 2025

These don't appear to be Arrow Datetime values, but rather some custom/extension type called google:sqlType:datetime, and we do not have support in Polars for arbitrary Arrow extension types. I'd hope it's possible to return standard Arrow Datetime types from BigQuery, but not sure if I currently have access to an instance to check/verify that 🤔

alexander-beedie added the A-io-database label Jan 14, 2025

FerPBI commented Jan 19, 2025

I can confirm this issue is still present in polars 1.20.0. When trying to handle BigQuery's google:sqlType:datetime type, Polars does not properly convert it to standard Arrow Datetime values.

Steps to reproduce:

  1. Query BigQuery table containing datetime fields
  2. Attempt to load results into Polars dataframe
  3. Observe that datetime fields retain the custom google:sqlType:datetime type instead of converting to Arrow Datetime

The specific error encountered is:

ComputeError: cannot create series from Extension(ExtensionType { name: "google:sqlType:datetime", inner: Timestamp(Microsecond, None), metadata: None })

alexander-beedie (Collaborator) commented Jan 22, 2025

> I can confirm this issue is still present in polars 1.20.0. When trying to handle BigQuery's google:sqlType:datetime type, Polars does not properly convert it to standard Arrow Datetime values.

Indeed - but then, we shouldn't have to; ideally BigQuery should be able to produce standard Arrow Datetime values rather than a custom "google:sqlType:datetime" ExtensionType (I'm unsure why they do this 🤔).

coastalwhite marked this as a duplicate of #20849 Jan 22, 2025

jsarbach (Contributor) commented

Removing the metadata from the fields seems to work as a workaround:

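import polars as pl
import pyarrow as pa
# `job` is the BigQuery query job; rebuild the table with all field metadata removed
table = job.to_arrow()
table = pa.Table.from_batches(table.to_batches(), schema=pa.schema([field.remove_metadata() for field in table.schema]))
df = pl.from_arrow(table)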

Any idea why it worked in polars<1.18.0? Here's some sample data in case someone wants to investigate: https://storage.googleapis.com/cosmic-mariner-294413-ew6/table.parquet

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('table.parquet')
>>> table.schema
record_id: int64
date_of_occurence: timestamp[us]
  -- field metadata --
  ARROW:extension:name: 'google:sqlType:datetime'
>>> table.field(1).type
TimestampType(timestamp[us])
>>> table.field(1).metadata
{b'ARROW:extension:name': b'google:sqlType:datetime'}
