Support polars DataFrames, LazyFrames #1373

Merged
merged 15 commits into from
Mar 15, 2024

cosmicBboy
Collaborator

@cosmicBboy cosmicBboy commented Oct 7, 2023

Polars integration

The polars-dev branch is where development work to add polars support to pandera takes place. The scope of this support is to cover the core pandera API:

  • core schemas
  • schema components
  • built-in and custom checks
  • integration with pandera type system
  • dataframe models
  • function decorators
  • configuration for enabling/disabling validation

See here for a more complete picture of the components above, and here for higher-level pieces of work that can be tackled somewhat independently.

Future work will cover the extras functionality such as serialization/deserialization of schemas to yaml/json or data synthesis strategies (unless there are some eager contributors out there who want to run with it).

Docs preview: https://pandera--1373.org.readthedocs.build/en/1373/polars.html#polars

@cosmicBboy cosmicBboy changed the title Polars dev Support polars LazyFrames Oct 7, 2023
@cosmicBboy cosmicBboy changed the title Support polars LazyFrames Support polars DataFrames, LazyFrames Oct 7, 2023
@cosmicBboy cosmicBboy marked this pull request as draft October 7, 2023 04:18

codecov bot commented Oct 7, 2023

Codecov Report

Attention: Patch coverage is 0.73260%, with 271 lines in your changes missing coverage. Please review.

Project coverage is 90.71%. Comparing base (4df61da) to head (cc9cb6f).
Report is 21 commits behind head on main.

❗ Current head cc9cb6f differs from pull request most recent head d25586c. Consider uploading reports for the commit d25586c to get more accurate results

Files Patch % Lines
pandera/backends/polars/builtin_checks.py 0.00% 58 Missing ⚠️
pandera/backends/polars/components.py 0.00% 49 Missing ⚠️
pandera/backends/polars/base.py 0.00% 45 Missing ⚠️
pandera/backends/polars/checks.py 0.00% 39 Missing ⚠️
pandera/backends/polars/container.py 0.00% 39 Missing ⚠️
pandera/backends/polars/__init__.py 0.00% 11 Missing ⚠️
pandera/api/polars/types.py 0.00% 10 Missing ⚠️
pandera/api/polars/components.py 0.00% 8 Missing ⚠️
pandera/api/polars/container.py 0.00% 6 Missing ⚠️
pandera/polars.py 0.00% 4 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
- Coverage   94.29%   90.71%   -3.58%     
==========================================
  Files          91      102      +11     
  Lines        7024     7246     +222     
==========================================
- Hits         6623     6573      -50     
- Misses        401      673     +272     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cosmicBboy cosmicBboy force-pushed the polars-dev branch 2 times, most recently from 0717428 to 41327d8 Compare February 23, 2024 15:19
@cosmicBboy cosmicBboy marked this pull request as ready for review March 11, 2024 05:46
cosmicBboy and others added 10 commits March 11, 2024 10:29
* Add polars engine and dtypes.

Signed-off-by: filipAisot <[email protected]>

* Add polars dependency.

Signed-off-by: filipAisot <[email protected]>

* Fix polars tests for polars >= 0.20.0

Signed-off-by: filipAisot <[email protected]>

* Fix polars engine. Add unittests for equivalence checks.

Signed-off-by: filipAisot <[email protected]>

---------

Signed-off-by: filipAisot <[email protected]>
* update ci to run tests on polars-dev PRs

Signed-off-by: cosmicBboy <[email protected]>

* fix type

Signed-off-by: cosmicBboy <[email protected]>

* use Union type

Signed-off-by: cosmicBboy <[email protected]>

* update deps

Signed-off-by: cosmicBboy <[email protected]>

* fix lint

Signed-off-by: cosmicBboy <[email protected]>

* loosen pylint constraints, fix strategies circular import

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>

update req files

Signed-off-by: cosmicBboy <[email protected]>
* add stub classes

Signed-off-by: Niels Bantilan <[email protected]>

* Add polars engine dtypes (#1465)

* Add polars engine and dtypes.

Signed-off-by: filipAisot <[email protected]>

* Add polars dependency.

Signed-off-by: filipAisot <[email protected]>

* Fix polars tests for polars >= 0.20.0

Signed-off-by: filipAisot <[email protected]>

* Fix polars engine. Add unittests for equivalence checks.

Signed-off-by: filipAisot <[email protected]>

---------

Signed-off-by: filipAisot <[email protected]>

* implement polars backend methods

Signed-off-by: cosmicBboy <[email protected]>

* implement methods for polars backend

Signed-off-by: cosmicBboy <[email protected]>

* implement data type coercion, strictness logic

Signed-off-by: cosmicBboy <[email protected]>

* implement add_missing_columns

Signed-off-by: cosmicBboy <[email protected]>

* implement core check methods

Signed-off-by: cosmicBboy <[email protected]>

* add dataframe model and components for polars

Signed-off-by: cosmicBboy <[email protected]>

* revert model component FieldInfo

Signed-off-by: cosmicBboy <[email protected]>

* fix core unit test regressions

Signed-off-by: cosmicBboy <[email protected]>

* implement generic DataFrameModel

Signed-off-by: cosmicBboy <[email protected]>

* move extract config logic into class definition

Signed-off-by: cosmicBboy <[email protected]>

* implement generic DataFrameModel

Signed-off-by: cosmicBboy <[email protected]>

* polars DataFrameModel uses new dataframe model api

Signed-off-by: cosmicBboy <[email protected]>

* simplify FieldInfo: decouple framework-specific model component

Signed-off-by: cosmicBboy <[email protected]>

* remove unused types

Signed-off-by: cosmicBboy <[email protected]>

* add more polars tests, clean-up pandas/polars api and backends

Signed-off-by: cosmicBboy <[email protected]>

* add more container and component checks

Signed-off-by: cosmicBboy <[email protected]>

* add polars component tests

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: filipAisot <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Co-authored-by: FilipAisot <[email protected]>
* add polars builtin check tests

Signed-off-by: Andrii G <[email protected]>

* update str_length, update tests

Signed-off-by: cosmicBboy <[email protected]>

* add test, update docstring

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: Andrii G <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Co-authored-by: cosmicBboy <[email protected]>
* add polars LazyFrame generic type, add docs

Signed-off-by: cosmicBboy <[email protected]>

* readme, makefile fixes

Signed-off-by: cosmicBboy <[email protected]>

* add support for pl.DataFrame

Signed-off-by: cosmicBboy <[email protected]>

* fix failure case collection logic

Signed-off-by: cosmicBboy <[email protected]>

* update docs

Signed-off-by: cosmicBboy <[email protected]>

* add custom check docs

Signed-off-by: cosmicBboy <[email protected]>

* add polars elementwise check support

Signed-off-by: cosmicBboy <[email protected]>

* update docs

Signed-off-by: cosmicBboy <[email protected]>

* add NotImplementedError for polars strategies

Signed-off-by: cosmicBboy <[email protected]>

* update docs

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
Signed-off-by: cosmicBboy <[email protected]>
…sted types (#1526)

* refactor polars type engine for LazyFrames, add nested types

Signed-off-by: cosmicBboy <[email protected]>

* add polars engine in the container/component backends

Signed-off-by: cosmicBboy <[email protected]>

* add tests and docs for nested types

Signed-off-by: cosmicBboy <[email protected]>

* fix mypy

Signed-off-by: cosmicBboy <[email protected]>

* fix pylint, mypy

Signed-off-by: cosmicBboy <[email protected]>

* handle annotated types

Signed-off-by: cosmicBboy <[email protected]>

* update docs

Signed-off-by: cosmicBboy <[email protected]>

* update nested dtype docstrings

Signed-off-by: cosmicBboy <[email protected]>

---------

Signed-off-by: cosmicBboy <[email protected]>
@cosmicBboy cosmicBboy merged commit 7d1b1ba into main Mar 15, 2024
72 checks passed
@cosmicBboy cosmicBboy deleted the polars-dev branch March 16, 2024 03:11
4 participants