Description update and house cleaning #368

Merged Jan 9, 2025 (1 commit)
4 changes: 3 additions & 1 deletion CONTRIBUTORS
@@ -4,4 +4,6 @@
- Mark Zhou
- Ian Whitestone
- Faisal Dosani
- Lorenzo Mercado
- Lorenzo Mercado
- Jacob Dawang
- Raymond Haffar
18 changes: 13 additions & 5 deletions README.md
@@ -7,11 +7,19 @@
![PyPI - Downloads](https://img.shields.io/pypi/dm/datacompy)


DataComPy is a package to compare two Pandas DataFrames. Originally started to
be something of a replacement for SAS's ``PROC COMPARE`` for Pandas DataFrames
with some more functionality than just ``Pandas.DataFrame.equals(Pandas.DataFrame)``
(in that it prints out some stats, and lets you tweak how accurate matches have to be).
Then extended to carry that functionality over to Spark Dataframes.
DataComPy is a package to compare two DataFrames (or tables) such as Pandas, Spark, Polars, and
even Snowflake. Originally it was created to be something of a replacement
for SAS's ``PROC COMPARE`` for Pandas DataFrames with some more functionality than
just ``Pandas.DataFrame.equals(Pandas.DataFrame)`` (in that it prints out some stats,
and lets you tweak how accurate matches have to be). Supported types include:

- Pandas
- Polars
- Spark
- Snowflake (via snowpark)
- Dask (via Fugue)
- DuckDB (via Fugue)
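
The description above mentions printing comparison stats and tweaking how accurate matches have to be. As a rough illustration of that idea only, here is a minimal standard-library sketch that joins two row sets on a key and applies an absolute tolerance to numeric columns. It is not DataComPy's implementation or API: `compare_rows` and its `abs_tol` parameter are hypothetical names chosen for this example.

```python
import math


def compare_rows(left, right, key, abs_tol=0.0):
    """Compare two lists of dict rows joined on `key`.

    Hypothetical helper for illustration; not part of DataComPy.
    Numeric values match when they differ by at most `abs_tol`;
    all other values must be exactly equal.
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    mismatches = []
    # Only rows whose key appears on both sides can be compared value by value.
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        l_row, r_row = left_by_key[k], right_by_key[k]
        for col in sorted(l_row.keys() & r_row.keys()):
            a, b = l_row[col], r_row[col]
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                equal = math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
            else:
                equal = a == b
            if not equal:
                mismatches.append((k, col, a, b))

    return {
        "only_in_left": sorted(left_by_key.keys() - right_by_key.keys()),
        "only_in_right": sorted(right_by_key.keys() - left_by_key.keys()),
        "mismatches": mismatches,
    }


left = [
    {"id": 1, "amt": 100.0},
    {"id": 2, "amt": 50.0},
    {"id": 3, "amt": 7.0},
]
right = [
    {"id": 1, "amt": 100.004},
    {"id": 2, "amt": 51.0},
    {"id": 4, "amt": 9.0},
]

strict = compare_rows(left, right, key="id")
loose = compare_rows(left, right, key="id", abs_tol=0.01)
print(len(strict["mismatches"]), len(loose["mismatches"]))
```

With a tolerance of zero, the small difference on `id` 1 counts as a mismatch; loosening the tolerance to 0.01 leaves only the larger difference on `id` 2. Rows present on one side only are reported separately rather than as mismatches.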


## Quick Installation

10 changes: 2 additions & 8 deletions ROADMAP.rst
@@ -2,13 +2,7 @@ datacompy Roadmap
-----------------

At this current time ``datacompy`` is in a stable state. We are planning on continuing to
add features and functionality as the community of users asks for them, but there are no
add features and functionality as the community of users asks for them, but there are no
pressing issues which we are looking to add in immediately.

There are some longer term issues which are open for people to work on, and some which are more of a nice to have.
We are looking for contributors and also maintaners to help with the project.

- Add in docs how to change the number of mismatches in report `#6 <https://github.com/capitalone/datacompy/issues/6>`_
- Make duplicate handling better `#7 <https://github.com/capitalone/datacompy/issues/7>`_
- Refactor Spark datacompy `#13 <https://github.com/capitalone/datacompy/issues/13>`_
- Drop Python 3.7 suport `#173 <https://github.com/capitalone/datacompy/issues/173>`_
Please feel free to check the issues section of the repository for the most up to date list.
4 changes: 2 additions & 2 deletions docs/source/pandas_usage.rst
@@ -5,7 +5,7 @@ Overview
--------

The main goal of ``datacompy`` is to provide a human-readable output describing
differences between two dataframes. For example, if you have two dataframes
differences between two dataframes. For example, if you have two dataframes
containing data like:

df1
@@ -289,4 +289,4 @@ There's a number of limitations with ``datacompy``:

#Numpy testing
npt.assert_array_equal(arr1, arr2)
npt.assert_almost_equal(obj1, obj2)
npt.assert_almost_equal(obj1, obj2)
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -3,12 +3,14 @@ name = "datacompy"
description = "Dataframe comparison in Python"
readme = "README.md"
authors = [
{ name="Faisal Dosani", email="[email protected]" },
{ name="Ian Robertson" },
{ name="Dan Coates" },
{ name="Faisal Dosani", email="[email protected]" },
]
maintainers = [
{ name="Faisal Dosani", email="[email protected]" }
{ name="Faisal Dosani", email="[email protected]" },
{ name="Jacob Dawang", email="[email protected]" },
{ name="Raymond Haffar", email="[email protected]" },
]
license = {text = "Apache Software License"}
dependencies = ["pandas<=2.2.3,>=0.25.0", "numpy<=2.2.0,>=1.22.0", "ordered-set<=4.1.0,>=4.0.2", "polars[pandas]<=1.17.1,>=0.20.4"]