-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support IGNORE_ERRORS when scanning from pyarrow/pandas #4646
Conversation
3fdb113
to
2f21964
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4646 +/- ##
==========================================
- Coverage 86.40% 86.39% -0.01%
==========================================
Files 1373 1373
Lines 58250 58253 +3
Branches 7219 7216 -3
==========================================
- Hits 50329 50328 -1
- Misses 7757 7761 +4
Partials 164 164 ☔ View full report in Codecov by Sentry. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
074809c
to
dbad6a6
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
1ee740f
to
ed44e03
Compare
Benchmark ResultMaster commit hash:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several KUZU_API changes which I don't follow why they're now needed in this PR?
CMakeLists.txt
Outdated
@@ -317,7 +317,7 @@ add_subdirectory(third_party) | |||
if(${BUILD_KUZU}) | |||
add_definitions(-DKUZU_ROOT_DIRECTORY="${PROJECT_SOURCE_DIR}") | |||
add_definitions(-DKUZU_CMAKE_VERSION="${CMAKE_PROJECT_VERSION}") | |||
add_definitions(-DKUZU_EXTENSION_VERSION="0.7.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm Do we need to upgrade the extension version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a virtual function to TableFuncBindData
which is inherited by some classes in the extensions (e.g. JSONScanBindData) so I probably need to bump the extension version.
That single change (me needing to export ScanBindData/TableFuncBindData) caused a bunch of dynamic casts in the macos extension test to fail due to the casted targets not being exported which is why I added all those
KUZU_API` changes. I'm not completely sure why they showed up now and not before though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. We don't need to bump extension version manually in PRs now. The CI pipeline will handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit strange, as we have virtual function in TableFuncBindData
previously. I wonder why we don't need to mark KUZU_API previously, but now. @acquamarin Can you also take a look at the KUZU_API changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably because the only virtual method before is copy
which is pure virtual (so the definitions are all in the child classes). The method I added has a default implementation so it actually needs to exported for child classes in the extensions to access.
@@ -17,6 +18,7 @@ namespace kuzu { | |||
PyArrowScanConfig::PyArrowScanConfig(const common::case_insensitive_map_t<Value>& options) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why PyArrowScan and PandasScan have different ways to convert scanConfig. I need to take another look to see if we can unify them. but can you also take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason is that we don't support the pyarrow scan options (skipNum
and limitNum
) in pandas scan so before this PR we don't actually support any configuration for pandas scan. I could look into if it's doable to support those options in pandas scan and if it is I'll add the support and unify the config in a separate PR.
Benchmark ResultMaster commit hash:
|
Benchmark ResultMaster commit hash:
|
46d076d
to
da4f82f
Compare
Benchmark ResultMaster commit hash:
|
Description
Adds support for the default
IGNORE_ERRORS
behaviour (skipping rows that trigger PK-related exceptions) when scanning from pandas/pyarrow.Fixes #4534
Contributor agreement