
Read nullable from Parquet metadata #5

Open
madsbk opened this issue Dec 18, 2023 · 0 comments
Labels
good first issue Good for newcomers improvement Improves an existing functionality

madsbk (Member) commented Dec 18, 2023

In the current implementation, we read the first row of a Parquet file in order to determine which of its columns are nullable.
This has a significant performance drawback: reading even a single row means paying the decompression time and memory cost of at least an entire row group.

Instead, we should parse the Parquet metadata before launching any tasks and extract the nullability information from it.

@madsbk madsbk added good first issue Good for newcomers improvement Improves an existing functionality labels Dec 19, 2023
@seberg seberg transferred this issue from another repository Nov 22, 2024