Read nullable from Parquet metadata #5

madsbk · 2023-12-18T09:27:35Z

The current implementation reads the first row of a Parquet file to determine which of its columns are nullable.
This has a significant performance drawback: reading even a single row costs at least one full row group in decompression time and memory.

Instead, we should parse the Parquet metadata before launching any tasks and extract the nullability information from it.

madsbk added the labels "good first issue" (Good for newcomers) and "improvement" (Improves an existing functionality) on Dec 19, 2023

seberg transferred this issue from another repository Nov 22, 2024
