fix(parquet): Avoid SEGV if table column type does not match file column type #12350
base: main
Conversation
If a user defines a table, for example in Hive, where the column types don't match the file column types, a SEGV might occur. Specifically, the SEGV has been observed when the Parquet file contains a VARCHAR column but the table defines an INTEGER column instead, and the data is accessed via TableScan using a basic `select *`. The resulting vector is of type string, but it doesn't match the table metadata and can cause a SEGV in the PartitionedOutput operator. This also prevents issues and errors coming from the readers when they encounter types that are not part of the switch, for example, defining a VARCHAR column when the file column is an INTEGER.
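The fix described above amounts to a guard that compares the table's requested type against the type decoded from the file and raises a descriptive error instead of letting a mismatched vector reach downstream operators. The following is a minimal standalone sketch of that idea, not the actual Velox code; the `TypeKind` enum and `checkRequestedType` helper are hypothetical stand-ins for Velox's `TypePtr` and `VELOX_CHECK` machinery.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Velox type kinds.
enum class TypeKind { INTEGER, BIGINT, REAL, DOUBLE, VARCHAR };

std::string toString(TypeKind k) {
  switch (k) {
    case TypeKind::INTEGER: return "INTEGER";
    case TypeKind::BIGINT: return "BIGINT";
    case TypeKind::REAL: return "REAL";
    case TypeKind::DOUBLE: return "DOUBLE";
    case TypeKind::VARCHAR: return "VARCHAR";
  }
  return "UNKNOWN";
}

// Throw a descriptive error when the table schema disagrees with the
// file schema, instead of letting a mismatched vector reach the
// PartitionedOutput operator (where it previously caused a SEGV).
void checkRequestedType(TypeKind fileType, TypeKind requestedType) {
  if (fileType != requestedType) {
    throw std::runtime_error(
        "Requested type " + toString(requestedType) +
        " does not match file column type " + toString(fileType));
  }
}
```

A matching pair passes silently; the VARCHAR-file/INTEGER-table case from the description now raises an error instead of crashing.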
New E2E test output instead of SEGV:
Thanks, @czentgr
```cpp
// if provided.
if (requestedType) {
  VELOX_CHECK(
      veloxType->equivalent(*requestedType),
```
This does not need to be an exact match; some schema evolution should be supported (e.g., INTEGER -> BIGINT). You should add individual checks inside convertType() for each converted type.
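One way to read "individual checks inside convertType()": each case of the switch validates the requested table type against the conversions that particular file type can support, and errors out otherwise. The sketch below is illustrative only; the function shape, enum, and names are hypothetical and do not reflect the real signatures in the Parquet reader.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for Velox type kinds.
enum class TypeKind { INTEGER, BIGINT, REAL, DOUBLE, VARCHAR };

// Sketch: each file type individually validates the requested type,
// permitting only the schema evolution it supports.
TypeKind convertType(TypeKind fileType, TypeKind requestedType) {
  switch (fileType) {
    case TypeKind::INTEGER:
      // Integer widening (e.g. INTEGER -> BIGINT) is allowed.
      if (requestedType == TypeKind::INTEGER ||
          requestedType == TypeKind::BIGINT) {
        return requestedType;
      }
      break;
    case TypeKind::REAL:
      // REAL may widen to DOUBLE, but never to an integer type.
      if (requestedType == TypeKind::REAL ||
          requestedType == TypeKind::DOUBLE) {
        return requestedType;
      }
      break;
    case TypeKind::VARCHAR:
      if (requestedType == TypeKind::VARCHAR) {
        return requestedType;
      }
      break;
    default:
      break;
  }
  throw std::runtime_error(
      "Unsupported schema evolution for file column type");
}
```

With this shape, INTEGER -> BIGINT succeeds while VARCHAR -> INTEGER raises a clear unsupported-evolution error rather than falling through to a crash.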
This makes sense! We should confirm what schema evolution is currently supported.
At file level we support these:
- Struct field rename
- Additional field at end of struct
- Type widening: all integer types can be widened; REAL can be widened to DOUBLE; there is no conversion between floating-point types and integer types
Check out TableEvolutionFuzzer to see some examples (ideally we want to enable it for Parquet as well): https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/TableEvolutionFuzzerTest.cpp#L140
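The widening rules listed above can be captured in a small predicate: any integer type may widen to an equal or wider integer type, REAL may widen to DOUBLE, and integer and floating-point kinds never convert to each other. A standalone sketch with hypothetical names (the `Kind` enum is ordered narrowest to widest so the integer comparison works):

```cpp
#include <cassert>

// Hypothetical type kinds, ordered narrowest to widest within each family.
enum class Kind { TINYINT, SMALLINT, INTEGER, BIGINT, REAL, DOUBLE };

bool isInteger(Kind k) {
  return k == Kind::TINYINT || k == Kind::SMALLINT ||
         k == Kind::INTEGER || k == Kind::BIGINT;
}

// Widening is allowed within the integer family (to an equal or wider
// type) and from REAL to DOUBLE; never between integers and floats.
bool isWideningAllowed(Kind from, Kind to) {
  if (isInteger(from) && isInteger(to)) {
    return static_cast<int>(from) <= static_cast<int>(to);
  }
  if (from == Kind::REAL) {
    return to == Kind::REAL || to == Kind::DOUBLE;
  }
  return from == to;
}
```

Such a predicate could back the per-type checks in convertType(): allowed pairs proceed, everything else gets an unsupported-evolution error.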
Struct field rename PR is here #5962
We need an overall design for schema evolution.
In this PR, we should at least throw a reasonable unsupported error instead of SEGV.
> You should add individual checks inside convertType() for each converted type.
Let's do this as a starting point. We can error out for unsupported schema evolution.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Fixes: #12349