SEGV in reading parquet data if column type mismatch occurs #12349

Open
czentgr opened this issue Feb 15, 2025 · 0 comments · May be fixed by #12350
Labels
bug Something isn't working

czentgr (Collaborator) commented Feb 15, 2025

Bug description

The issue is easily reproducible:

Create a table that contains a single varchar column and insert one row:

create table test (c0 varchar) with (format='PARQUET');
insert into test values ('abc');

Save the resulting Parquet file and delete the table.
Create a new table with the same column name but an integer type:

create table test (c0 integer) with (format='PARQUET');

Then copy the saved data file into the storage path of the new table.

This simulates the case where data is created externally and a new table is defined to read it.
If the table definition does not match the column types stored in the file, a SEGV can occur.
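
For illustration only, here is a minimal sketch of the kind of check that would turn this mismatch into a readable error instead of a crash. It is not how Velox or Presto implement schema handling; `Column`, `SchemaMismatchError`, and `check_schema` are hypothetical names, and the type names are plain strings rather than real Parquet or Velox types.

```python
from dataclasses import dataclass


class SchemaMismatchError(Exception):
    """Hypothetical error: a file column type differs from the table definition."""


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str  # plain string for this sketch, e.g. "varchar" or "integer"


def check_schema(table_columns, file_columns):
    """Raise SchemaMismatchError if a column present in both differs in type.

    Columns are matched by name. Columns that are missing from the file
    are ignored in this sketch.
    """
    file_types = {col.name: col.type_name for col in file_columns}
    for col in table_columns:
        file_type = file_types.get(col.name)
        if file_type is not None and file_type != col.type_name:
            raise SchemaMismatchError(
                f"column '{col.name}': table declares {col.type_name}, "
                f"file contains {file_type}"
            )


# The scenario from this report: the file was written with c0 as varchar,
# while the new table declares c0 as integer.
table = [Column("c0", "integer")]
data_file = [Column("c0", "varchar")]

try:
    check_schema(table, data_file)
except SchemaMismatchError as err:
    print(f"schema check failed: {err}")
```

Running such a check before any data is read would report the bad table definition up front, regardless of which query is issued.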

A basic read query:

select * from test;

causes the following SEGV:

I20250214 11:04:53.720890 67128642 TaskManager.cpp:588] Starting task 20250214_032653_00006_jukim.1.0.0.0 with 16 max drivers.
*** Aborted at 1739549093 (unix time) try "date -d @1739549093" if you are using GNU date ***
PC: @                0x0 facebook::velox::tpch::dbgen::name_bits
*** SIGSEGV (@0xc) received by PID 44213 (TID 0x16ebd7000) stack trace: ***
    @        0x187e06e04 _sigtramp
    @        0x10c929e34 facebook::velox::ByteOutputStream::appendOne<>()
    @        0x10c929e34 facebook::velox::ByteOutputStream::appendOne<>()
    @        0x10c911150 facebook::velox::serializer::presto::detail::VectorStream::appendLength()
    @        0x10c926f2c facebook::velox::serializer::presto::detail::VectorStream::appendLengths<>()
I20250214 11:04:53.750250 67134468 TaskManager.cpp:634] No more splits for 20250214_032653_00006_jukim.1.0.0.0 for node 0
    @        0x10c926bd8 facebook::velox::serializer::presto::detail::(anonymous namespace)::appendStrings()
    @        0x10c9206fc facebook::velox::serializer::presto::detail::(anonymous namespace)::serializeFlatVector<>()
    @        0x10c91e744 _ZZZN8facebook5velox10serializer6presto6detail15serializeColumnERKNSt3__110shared_ptrINS0_10BaseVectorEEERKN5folly5RangeIPKiEEPNS3_12VectorStreamERNS0_7ScratchEENK3$_0clEvENKUlvE_clEv
    @        0x10c90f564 facebook::velox::serializer::presto::detail::serializeColumn()::$_0::operator()()
    @        0x10c90f3ac facebook::velox::serializer::presto::detail::serializeColumn()
    @        0x10c910054 facebook::velox::serializer::presto::detail::(anonymous namespace)::serializeWrapped()
    @        0x10c90f468 facebook::velox::serializer::presto::detail::serializeColumn()
    @        0x10c8e1efc facebook::velox::serializer::presto::detail::PrestoIterativeVectorSerializer::append()
    @        0x10d766404 facebook::velox::VectorStreamGroup::append()
    @        0x10cc61614 facebook::velox::exec::detail::Destination::advance()
    @        0x10cc64424 facebook::velox::exec::PartitionedOutput::getOutput()
    @        0x10ca631f0 facebook::velox::exec::Driver::runInternal()::$_9::operator()()
    @        0x10ca50498 facebook::velox::exec::Driver::withDeltaCpuWallTimer<>()
    @        0x10ca4e598 facebook::velox::exec::Driver::runInternal()
    @        0x10ca507a4 facebook::velox::exec::Driver::run()
    @        0x10ca53f04 facebook::velox::exec::Driver::enqueue()::$_0::operator()()
    @        0x10ca53e14 folly::detail::function::call_<>()
    @        0x102afa06c folly::detail::function::FunctionTraits<>::operator()()
    @        0x10db250b4 folly::ThreadPoolExecutor::runTask()
    @        0x10da92fe0 folly::CPUThreadPoolExecutor::threadRun()
    @        0x10db2b18c _ZNSt3__18__invokeB8ne180100IRMN5folly18ThreadPoolExecutorEFvNS_10shared_ptrINS2_6ThreadEEEERPS2_JRS5_EvEEDTcldsdeclsr3stdE7declvalIT0_EEclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT1_EEEEOSD_OSC_DpOSE_
    @        0x10db2b0dc _ZNSt3__115__apply_functorB8ne180100IMN5folly18ThreadPoolExecutorEFvNS_10shared_ptrINS2_6ThreadEEEENS_5tupleIJPS2_S5_EEEJLm0ELm1EENS8_IJEEEEENS_13__bind_returnIT_T0_T2_Xsr22__is_valid_bind_returnISD_SE_SF_EE5valueEE4typeERSD_RSE_NS_15__tuple_indicesIJXspT1_EEEEOSF_
    @        0x10db2b07c _ZNSt3__16__bindIMN5folly18ThreadPoolExecutorEFvNS_10shared_ptrINS2_6ThreadEEEEJPS2_RS5_EEclB8ne180100IJEEENS_13__bind_returnIS7_NS_5tupleIJS8_S5_EEENSD_IJDpOT_EEEXsr22__is_valid_bind_returnIS7_SE_SI_EE5valueEE4typeESH_
    @        0x10db2ae5c folly::detail::function::call_<>()
    @        0x102afa06c folly::detail::function::FunctionTraits<>::operator()()
    @        0x102afa038 _ZZN5folly18NamedThreadFactory9newThreadEONS_8FunctionIFvvEEEENUlvE_clEv
    @        0x102af9fcc _ZNSt3__18__invokeB8ne180100IZN5folly18NamedThreadFactory9newThreadEONS1_8FunctionIFvvEEEEUlvE_JEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOS8_DpOS9_

The crash does not always occur. For example,

select max(c0) from test;

will succeed.

System information

n/a

Relevant logs

@czentgr czentgr added bug Something isn't working triage Newly created issue that needs attention. labels Feb 15, 2025
@czentgr czentgr self-assigned this Feb 15, 2025
@czentgr czentgr removed the triage Newly created issue that needs attention. label Feb 15, 2025