
Fix the FileNotFoundException when using libhdfs3.so to read an HDFS file #8289

Open
JkSelf opened this issue Dec 20, 2024 · 0 comments
Labels: bug (Something isn't working), triage

JkSelf (Contributor) commented Dec 20, 2024

Backend

VL (Velox)

Bug description

When Velox reads an HDFS Parquet file through libhdfs3.so, the table scan fails with the following error:

Reason: Unable to get file path info for file: hdfs://b49691a74b48.jf.intel.com:8020/tpch_sf3000/lineitem/part-00281-3761d71a-87c6-4341-8f1c-db804f904130-c000.snappy.parquet. got error: FileNotFoundException: Path hdfs://b49691a74b48.jf.intel.com:8020/tpch_sf3000/lineitem/part-00281-3761d71a-87c6-4341-8f1c-db804f904130-c000.snappy.parquet does not exist.
Retriable: False
Context: Split [Hive: hdfs://b49691a74b48.jf.intel.com:8020/tpch_sf3000/lineitem/part-00281-3761d71a-87c6-4341-8f1c-db804f904130-c000.snappy.parquet 0 - 1489456566] Task Gluten_Stage_8_TID_842_VTID_27
Additional Context: Operator: TableScan[0] 0
Function: Impl
File: /home/sparkuser/workspace/workspace/Gluten_TPCH_Spark32_test/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/hdfs/HdfsReadFile.cpp
Line: 79
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox12HdfsReadFileC2EPNS0_11filesystems5arrow2io8internal11LibHdfsShimEP13hdfs_internalSt17basic_string_viewIcSt11char_traitsIcEE
# 4  _ZN8facebook5velox11filesystems14HdfsFileSystem15openFileForReadESt17basic_string_viewIcSt11char_traitsIcEERKNS1_11FileOptionsE
# 5  _ZN8facebook5velox19FileHandleGeneratorclERKSsPKNS0_14FilePropertiesE
# 6  _ZN8facebook5velox13CachedFactoryISsNS0_10FileHandleENS0_19FileHandleGeneratorENS0_14FilePropertiesENS0_15FileHandleSizerESt8equal_toISsESt4hashISsEE8generateERKSsPKS4_
# 7  _ZN8facebook5velox9connector4hive11SplitReader12createReaderEv
# 8  _ZN8facebook5velox9connector4hive11SplitReader12prepareSplitESt10shared_ptrINS0_6common14MetadataFilterEERNS0_4dwio6common17RuntimeStatisticsE
# 9  _ZN8facebook5velox9connector4hive14HiveDataSource8addSplitESt10shared_ptrINS1_14ConnectorSplitEE
# 10 _ZN8facebook5velox4exec9TableScan9getOutputEv
# 11 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv
# 12 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 13 _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 14 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 15 _ZN6gluten24WholeStageResultIterator4nextEv
# 16 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 17 0x00007f4673217427

  at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:41)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
  at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
  at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
  at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
  at org.apache.gluten.vectorized.ColumnarBatchInIterator.hasNext(ColumnarBatchInIterator.java:36)
  at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
  at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:57)
  at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:39)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
  at org.apache.gluten.iterator.IteratorsV1$ReadTimeAccumulator.hasNext(IteratorsV1.scala:127)
  at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
  at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:126)
  at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:256)
  at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
  at org.apache.spark.scheduler.Task.run(Task.scala:131)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750)
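
For reference, frames #2–#3 of the native stack point at the HdfsReadFile constructor in HdfsReadFile.cpp (line 79 per the error context). Below is a minimal sketch of how that check plausibly fails with libhdfs3: the wrapper function and the stderr reporting here are hypothetical illustrations, but hdfsGetPathInfo, hdfsGetLastError, and hdfsFreeFileInfo are real libhdfs3 C API calls, and the error text they produce matches the message above.

```cpp
// Hypothetical sketch of the failing check (HdfsReadFile.cpp:79);
// checkFileExists and the surrounding scaffolding are illustrative only.
#include <hdfs/hdfs.h>  // libhdfs3 header

#include <cstdio>

bool checkFileExists(hdfsFS fs, const char* path) {
  // libhdfs3 returns nullptr when the path cannot be stat'ed and records
  // the server-side exception text internally.
  hdfsFileInfo* info = hdfsGetPathInfo(fs, path);
  if (info == nullptr) {
    // hdfsGetLastError() is a libhdfs3 extension; it yields strings such as
    // "FileNotFoundException: Path ... does not exist.", as seen above.
    std::fprintf(stderr,
                 "Unable to get file path info for file: %s. got error: %s\n",
                 path, hdfsGetLastError());
    return false;
  }
  hdfsFreeFileInfo(info, 1);
  return true;
}
```

In the real code path, the null result is turned into the VeloxRuntimeError raised in frame #2 rather than a stderr message, which then propagates up through the TableScan operator into the Java stack trace shown above.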

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response
