Spark 3.5.4 #5 (closed), forked from apache/spark
…UtilsSuite from java to scala test sources folder ### What changes were proposed in this pull request? Move the BitmapExpressionUtilsSuite and ExpressionImplUtilsSuite from the Java to the Scala test sources folder where they belong. ### Why are the changes needed? code refactoring ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#48657 from yaooqinn/minor. Authored-by: Kent Yao <[email protected]> Signed-off-by: Max Gekk <[email protected]> (cherry picked from commit 4de286a) Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request? This PR aims to upgrade Jetty to 9.4.56.v20240826. ### Why are the changes needed? To bring the latest bug fixes. ### Does this PR introduce _any_ user-facing change? No behavior change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48684 from dongjoon-hyun/SPARK-50150. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
`deepspeed` is not supported on macOS, so skip installing it there.

### Why are the changes needed?
To fix `pip install -U -r dev/requirements.txt` on macOS, which currently fails:
```
pip install -U -r dev/requirements.txt
...
Collecting deepspeed (from -r dev/requirements.txt (line 69))
  Using cached deepspeed-0.10.0.tar.gz (836 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/setup.py", line 37, in <module>
          from op_builder.all_ops import ALL_OPS
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/op_builder/all_ops.py", line 29, in <module>
          builder = get_accelerator().create_op_builder(member_name)
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/accelerator/mps_accelerator.py", line 211, in create_op_builder
          builder_class = self.get_op_builder(op_name)
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/accelerator/mps_accelerator.py", line 218, in get_op_builder
          from deepspeed.ops.op_builder.cpu import NotImplementedBuilder
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/deepspeed/__init__.py", line 21, in <module>
          from . import ops
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/deepspeed/ops/__init__.py", line 6, in <module>
          from . import adam
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/deepspeed/ops/adam/__init__.py", line 6, in <module>
          from .cpu_adam import DeepSpeedCPUAdam
        File "/private/var/folders/l_/b6xgqlvx0895dljz46x9nl780000gp/T/pip-install-zd43o1nk/deepspeed_47bbb5784bc942e6bdf0f5ec24e9f939/deepspeed/ops/adam/cpu_adam.py", line 7, in <module>
          from cpuinfo import get_cpu_info
      ModuleNotFoundError: No module named 'cpuinfo'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually checked.

Closes apache#42411 from zhengruifeng/install_deepspeed_on_linux.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 5c94565)
Signed-off-by: Dongjoon Hyun <[email protected]>
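One plausible shape of the fix, sketched here as an assumption since the patch itself is not quoted above: a PEP 508 environment marker on the `deepspeed` entry in `dev/requirements.txt`, so pip skips the package on macOS.
```
# Assumption, not the verbatim patch: install deepspeed only off macOS,
# where its setup.py fails as shown in the traceback above.
deepspeed; sys_platform != 'darwin'
```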
### What changes were proposed in this pull request?
This PR moves `scala` and `java` files to their default folders (`src/main/scala` and `src/main/java`):
- `ByteArrayUtils.java`
  from: `common/utils/src/main/scala/org/apache/spark/unsafe/array/ByteArrayUtils.java`
  to: `common/utils/src/main/java/org/apache/spark/unsafe/array/ByteArrayUtils.java`
- `CustomDecimal.scala`
  from: `connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala`
  to: `connector/avro/src/main/scala/org/apache/spark/sql/avro/CustomDecimal.scala`

PS: This PR is a backport to branch-3.5; the master PR is apache#48691. Note: in branch-3.5, the `OrcCompressionCodec.java` file does not exist, so there is no need to move it.

### Why are the changes needed?
Move `scala` and `java` files to the project's default folders to avoid misunderstandings for Spark developers.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48700 from panbingkun/SPARK-50155_branch3.5.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
… closed ### What changes were proposed in this pull request? Disallow cursors from reattaching corresponding ExecuteHolders after the session is closed. In order to prevent a session with a long-running query from being closed, the session is always accessed when reattaching. apache#44670 resolves this issue in Spark 4.0.0. ### Why are the changes needed? SPARK-50176. Sessions with long running queries are susceptible to cache eviction, causing trouble when the client tries to reattach to the execution. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? org.apache.spark.sql.connect.execution.ReattachableExecuteSuite ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48725 from changgyoopark-db/SPARK-50176-3.5. Authored-by: changgyoopark-db <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…p.name` to `SparkSubmit` properly This PR aims to fix `StandaloneRestServer` to propagate `spark.app.name` to `SparkSubmit` properly. This is a long-standing bug: unlike Scala/Java Spark jobs, PySpark jobs didn't get proper `spark.app.name` propagation. Since PySpark jobs are invoked indirectly via `SparkSubmit`, we need to hand over `spark.app.name` via the `-c` configuration. This is a bug fix, and the new behavior is the expected behavior. Pass the CIs with the newly added test case. No. Closes apache#48729 from dongjoon-hyun/SPARK-50195. Lead-authored-by: Dongjoon Hyun <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit ce89940) Signed-off-by: Dongjoon Hyun <[email protected]>
…t_install_spark`

### What changes were proposed in this pull request?
This PR aims to use Spark 3.4.4 instead of 3.0.1 in `test_install_spark`. Since Spark 3.4.4 is the End-Of-Life release, it will be in the `dlcdn`, `archive`, and `dist` channels until the Apache Spark 4.0 release. Previously, 3.0.1 existed only in `archive`, which caused flaky failures.

### Why are the changes needed?
To reduce the flakiness.

**BEFORE**
- https://github.com/apache/spark/actions/runs/11623974780/job/32371883850
```
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
ERROR test_package_name (pyspark.tests.test_install_spark.SparkInstallationTestCase) ... Trying to download Spark spark-3.0.1 from [https://dlcdn.apache.org/, https://archive.apache.org/dist, https://dist.apache.org/repos/dist/release]
Downloading spark-3.0.1 for Hadoop hadoop3.2 from:
- https://dlcdn.apache.org//spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
Failed to download spark-3.0.1 for Hadoop hadoop3.2 from https://dlcdn.apache.org//spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz:
Downloading spark-3.0.1 for Hadoop hadoop3.2 from:
- https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
Failed to download spark-3.0.1 for Hadoop hadoop3.2 from https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz:
Downloading spark-3.0.1 for Hadoop hadoop3.2 from:
- https://dist.apache.org/repos/dist/release/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
Failed to download spark-3.0.1 for Hadoop hadoop3.2 from https://dist.apache.org/repos/dist/release/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz:
ok
```

**AFTER**
```
test_install_spark (pyspark.tests.test_install_spark.SparkInstallationTestCase) ... Trying to download Spark spark-3.4.4 from [https://dlcdn.apache.org/, https://archive.apache.org/dist, https://dist.apache.org/repos/dist/release]
Downloading spark-3.4.4 for Hadoop hadoop3 from:
- https://dlcdn.apache.org//spark/spark-3.4.4/spark-3.4.4-bin-hadoop3.tgz
Downloaded 1048576 of 388988563 bytes (0.27%)
...
```
Since Spark 3.4.4 is the EOL version, it will be in `download.apache.org` until the Apache Spark 4.0.0 release.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48733 from dongjoon-hyun/SPARK-50199.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit fcfbf8e)
Signed-off-by: Dongjoon Hyun <[email protected]>
…ilure ### What changes were proposed in this pull request? apache#48725 closes a session completely during ReattachableExecuteSuite causing 'sleep' to be unavailable in subsequent test cases. apache#43546 fixes the issue by re-creating the 'sleep' udf in each test case needing the udf, and this PR back-ports part of it. ### Why are the changes needed? In order to make the test green again. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48745 from changgyoopark-db/SPARK-50176-3.5. Authored-by: Changgyoo Park <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request? Documentation change: the Java versions mentioned on the getting_started page of PySpark 3.5 are corrected (https://spark.apache.org/docs/3.5.3/api/python/getting_started/install.html#dependencies). ### Why are the changes needed? The original description "PySpark requires Java 8 or later" is incorrect, since 3.5 no longer supports Java prior to 8u371 and the latest supported version is 17; the downloading page (https://spark.apache.org/docs/3.5.3/#downloading), however, does correctly state this. I thus corrected the mentioned Java versions. ### Does this PR introduce _any_ user-facing change? Yes, documentation fix. ### How was this patch tested? Manually ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#48411 from dvorst/branch-3.5. Authored-by: d.vorst <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…e properly

### What changes were proposed in this pull request?
This PR aims to fix `SparkSubmit` to show the `REST API` `kill` response properly.

### Why are the changes needed?
**PREPARE SPARK CLUSTER**
```
$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ sbin/start-worker.sh spark://$(hostname):7077
```

**BEFORE (4.0.0-preview2)**
`spark-submit` didn't show error messages properly.
```
$ bin/spark-submit --master spark://$(hostname):6066 --kill invalid-submission-id
```

**AFTER**
```
$ bin/spark-submit --master spark://$(hostname):6066 --kill invalid-submission-id
Error: Driver invalid-submission-id has already finished or does not exist
```
```
$ sh examples/src/main/scripts/submit-pi.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20241102232042-0000",
  "serverSparkVersion" : "4.0.0-SNAPSHOT",
  "submissionId" : "driver-20241102232042-0000",
  "success" : true
}%
$ bin/spark-submit --master spark://$(hostname):6066 --kill driver-20241102232042-0000
driver-20241102232042-0000 is killed successfully.
```

### Does this PR introduce _any_ user-facing change?
Yes, but it only shows additional log messages.

### How was this patch tested?
Manual tests, because this requires a Spark Standalone cluster and the difference is only in the `spark-submit` log output.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48742 from dongjoon-hyun/SPARK-50210.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 9cf98ed)
Signed-off-by: Dongjoon Hyun <[email protected]>
… `build` task in `build_and_test.yml`

### What changes were proposed in this pull request?
Comparing https://github.com/apache/spark/blob/c53dac05058c48ae1edad7912e8cc82533839ca0/.github/workflows/build_and_test.yml#L102 and https://github.com/apache/spark/blob/9d472661daad4703628e9fbf0ba9922abeed7354/.github/workflows/build_and_test.yml#L97, the master branch uses `dev/is-changed.py` to check for changes in additional modules: `variant`, `api`, `streaming-kinesis-asl`, `protobuf`, and `connect`. Among these, the `api`, `protobuf`, and `connect` modules also exist in `branch-3.5` and should be checked as well. Therefore, this PR includes the following changes:

1. Adds `is-changed` checks for the `api`, `protobuf`, and `connect` modules.
2. In `dev/sparktestsupport/modules.py`, adds the definition for the `api` module, aligning with the master branch.
3. In `dev/sparktestsupport/modules.py`, adds the definition for the `utils` module, aligning with the master branch. Prior to this PR, although `dev/is-changed.py` was used to check for changes in the `utils` module, its definition was missing from `dev/sparktestsupport/modules.py`.

### Why are the changes needed?
Fix the conditional check for executing the `build` task in `build_and_test.yml`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually verified the effectiveness of this pull request:

Before: Set the latest commit by executing `export APACHE_SPARK_REF=9d472661daad4703628e9fbf0ba9922abeed7354`. Then manually edit files in the `api`, `protobuf`, or `connect` modules and commit them. Run the following command:
```
./dev/is-changed.py -m "core,unsafe,kvstore,avro,utils,network-common,network-shuffle,repl,launcher,examples,sketch,graphx,catalyst,hive-thriftserver,streaming,sql-kafka-0-10,streaming-kafka-0-10,mllib-local,mllib,yarn,mesos,kubernetes,hadoop-cloud,spark-ganglia-lgpl,sql,hive,connect,protobuf,api"
```
The console will print `false`.

After: Set the latest commit by executing `export APACHE_SPARK_REF=41446b3d98cfccf5c6f6ddb8bc3c7c6c1b1c3f54`. Then manually edit files in the `api`, `protobuf`, or `connect` modules and commit them. Run the same command as before:
```
./dev/is-changed.py -m "core,unsafe,kvstore,avro,utils,network-common,network-shuffle,repl,launcher,examples,sketch,graphx,catalyst,hive-thriftserver,streaming,sql-kafka-0-10,streaming-kafka-0-10,mllib-local,mllib,yarn,mesos,kubernetes,hadoop-cloud,spark-ganglia-lgpl,sql,hive,connect,protobuf,api"
```
The console will now print `true`.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48744 from LuciferYang/is-change-3.5.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…l rows in ColumnarToRowExec ### What changes were proposed in this pull request? This patch cleans up ColumnVector resources after processing all rows in ColumnarToRowExec. This patch only focuses on the codegen implementation of ColumnarToRowExec. The non-codegen path should be relatively rarely used, and no good approach has been proposed for it yet, so it is left to a follow-up. ### Why are the changes needed? Currently we only assign null to the ColumnarBatch object, but that doesn't release the resources held by the vectors in the batch. For OnHeapColumnVector, the Java arrays may be automatically collected by the JVM, but for OffHeapColumnVector, the allocated off-heap memory will be leaked. For custom ColumnVector implementations like Arrow-based ones, it can also cause memory-safety issues if the underlying buffers are reused across batches, because when ColumnarToRowExec begins to fill values for the next batch, the arrays of the previous batch are still held. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#48767 from viirya/close_if_not_writable. Authored-by: Liang-Chi Hsieh <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit 800faf0) Signed-off-by: Kent Yao <[email protected]>
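A minimal Scala sketch of the idea, with illustrative names only (the actual change lives in `ColumnarToRowExec`'s generated code, and a follow-up later in this list refines which vectors may be freed): close each finished batch so its vectors release their memory, instead of merely dropping the reference.
```scala
import org.apache.spark.sql.vectorized.ColumnarBatch

// Illustrative wrapper: close a batch once all of its rows are consumed,
// releasing OffHeapColumnVector allocations eagerly instead of leaking them.
class ClosingBatchIterator(batches: Iterator[ColumnarBatch]) {
  private var current: ColumnarBatch = null

  def nextBatch(): Option[ColumnarBatch] = {
    if (current != null) {
      current.close() // previously the reference was nulled without closing
      current = null
    }
    if (batches.hasNext) {
      current = batches.next()
      Some(current)
    } else None
  }
}
```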
### What changes were proposed in this pull request?
This PR aims to use the `mirror host` instead of `archive.apache.org`.

### Why are the changes needed?
Currently, Apache Spark CI is flaky due to checksum download failures like the following. It took over 9 minutes and failed eventually.
- https://github.com/apache/spark/actions/runs/11818847971/job/32927380452
- https://github.com/apache/spark/actions/runs/11818847971/job/32927382179
```
exec: curl --retry 3 --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz?action=download
exec: curl --retry 3 --silent --show-error -L https://archive.apache.org/dist/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz.sha512
curl: (28) Failed to connect to archive.apache.org port 443 after 135199 ms: Connection timed out
curl: (28) Failed to connect to archive.apache.org port 443 after 134166 ms: Connection timed out
curl: (28) Failed to connect to archive.apache.org port 443 after 135213 ms: Connection timed out
curl: (28) Failed to connect to archive.apache.org port 443 after 135260 ms: Connection timed out
Verifying checksum from /home/runner/work/spark/spark/build/apache-maven-3.9.9-bin.tar.gz.sha512
shasum: /home/runner/work/spark/spark/build/apache-maven-3.9.9-bin.tar.gz.sha512: no properly formatted SHA checksum lines found
Bad checksum from https://archive.apache.org/dist/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz.sha512
Error: Process completed with exit code 2.
```

**BEFORE**
```
$ build/mvn clean
exec: curl --retry 3 --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz?action=download
exec: curl --retry 3 --silent --show-error -L https://archive.apache.org/dist/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz.sha512
```

**AFTER**
```
$ build/mvn clean
exec: curl --retry 3 --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz?action=download
exec: curl --retry 3 --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.9/binaries/apache-maven-3.9.9-bin.tar.gz.sha512?action=download
```

### Does this PR introduce _any_ user-facing change?
No, this is a dev-only change.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48836 from dongjoon-hyun/SPARK-50300.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 5cc60f4)
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
This PR aims to remove `(any|empty).proto` from RAT exclusion.

### Why are the changes needed?
`(any|empty).proto` files were never a part of Apache Spark repository. Those files were only used in the initial `Connect` PR and removed before merging.
- apache#37710
- Added: apache@45c7bc5
- Excluded from RAT check: apache@cf6b19a
- Removed: apache@4971980

### Does this PR introduce _any_ user-facing change?
No. This is a dev-only change.

### How was this patch tested?
Pass the CIs or manual check.
```
$ ./dev/check-license
Ignored 0 lines in your exclusion files as comments or empty lines.
RAT checks passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48837 from dongjoon-hyun/SPARK-50304.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 33378a6)
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to upgrade ORC to 1.9.5 for Apache Spark 3.5.4. ### Why are the changes needed? To bring the latest bug fix: - https://orc.apache.org/news/2024/11/14/ORC-1.9.5/ - apache/orc#1960 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48845 from dongjoon-hyun/SPARK-50316. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…rror when kerberos is true ### What changes were proposed in this pull request? When Kerberos is enabled, starting SparkThriftServer fails with a Hadoop authentication error because the keytab and principal parameters are passed in the wrong order: the call is `saslServer = ShimLoader.getHadoopThriftAuthBridge().createServer(principal, keytab);` while the method signature is `public Server createServer(String keytabFile, String principalConf) throws TTransportException { return new Server(keytabFile, principalConf); }` ### Why are the changes needed? Failed to start SparkThriftServer when kerberos is true. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Verified manually. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#48855 from CuiYanxiang/SPARK-50312. Authored-by: cuiyanxiang <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit 3237885) Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request? This PR fixes the below HTML/Markdown syntax error in sql-migration-guide.md ![image](https://github.com/user-attachments/assets/bb62a240-1ee5-4763-92c2-97fdd5436284) ### Why are the changes needed? docfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ![image](https://github.com/user-attachments/assets/95b83aa0-beb1-418c-be08-02310010f4d8) ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#48899 from yaooqinn/minor. Authored-by: Kent Yao <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit b582dac) Signed-off-by: Kent Yao <[email protected]>
…timization The root cause of this issue is that the planner turns `Limit` + `Sort` into `TakeOrderedAndProjectExec`, which adds an additional `Project` that does not exist in the logical plan. We shouldn't use this additional `Project` to optimize out other `Project`s; otherwise, when AQE turns the physical plan back into a logical plan, we lose the `Project` and may mess up the output column order. This PR makes sure we do not remove redundant projects if AQE is enabled and the projectList is the same as the child output in `TakeOrderedAndProjectExec`. Fix potential data issue and avoid Spark Driver crash: ``` ... ``` No. Unit test. No. Closes apache#48789 from wangyum/SPARK-50258. Authored-by: Yuming Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 6ee53da) Signed-off-by: Wenchen Fan <[email protected]>
…al clone

### What changes were proposed in this pull request?
This PR proposes to use the standard `Properties.clone` instead of a manual clone.

### Why are the changes needed?
In a very rare condition, when the properties are changed during the clone of `Properties`, it might throw an exception as below:
```
: java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1408)
	at java.util.Hashtable.putAll(Hashtable.java:523)
	at org.apache.spark.util.Utils$.cloneProperties(Utils.scala:3474)
	at org.apache.spark.SparkContext.getCredentialResolvedProperties(SparkContext.scala:523)
	at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3157)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1104)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:454)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1102)
	at org.apache.spark.mllib.evaluation.AreaUnderCurve$.of(AreaUnderCurve.scala:44)
	at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.areaUnderROC(BinaryClassificationMetrics.scala:127)
	at org.apache.spark.ml.evaluation.BinaryClassificationEvaluator.evaluate(BinaryClassificationEvaluator.scala:101)
	at sun.reflect.GeneratedMethodAccessor323.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)
```
We should use the standard clone method.

### Does this PR introduce _any_ user-facing change?
It fixes a very corner-case bug as described above.

### How was this patch tested?
It's difficult to test because the issue comes from concurrent execution.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#48978 from HyukjinKwon/SPARK-50430.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 7614819)
Signed-off-by: Hyukjin Kwon <[email protected]>
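A sketch of the approach this commit describes, under the assumption that the change is essentially the snippet below (note that the revert commits further down this list show branch-3.5 later walked this back): `Properties` extends `Hashtable`, whose `clone()` is synchronized, so cloning avoids iterating entries while another thread mutates them.
```scala
import java.util.Properties

// Sketch, not the exact patch: rely on the synchronized Hashtable#clone
// instead of copying entries one by one under possible concurrent writes.
def cloneProperties(props: Properties): Properties =
  if (props == null) null else props.clone().asInstanceOf[Properties]
```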
### What changes were proposed in this pull request? This PR adds a style for `shuffle-write-time-checkbox-div` and sets its width to `155` pixels. ### Why are the changes needed? Fix a UI bug: the tooltip for `shuffle-write-time` appears in a strange position before this change, as shown below. ![MEITU_20240819_105642523](https://github.com/user-attachments/assets/1e4e9639-a949-4fc3-86f4-7cb65d6d9c73) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual check. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47798 from xunxunmimi5577/add-width-style-for-shuffle_write_time-checkbox. Authored-by: xunxunmimi5577 <[email protected]> Signed-off-by: panbingkun <[email protected]> (cherry picked from commit 05728e4) Signed-off-by: panbingkun <[email protected]>
… Spark on YARN and UT Backport apache#48981 to 3.5 ### What changes were proposed in this pull request? As title. ### Why are the changes needed? SPARK-37814 (3.3.0) migrated the logging system from log4j1 to log4j2, so we should update the docs as well. ### Does this PR introduce _any_ user-facing change? Yes, docs are updated. ### How was this patch tested? Review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49044 from pan3793/SPARK-50433-3.5. Authored-by: Cheng Pan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? Update broken jira link ### Why are the changes needed? The old link is not accessible ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No testing required ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49052 from huangxiaopingRD/SPARK-50487. Lead-authored-by: huangxiaoping <[email protected]> Co-authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Kent Yao <[email protected]> (cherry picked from commit 3d063a0) Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?
Avoid unnecessary py4j call in `listFunctions`

### Why are the changes needed?
```
iter = self._jcatalog.listFunctions(dbName).toLocalIterator()
if pattern is None:
    iter = self._jcatalog.listFunctions(dbName).toLocalIterator()
else:
    iter = self._jcatalog.listFunctions(dbName, pattern).toLocalIterator()
```
the first `self._jcatalog.listFunctions` is unnecessary

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#49073 from zhengruifeng/avoid_list_funcs.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 3628595)
Signed-off-by: Dongjoon Hyun <[email protected]>
… value documentation ### What changes were proposed in this pull request? This PR aims to fix `spark.storage.replication.proactive` default value documentation. ### Why are the changes needed? `spark.storage.replication.proactive` has been enabled by default since Apache Spark 3.2.0. https://github.com/apache/spark/blob/6add9c89855f9311d5e185774ddddcbf4323beee/docs/core-migration-guide.md?plain=1#L85 https://github.com/apache/spark/blob/6add9c89855f9311d5e185774ddddcbf4323beee/core/src/main/scala/org/apache/spark/internal/config/package.scala#L494-L502 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49081 from dongjoon-hyun/SPARK-50505. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 21451fb) Signed-off-by: Dongjoon Hyun <[email protected]>
…e column is dropped after dropDuplicatesWithinWatermark ### What changes were proposed in this pull request? Update `DeduplicateWithinWatermark` references to include all attributes that could be the watermarking column. ### Why are the changes needed? Fix `java.util.NoSuchElementException` due to ColumnPruning. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added unit test ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49065 from liviazhu-db/liviazhu-db/dedup-watermark-fix. Authored-by: Livia Zhu <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]> (cherry picked from commit 851f5f2) Signed-off-by: Jungtaek Lim <[email protected]>
… references` in `DeduplicateWithinWatermark` to fix the compilation issue

### What changes were proposed in this pull request?
This PR changes `def references` to `lazy val references` in `DeduplicateWithinWatermark` to fix the following compilation error:
- https://github.com/apache/spark/actions/runs/12191807324/job/34011354774
```
[error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:1948:16: overriding lazy value references in class QueryPlan of type org.apache.spark.sql.catalyst.expressions.AttributeSet;
[error]  method references needs to be a stable, immutable value
[error]   override def references: AttributeSet = AttributeSet(keys) ++
[error]                ^
[error] one error found
```

### Why are the changes needed?
Fix the compilation error.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#49087 from LuciferYang/SPARK-50492-FOLLOWUP-3.5.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
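A minimal, self-contained illustration of the Scala rule behind this error (simplified types, not the actual Catalyst classes): a `lazy val` member cannot be overridden by a `def`, only by another `lazy val`.
```scala
abstract class Plan {
  lazy val references: Set[String] = Set.empty
}

class Dedup(keys: Seq[String]) extends Plan {
  // `override def references ...` here fails with "method references needs
  // to be a stable, immutable value"; a lazy val satisfies the compiler.
  override lazy val references: Set[String] = keys.toSet
}
```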
… when multiple resource profiles worked ### What changes were proposed in this pull request? Reset the executor's memory-related env config when the resource profile is not the default resource profile. ### Why are the changes needed? When multiple resource profiles exist in the same Spark application, the executor's memory-related config is currently not overridden by the resource profile's memory size, which causes maxOffHeap in `UnifiedMemoryManager` to be incorrect. See https://issues.apache.org/jira/browse/SPARK-50421 for more details. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests in our internal Spark version and jobs. ### Was this patch authored or co-authored using generative AI tooling? No This is a backport from apache#48963 to branch-3.5. Closes apache#49090 from zjuwangg/m35_fixConfig. Authored-by: Terry Wang <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? This PR aims to add `IDENTIFIER clause` page to `menu-sql.yaml` for Apache Spark 3.5.4. ### Why are the changes needed? This was missed at SPARK-43205 (Apache Spark 3.5.0). - apache#42506 ### Does this PR introduce _any_ user-facing change? **BEFORE** ![Screenshot 2024-12-06 at 11 35 52](https://github.com/user-attachments/assets/c3c8dc56-b8d4-4f8d-bb9e-31bccb1f5d42) **AFTER** ![Screenshot 2024-12-06 at 11 36 14](https://github.com/user-attachments/assets/bd1606d2-eb3f-4640-92ef-b0079847c3a3) ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49097 from dongjoon-hyun/SPARK-50514. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: yangjie01 <[email protected]> (cherry picked from commit 28766d4) Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request? Backport of apache#48144. This PR fixes the pushdown of the `^` operator (XOR operator) for Postgres, which parses `^` as the exponent operator rather than bitwise XOR. The fix consists of overriding the SQLExpressionBuilder to replace the '^' character with '#'. ### Why are the changes needed? The result is incorrect otherwise. ### Does this PR introduce _any_ user-facing change? Yes. The user will now have a proper translation of the `^` operator. ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49071 from andrej-db/PGXORBackport. Lead-authored-by: Andrej Gobeljić <[email protected]> Co-authored-by: andrej-gobeljic_data <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
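A hedged, standalone sketch of the substitution (the helper name is hypothetical; the actual patch overrides the dialect's SQL expression builder): Postgres parses `^` as exponentiation, so a pushed-down bitwise XOR must be emitted as Postgres's `#` operator.
```scala
// Hypothetical helper mirroring what the builder override does.
def buildBinaryArithmetic(name: String, l: String, r: String): String = {
  val op = if (name == "^") "#" else name // `#` is bitwise XOR in Postgres
  s"($l $op $r)"
}

// buildBinaryArithmetic("^", "a", "b") returns "(a # b)"
```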
…even if ignoreCorruptFiles is enabled

### What changes were proposed in this pull request?
`BlockMissingException` extends from `IOException`. When `BlockMissingException` occurs and ignoreCorruptFiles is enabled, the current task may not get any data and will be marked as successful ([code](https://github.com/apache/spark/blob/0d045db8d15d0aeb0f54a1557fd360363e77ed42/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L271-L273)). This will cause data quality issues. Generally speaking, `BlockMissingException` is a system issue, not a file corruption issue. Therefore, `BlockMissingException` should be thrown even if ignoreCorruptFiles is enabled. Related error message:
```
24/11/29 01:56:00 WARN FileScanRDD: Skipped the rest of the content in the corrupted file: path: viewfs://hadoop-cluster/path/to/data/part-00320-7915e327-3214-4585-a44e-f9c58e362b43.c000.snappy.parquet, range: 191727616-281354675, partition values: [empty row]
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-169998034-10.210.23.11-1507067630530:blk_83565156183_82548880660 file/path/to/data/part-00320-7915e327-3214-4585-a44e-f9c58e362b43.c000.snappy.parquet No live nodes contain current block
Block locations: DatanodeInfoWithStorage[10.209.145.174:50010,DS-c7c0a172-5ffa-4f90-bfb5-717fb1e9ecf2,DISK] DatanodeInfoWithStorage[10.3.22.142:50010,DS-a1ba9ac9-dc92-4131-a2c2-9f7d03b97caf,DISK] DatanodeInfoWithStorage[10.209.146.156:50010,DS-71d8ae97-15d3-454e-a715-d9490e184989,DISK]
Dead nodes: DatanodeInfoWithStorage[10.209.146.156:50010,DS-71d8ae97-15d3-454e-a715-d9490e184989,DISK] DatanodeInfoWithStorage[10.209.145.174:50010,DS-c7c0a172-5ffa-4f90-bfb5-717fb1e9ecf2,DISK] DatanodeInfoWithStorage[10.3.22.142:50010,DS-a1ba9ac9-dc92-4131-a2c2-9f7d03b97caf,DISK]
```
![image](https://github.com/user-attachments/assets/e040ce9d-1a0e-44eb-bd03-4cd7a9fff80f)

### Why are the changes needed?
Avoid data issue if ignoreCorruptFiles is enabled when `BlockMissingException` occurred.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Manual test.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#49105 from wangyum/SPARK-50483-branch-3.5.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…xus staging repository

### What changes were proposed in this pull request?
This PR improves `dev/create-release/release-build.sh` by retrying up to three times when deploying artifacts to the Nexus staging repository.

When I was setting up 3.5.2-rc5 on my AWS EC2 instance, I encountered an issue with closing `orgapachespark-1461` due to a timeout while uploading a sha1 file.
```xml
Uploading spark-streaming-kafka-0-10_2.13/3.5.2/spark-streaming-kafka-0-10_2.13-3.5.2-test-sources.jar.sha1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   262  100   221  100    41     15      2  0:00:20  0:00:13  0:00:07    58
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>408 Request Timeout</title>
</head><body>
<h1>Request Timeout</h1>
<p>Server timeout waiting for the HTTP request from the client.</p>
</body></html>
```
I might have chosen to upload it manually, but I didn't, because I was afraid of making some unpredictable errors. So I regenerated and uploaded `orgapachespark-1462`.

### Why are the changes needed?
To avoid temporary network errors when performing the publish step for release managers.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#49108 from LuciferYang/SPARK-49134-3.5.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
…ow conversion ### What changes were proposed in this pull request? apache@800faf0 frees column vector resources between batches in columnar-to-row conversion. However, like `WritableColumnVector`, `ConstantColumnVector` should not free resources between batches, because the same data is used across batches. ### Why are the changes needed? Without this change, ConstantColumnVectors with string values, for example, will fail if used with columnar-to-row conversion; for instance, reading a Parquet table partitioned by a string column with multiple batches. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a UT that failed before and now passes. ### Was this patch authored or co-authored using generative AI tooling? no Closes apache#49131 from LuciferYang/SPARK-50463-3.5. Authored-by: Richard Chen <[email protected]> Signed-off-by: yangjie01 <[email protected]>
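A sketch of the resulting guard, with an assumed method shape rather than the actual patch: only vectors that own per-batch data may be closed when a batch is released.
```scala
import org.apache.spark.sql.vectorized.ColumnVector
import org.apache.spark.sql.execution.vectorized.{ConstantColumnVector, WritableColumnVector}

// Assumed shape of the fix: vectors whose data is reused across batches
// stay open; everything else is closed with the finished batch.
def closeIfFreeable(vector: ColumnVector): Unit = vector match {
  case _: WritableColumnVector | _: ConstantColumnVector => // keep alive
  case other => other.close()
}
```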
…lure ### What changes were proposed in this pull request? ReattachableExecuteSuite detected a rare data race issue where ExecuteThreadRunner may send the client the wrong error code before the SparkConnect service sends the correct error code. - The test fails if ExecuteThreadRunner is finished before the SparkConnect service sends the correct error code and after the session is invalidated; to be specific, the event manager throws an illegal state exception (SPARK-49688) that is translated into an unknown error. - The whole problem was addressed under apache#48208 for Spark 4.0. ### Why are the changes needed? 1. Clients may get the wrong error message: expect session-closed or the like, but get unknown. 2. To fix the ReattachableExecuteSuite failure. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? ReattachableExecuteSuite. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49127 from changgyoopark-db/SPARK-50510. Authored-by: changgyoopark-db <[email protected]> Signed-off-by: yangjie01 <[email protected]>
…E WHEN for MsSqlServer and future connectors ### What changes were proposed in this pull request? This PR proposes to propagate the `isPredicate` info in `V2ExpressionBuilder` and wrap the children of a CASE WHEN expression (only `Predicate`s) with `IIF(<>, 1, 0)` for MsSqlServer. This is done to force returning an int instead of a boolean, as SqlServer cannot handle boolean expressions as a return type in CASE WHEN. E.g. ```CASE WHEN ... ELSE a = b END``` Old behavior: ```CASE WHEN ... ELSE a = b END = 1``` New behavior: since in SqlServer a `= 1` is appended to the CASE WHEN, the THEN and ELSE blocks must return an int. Therefore the final expression becomes: ```CASE WHEN ... ELSE IIF(a = b, 1, 0) END = 1``` ### Why are the changes needed? A user cannot use CASE WHEN or IF clauses that return a boolean value against MsSqlServer data. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added tests to MsSqlServerIntegrationSuite ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49115 from andrej-db/CASEWHENBackport. Lead-authored-by: andrej-gobeljic_data <[email protected]> Co-authored-by: Wenchen Fan <[email protected]> Co-authored-by: Andrej Gobeljić <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…wn even if `ignoreCorruptFiles` is enabled

Cherry-pick apache#49143 to branch-3.5

### What changes were proposed in this pull request?
`AccessControlException` extends `IOException` but we should not treat it as a data corruption issue. This is similar to SPARK-50483 which handles `BlockMissingException` in the same way.
```
2024-12-11 06:29:05 WARN HadoopRDD: Skipped the rest content in the corrupted file: hdfs://hadoop-master1.orb.local:8020/warehouse/region/part-00000-2dc8a6f6-8cea-4652-8ba1-762c1b65e2b4-c000:192+192
org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access=READ, inode="/warehouse/region/part-00000-2dc8a6f6-8cea-4652-8ba1-762c1b65e2b4-c000":kyuubi.hadoop:hadoop:-rw-------
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:506)
```
<img width="1462" alt="image" src="https://github.com/user-attachments/assets/d3a64578-90c6-49bb-b92f-7c5c71451a9b">

### Why are the changes needed?
Avoid data issue if `ignoreCorruptFiles` is enabled when `AccessControlException` occurred.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Manual test. Task fails with `org.apache.hadoop.security.AccessControlException` even with `spark.sql.files.ignoreCorruptFiles=true` and `spark.files.ignoreCorruptFiles=true`.
<img width="1477" alt="image" src="https://github.com/user-attachments/assets/373ad5fc-15f5-486f-aba3-53b7f7af3b13">

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#49162 from pan3793/SPARK-50545-3.5.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
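Taken together with SPARK-50483 above, a self-contained sketch of the combined rule (the predicate name is illustrative, not Spark's API): these `IOException` subclasses signal system problems, not file corruption, so the ignore-corrupt-files path must rethrow them.
```scala
import java.io.IOException
import org.apache.hadoop.hdfs.BlockMissingException
import org.apache.hadoop.security.AccessControlException

// Illustrative predicate: may a read error be swallowed under
// spark.sql.files.ignoreCorruptFiles / spark.files.ignoreCorruptFiles?
def mayIgnore(e: IOException, ignoreCorruptFiles: Boolean): Boolean =
  ignoreCorruptFiles &&
    !e.isInstanceOf[BlockMissingException] &&
    !e.isInstanceOf[AccessControlException]
```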
…nd values in Properties This PR proposes to more conservatively preserve the original code that creates new Properties instead of cloning. The previous code only copied the keys and values, but `clone` actually copies more fields of `Properties`. `cloneProperties` is used in Spark Core and all other components, so I propose to keep the logic as is. This is more of a fix for a potential bug. No, it is difficult to add a test. No. Closes apache#49036 from HyukjinKwon/SPARK-50430-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]> (cherry picked from commit 4abaab3) Signed-off-by: Hyukjin Kwon <[email protected]>
…ng key and values in Properties" This reverts commit 8168ea8.
… of manual clone" This reverts commit 5ff129a.
…ll-errors` from `release-build.sh` ### What changes were proposed in this pull request? This PR aims to remove the unsupported `curl` option `--retry-all-errors` from branch-3.5's `release-build.sh`. ### Why are the changes needed? branch-3.5 uses Ubuntu 20.04 for releases, and the `curl` installed via `apt-get install` on Ubuntu 20.04 does not yet support `--retry-all-errors`. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually tested ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49201 from LuciferYang/SPARK-50587. Authored-by: yangjie01 <[email protected]> Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request? Simplify org.apache.spark.sql.connect.execution.ReattachableExecuteSuite."reattach after connection expired" to make it more deterministic. ### Why are the changes needed? The test previously involved execution and interruption that made the test unnecessarily flaky, e.g., an exception was thrown when releasing the corresponding [execution](https://github.com/apache/spark/actions/runs/12296721038/job/34316344940), not when reattaching the execution. - The test's sole purpose is to check whether the lack of 'session' results in the correct error code. - The involvement of actual query execution only makes the test flaky and complicated. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Repeatedly ran testOnly org.apache.spark.sql.connect.execution.ReattachableExecuteSuite. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49203 from changgyoopark-db/SPARK-50510. Authored-by: changgyoopark-db <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request? A few minor changes to clarify (and fix one typo) in the comments for watermark propagation in Structured Streaming. ### Why are the changes needed? I found some of the terminology around "simulation" confusing, and the current comment describes incorrect logic for output watermark calculation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#49188 from neilramaswamy/nr/minor-wm-prop. Authored-by: Neil Ramaswamy <[email protected]> Signed-off-by: Jungtaek Lim <[email protected]> (cherry picked from commit 2b41131) Signed-off-by: Jungtaek Lim <[email protected]>
…es in migration guide Backport apache#49252 to branch-3.5 ### What changes were proposed in this pull request? Update migration guide for SPARK-50483 and SPARK-50545 ### Why are the changes needed? Mention behavior changes in migration guide ### Does this PR introduce _any_ user-facing change? Yes, docs are updated. ### How was this patch tested? Review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#49256 from pan3793/SPARK-50483-SPARK-50545-followup-3.5. Authored-by: Cheng Pan <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>