
[#1805] feat(spark): Support Spark 4.0.2 (#2748)

Merged
roryqi merged 2 commits into apache:master from LuciferYang:spark4-support on May 12, 2026

Conversation

@LuciferYang (Contributor) commented May 8, 2026

What changes were proposed in this pull request?

Add client-spark/spark4 module to support Spark 4.0.2.

Module structure:

  • client-spark/spark4 — client main module
  • client-spark/spark4-shaded — shaded release jar
  • integration-test/spark4 — integration tests

Key adaptations:

  • Scala 2.13 + Java 17 (Spark 4 minimum requirements)
  • SLF4J 2.x logging stack (log4j-slf4j2-impl)
  • scala.jdk.javaapi.CollectionConverters replacing deprecated JavaConverters
  • extraJavaTestArgs property in root pom for JDK 17 module opens (shared across modules)
  • CI matrix with per-profile JDK version selection

Relationship to spark3: spark4 is an independent module with no dependency on spark3, allowing the Spark 4 client to evolve separately.
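The shared extraJavaTestArgs property mentioned above can be sketched roughly as follows. This is an illustrative pom fragment, not copied from the PR: the exact --add-opens flag list Spark 4 needs on JDK 17 is an assumption here.

```xml
<!-- root pom.xml: shared JVM args for tests on JDK 17 (flag list illustrative) -->
<properties>
  <extraJavaTestArgs>
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.nio=ALL-UNNAMED
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
  </extraJavaTestArgs>
</properties>

<!-- in a module that runs tests: surefire picks up the shared property -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>${extraJavaTestArgs}</argLine>
  </configuration>
</plugin>
```

Putting the flags in one root property means every module (spark4, spark4-shaded, integration-test/spark4) stays in sync when the JDK 17 module-opens list changes.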

Why are the changes needed?

Spark 4.0.2 is the first stable release of Spark 4.x. Uniffle currently has no client support for it.

Built on #1806 (Scala 2.13 build support). Same goal as #1814.

How was this patch tested?

  • Unit tests: mvn test -Pspark4, 34 tests passing
  • Shaded jar builds successfully
  • Integration tests: AQERepartitionTest, MapSideCombineTest

Known limitations

  • client-spark/extension (Scala code) excluded from spark4 profile for now. Scala 2.13 compatibility for that module is follow-up work.

@LuciferYang LuciferYang marked this pull request as draft May 8, 2026 15:16
@LuciferYang LuciferYang changed the title [#1805][part-3] Support Spark 4.0.2 [#1805] feat(spark): Support Spark 4.0.2 May 8, 2026
github-actions bot commented May 8, 2026

Test Results

3 401 files (+216)   3 401 suites (+216)   7h 17m 43s ⏱️ (+24m 3s)
1 263 tests (+14): 1 252 ✅ (+4), 11 💤 (+10), 0 ❌ (±0)
16 930 runs (+1 096): 16 904 ✅ (+1 085), 26 💤 (+11), 0 ❌ (±0)

Results for commit 915c429. ± Comparison against base commit 6acfd53.

♻️ This comment has been updated with latest results.

@codecov-commenter commented

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.10%. Comparing base (4637321) to head (edc5c72).
⚠️ Report is 8 commits behind head on master.

Files with missing lines                                Patch %   Lines
...org/apache/spark/shuffle/RssSparkShuffleUtils.java     0.00%   2 Missing ⚠️
...va/org/apache/spark/shuffle/SparkVersionUtils.java     0.00%   1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2748      +/-   ##
============================================
+ Coverage     50.76%   51.10%   +0.33%     
- Complexity     3318     3351      +33     
============================================
  Files           533      533              
  Lines         25808    25987     +179     
  Branches       2354     2375      +21     
============================================
+ Hits          13102    13280     +178     
+ Misses        11855    11845      -10     
- Partials        851      862      +11     

☔ View full report in Codecov by Sentry.

Add Spark 4.0.2 support with the following changes:

- New client-spark/spark4 and client-spark/spark4-shaded modules
- New integration-test/spark4 module
- spark4 Maven profile with Scala 2.13, Java 17, Hadoop 3.4.1
- CI workflow entries for unit and integration tests

Key compatibility fixes for Hadoop 3.4.1:
- Replace removed NodeHealthScriptRunner with inline logic
- Add ImpersonationProvider.authorize(UGI, InetAddress) for Hadoop 3.4
- Handle wrapped BindException in JettyServer for Jetty 9.4.53
- Add commons-logging explicit test dependency
- Skip Kerberos tests on JDK 17+ via @DisabledIfSystemProperty

Closes apache#1805
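The Jetty note in the commit message above can be sketched as follows: Jetty 9.4.x may surface a port-in-use failure as an IOException whose cause is the real BindException, so a plain catch of BindException no longer matches and the cause chain has to be walked. This is a minimal sketch; the helper name is ours, not from the Uniffle JettyServer code.

```java
import java.io.IOException;
import java.net.BindException;

public class BindExceptionUtil {

    // Returns true if t is, or transitively wraps, a BindException.
    // Jetty 9.4.53 can wrap "Address already in use" in an IOException,
    // so we follow getCause() instead of relying on the top-level type.
    public static boolean isCausedByBindException(Throwable t) {
        while (t != null) {
            if (t instanceof BindException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped =
            new IOException("Failed to bind", new BindException("Address already in use"));
        System.out.println(isCausedByBindException(wrapped));           // true
        System.out.println(isCausedByBindException(new IOException())); // false
    }
}
```

A caller that retries on a different port can then treat the wrapped and unwrapped cases identically.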
@LuciferYang LuciferYang marked this pull request as ready for review May 10, 2026 17:00
@LuciferYang (Contributor, Author) commented

cc @roryqi

import org.apache.spark.shuffle.api.ShuffleExecutorComponents;
import org.apache.spark.shuffle.sort.io.LocalDiskShuffleExecutorComponents;

public class RssShuffleDataIo implements ShuffleDataIO {
Contributor
RssShuffleDataIo -> RssShuffleDataIO?

@LuciferYang (Contributor, Author)

Thanks for the suggestion! The existing client-spark/spark3 module already has a class with the same name RssShuffleDataIo, and this class name is also referenced as a string in user-facing places:

  • README.md (user config example)
  • integration-test/spark-common/.../SparkIntegrationTestBase.java
  • integration-test/spark3/.../GetReaderTest.java

To avoid name inconsistency between spark3 and spark4 (and to keep the user-facing config string stable), I'd like to keep the current name in this PR. I'll track a follow-up to rename both spark3 and spark4 together, updating the README and integration tests in one shot. WDYT?

Contributor

OK.

import org.apache.spark.package$;
import org.apache.spark.util.VersionUtils;

public class Spark4VersionUtils extends SparkVersionUtils {
Contributor
Is it reasonable for a Utils class to extend another Utils class? All of their methods are static.

@LuciferYang (Contributor, Author)

Good point — you're right that a utility class with only static members shouldn't extend another utility class (there's no polymorphism across static methods, so the inheritance is effectively just a namespace alias). I've removed extends SparkVersionUtils from Spark4VersionUtils, qualified the MAJOR_VERSION/MINOR_VERSION references directly, and added a private constructor to prevent instantiation. The test was updated to drop the inherited isSpark2/3/4() assertions (they're already covered by the existing testSparkVersion() case against SparkVersionUtils), keeping testSpark4Version() focused on the Spark4-specific isSparkVersionAtLeast.

I kept Spark3VersionUtils as-is to limit the scope of this PR; the same refactor can be applied there in a follow-up.

…Spark4VersionUtils

Utility classes with only static members should not extend another static-only utility; static methods are not polymorphic, so the inheritance is misleading. Make Spark4VersionUtils standalone, qualify MAJOR_VERSION/MINOR_VERSION via SparkVersionUtils, and add a private constructor to prevent instantiation. Drop the duplicated isSpark2/3/4 assertions from testSpark4Version since testSparkVersion already covers them through SparkVersionUtils directly.
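The refactor described in that commit message can be sketched like this. All names and version constants below are illustrative stand-ins, not copied from the Uniffle source: the point is the standalone static-only class with a private constructor, rather than inheritance from another utility.

```java
// Sketch of the pattern: a static-only utility class that stands alone
// instead of extending another utility (static methods are not polymorphic,
// so inheritance there would only fake a namespace).
public final class Spark4VersionUtilsSketch {

    // Stand-ins for the MAJOR_VERSION / MINOR_VERSION constants that the
    // real code qualifies via SparkVersionUtils.
    static final int MAJOR_VERSION = 4;
    static final int MINOR_VERSION = 0;

    // Private constructor: a static-only utility should never be instantiated.
    private Spark4VersionUtilsSketch() {}

    public static boolean isSparkVersionAtLeast(int major, int minor) {
        return MAJOR_VERSION > major
            || (MAJOR_VERSION == major && MINOR_VERSION >= minor);
    }
}
```

With the constants qualified explicitly, a reader can no longer mistake the class for a specialization of SparkVersionUtils, and the compiler rejects accidental instantiation.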
@roryqi (Contributor) left a comment
LGTM.

@roryqi roryqi merged commit c18774d into apache:master May 12, 2026
43 checks passed
@LuciferYang (Contributor, Author)

Thank you @roryqi


Labels: none yet
Projects: none yet
3 participants