[#1805] feat(spark): Support Spark 4.0.2 (#2748)
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2748 +/- ##
============================================
+ Coverage 50.76% 51.10% +0.33%
- Complexity 3318 3351 +33
============================================
Files 533 533
Lines 25808 25987 +179
Branches 2354 2375 +21
============================================
+ Hits 13102 13280 +178
+ Misses 11855 11845 -10
- Partials 851 862 +11
Force-pushed cf8aa00 to 45deb4a
Add Spark 4.0.2 support with the following changes:
- New client-spark/spark4 and client-spark/spark4-shaded modules
- New integration-test/spark4 module
- spark4 Maven profile with Scala 2.13, Java 17, Hadoop 3.4.1
- CI workflow entries for unit and integration tests

Key compatibility fixes for Hadoop 3.4.1:
- Replace removed NodeHealthScriptRunner with inline logic
- Add ImpersonationProvider.authorize(UGI, InetAddress) for Hadoop 3.4
- Handle wrapped BindException in JettyServer for Jetty 9.4.53
- Add commons-logging as an explicit test dependency
- Skip Kerberos tests on JDK 17+ via @DisabledIfSystemProperty

Closes apache#1805
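The wrapped-BindException fix mentioned above can be sketched as follows. Newer Jetty versions may surface a port conflict as an IOException whose cause chain contains the BindException, so a plain `instanceof` check on the top-level exception no longer works. The helper name and structure here are illustrative, not the actual Uniffle JettyServer code:

```java
import java.io.IOException;
import java.net.BindException;

public class BindExceptionUnwrap {

    // Walk the cause chain to detect a BindException that Jetty may have
    // wrapped in an IOException. Illustrative sketch, not the Uniffle code.
    static boolean isCausedByBindException(Throwable t) {
        while (t != null) {
            if (t instanceof BindException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulates Jetty 9.4.53 wrapping the bind failure.
        Throwable wrapped =
            new IOException("Failed to bind", new BindException("Address already in use"));
        System.out.println(isCausedByBindException(wrapped));           // true
        System.out.println(isCausedByBindException(new IOException())); // false
    }
}
```

A server that retries on a different port can then branch on this check instead of catching BindException directly.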
Force-pushed 45deb4a to c72d2bf
cc @roryqi
import org.apache.spark.shuffle.api.ShuffleExecutorComponents;
import org.apache.spark.shuffle.sort.io.LocalDiskShuffleExecutorComponents;

public class RssShuffleDataIo implements ShuffleDataIO {
RssShuffleDataIo -> RssShuffleDataIO?
Thanks for the suggestion! The existing client-spark/spark3 module already has a class with the same name RssShuffleDataIo, and this class name is also referenced as a string in user-facing places:

- README.md (user config example)
- integration-test/spark-common/.../SparkIntegrationTestBase.java
- integration-test/spark3/.../GetReaderTest.java
To avoid name inconsistency between spark3 and spark4 (and to keep the user-facing config string stable), I'd like to keep the current name in this PR. I'll track a follow-up to rename both spark3 and spark4 together, updating the README and integration tests in one shot. WDYT?
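For context, the user-facing string typically appears in a Spark config entry such as the one below. Spark's `spark.shuffle.sort.io.plugin.class` key selects the ShuffleDataIO implementation by fully qualified class name; the package shown here is assumed for illustration and may differ from the actual repository layout:

```
# spark-defaults.conf (illustrative; package name assumed)
spark.shuffle.sort.io.plugin.class  org.apache.spark.shuffle.RssShuffleDataIo
```

Because users paste this string verbatim, renaming the class silently breaks existing configs, which is why the rename is deferred to a coordinated follow-up.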
import org.apache.spark.package$;
import org.apache.spark.util.VersionUtils;

public class Spark4VersionUtils extends SparkVersionUtils {
Is it reasonable for a Utils class to extend another Utils class? All of their methods are static.
Good point — you're right that a utility class with only static members shouldn't extend another utility class (there's no polymorphism across static methods, so the inheritance is effectively just a namespace alias). I've removed extends SparkVersionUtils from Spark4VersionUtils, qualified the MAJOR_VERSION/MINOR_VERSION references directly, and added a private constructor to prevent instantiation. The test was updated to drop the inherited isSpark2/3/4() assertions (they're already covered by the existing testSparkVersion() case against SparkVersionUtils), keeping testSpark4Version() focused on the Spark4-specific isSparkVersionAtLeast.
I kept Spark3VersionUtils as-is to limit the scope of this PR; the same refactor can be applied there in a follow-up.
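The resulting shape can be sketched like this. The class and constant names follow the discussion above, but the version values and method bodies are illustrative, not the actual Uniffle code:

```java
// Illustrative sketch of a static-only utility class that does not extend
// another utility class; constants are qualified explicitly, and a private
// constructor prevents instantiation.
public class Spark4VersionUtilsSketch {

    static class SparkVersionUtils {
        // Values assumed for illustration only.
        static final int MAJOR_VERSION = 4;
        static final int MINOR_VERSION = 0;

        private SparkVersionUtils() {}
    }

    static class Spark4VersionUtils {
        // Private constructor: this class is a namespace, never an instance.
        private Spark4VersionUtils() {}

        // Qualify the constants via SparkVersionUtils instead of inheriting them.
        static boolean isSparkVersionAtLeast(int major, int minor) {
            return SparkVersionUtils.MAJOR_VERSION > major
                || (SparkVersionUtils.MAJOR_VERSION == major
                    && SparkVersionUtils.MINOR_VERSION >= minor);
        }
    }

    public static void main(String[] args) {
        System.out.println(Spark4VersionUtils.isSparkVersionAtLeast(4, 0)); // true
        System.out.println(Spark4VersionUtils.isSparkVersionAtLeast(4, 1)); // false
    }
}
```

Dropping the `extends` makes it explicit at every call site which class owns each constant, which is the point of the refactor: static members are resolved at compile time, so inheritance buys nothing here.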
…Spark4VersionUtils

Utility classes with only static members should not extend another static-only utility; static methods are not polymorphic, so the inheritance is misleading. Make Spark4VersionUtils standalone, qualify MAJOR_VERSION/MINOR_VERSION via SparkVersionUtils, and add a private constructor to prevent instantiation. Drop the duplicated isSpark2/3/4 assertions from testSpark4Version since testSparkVersion already covers them through SparkVersionUtils directly.
Thank you @roryqi
What changes were proposed in this pull request?
Add a client-spark/spark4 module to support Spark 4.0.2.

Module structure:
- client-spark/spark4 — client main module
- client-spark/spark4-shaded — shaded release jar
- integration-test/spark4 — integration tests

Key adaptations:
- log4j-slf4j2-impl
- scala.jdk.javaapi.CollectionConverters replacing deprecated JavaConverters
- extraJavaTestArgs property in root pom for JDK 17 module opens (shared across modules)

Relationship to spark3: independent module, no dependency on spark3, allowing the Spark 4 API to evolve separately.
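The extraJavaTestArgs property mentioned among the adaptations might look roughly like the fragment below in the root pom, to be interpolated into the Surefire argLine by each module. The specific `--add-opens` entries are assumptions chosen to illustrate the pattern; the actual list in the repository may differ:

```
<!-- Illustrative sketch; the actual flag list in the root pom may differ. -->
<properties>
  <extraJavaTestArgs>
    --add-opens=java.base/java.lang=ALL-UNNAMED
    --add-opens=java.base/java.util=ALL-UNNAMED
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
  </extraJavaTestArgs>
</properties>
```

Centralizing the flags in one property keeps the JDK 17 module-opens configuration consistent across the spark4 and integration-test modules instead of duplicating it per pom.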
Why are the changes needed?
Spark 4.0.2 is the first stable release of Spark 4.x. Uniffle currently has no client support for it.
Built on #1806 (Scala 2.13 build support). Same goal as #1814.
How was this patch tested?
mvn test -Pspark4, 34 tests passing

Known limitations
client-spark/extension (Scala code) is excluded from the spark4 profile for now; Scala 2.13 compatibility for that module is follow-up work.