Add Hudi MOR snapshot query and metadata table support #25599

Closed
codope wants to merge 1 commit into trinodb:master from codope:hudi-mor-snapshot

Conversation

@codope
Contributor

@codope codope commented Apr 16, 2025

Description

  • Re-enable metadata table support - bring back the defunct config.
  • Add support for reading log files to support snapshot querying of Hudi Merge-on-Read (MOR) tables.
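For readers unfamiliar with MOR tables: a snapshot query has to combine each base file with the log files written after it. The sketch below is illustrative only and is not the code in this PR. `MorSnapshotSketch`, `LogRecord`, and `snapshot` are hypothetical names, and it assumes log records are already in commit order; real Hudi merging also depends on ordering fields and payload semantics that are left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MorSnapshotSketch {
    // Hypothetical log record: a key, its new value, and a delete marker.
    record LogRecord(String key, String value, boolean delete) {}

    // Overlay log records (assumed to be in commit order) on the base-file rows.
    // The input map is copied, so the base rows are left untouched.
    static Map<String, String> snapshot(Map<String, String> baseRows, List<LogRecord> logRecords) {
        Map<String, String> merged = new LinkedHashMap<>(baseRows);
        for (LogRecord logRecord : logRecords) {
            if (logRecord.delete()) {
                merged.remove(logRecord.key());
            }
            else {
                // An update replaces the base row; an insert adds a new key.
                merged.put(logRecord.key(), logRecord.value());
            }
        }
        return merged;
    }
}
```

With base rows `k1` and `k2`, an update to `k1`, a delete of `k2`, and an insert of `k3`, the merged view contains only the updated `k1` and the new `k3`.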

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Hudi
* Support for snapshot query of Hudi Merge-on-Read (MOR) tables.
* Support for Hudi metadata table.

@cla-bot cla-bot Bot added the cla-signed label Apr 16, 2025
@codope codope requested a review from ebyhr April 16, 2025 12:52
@github-actions github-actions Bot added hudi Hudi connector hive Hive connector labels Apr 16, 2025
Member

@ebyhr ebyhr left a comment

Please fix CI failures.

Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSessionProperties.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSource.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSourceProvider.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSourceProvider.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplit.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/file/HudiFile.java Outdated
Comment thread plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java Outdated
@codope codope force-pushed the hudi-mor-snapshot branch from 0cfa118 to 941d35d Compare April 17, 2025 13:55
@github-actions github-actions Bot added the docs label Apr 17, 2025
@cla-bot

cla-bot Bot commented Apr 21, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@cla-bot cla-bot Bot removed the cla-signed label Apr 21, 2025
@codope codope force-pushed the hudi-mor-snapshot branch from 0e495c0 to 935c8c0 Compare April 21, 2025 11:36
@codope codope force-pushed the hudi-mor-snapshot branch from 935c8c0 to d9200d3 Compare April 21, 2025 15:49
@cla-bot cla-bot Bot added the cla-signed label Apr 21, 2025
@codope codope force-pushed the hudi-mor-snapshot branch from d9200d3 to b5bbebd Compare April 23, 2025 15:52
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/util/HudiAvroSerializer.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/util/HudiAvroSerializer.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiUtil.java Outdated
@codope codope force-pushed the hudi-mor-snapshot branch 3 times, most recently from e4e5630 to a060c18 Compare May 2, 2025 05:20
@codope
Contributor Author

codope commented May 2, 2025

@ebyhr @mxmarkovics Could you please take a final pass and help merge this PR? I have two other PRs stacked on top of this one.

@codope codope force-pushed the hudi-mor-snapshot branch 2 times, most recently from ce716a4 to 33f70f9 Compare May 7, 2025 07:10
@codope
Contributor Author

codope commented May 7, 2025

@ebyhr @mxmarkovics @mosabua gentle reminder for this PR. All comments are addressed and CI is successful.

Contributor

@mxmarkovics mxmarkovics left a comment

LGTM

Member

@ebyhr ebyhr left a comment

Just skimmed.

Comment thread docs/src/main/sphinx/connector/hudi.md Outdated
Comment thread plugin/trino-hive/src/main/java/io/trino/plugin/hive/avro/AvroHiveFileUtils.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSource.java Outdated
Comment thread plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSessionProperties.java Outdated
Member

Why did we remove the usage of isIgnoreAbsentPartitions?

Contributor Author

The filesystem view API of Hudi handles non-matching partitions internally. In other words, the directory lister returns an empty list (from the listStatus call) when called for absent partitions, so we don't need this config separately. I have now removed it from hudi.md as well.
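A minimal sketch of why a separate flag becomes unnecessary when listing an absent partition yields an empty list. This is plain Java, not the Hudi filesystem view API: `AbsentPartitionSketch`, `listFiles`, and `countSplits` are hypothetical names, and the map stands in for whatever the directory lister actually reads.

```java
import java.util.List;
import java.util.Map;

public class AbsentPartitionSketch {
    // Hypothetical stand-in for a directory lister: an absent partition
    // yields an empty list instead of an error.
    static List<String> listFiles(Map<String, List<String>> filesByPartition, String partition) {
        return filesByPartition.getOrDefault(partition, List.of());
    }

    // Callers iterate over whatever comes back, so absent partitions
    // contribute zero splits without any special-case check.
    static int countSplits(Map<String, List<String>> filesByPartition, List<String> partitions) {
        int splits = 0;
        for (String partition : partitions) {
            splits += listFiles(filesByPartition, partition).size();
        }
        return splits;
    }
}
```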

Comment thread plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java Outdated
@codope codope force-pushed the hudi-mor-snapshot branch 3 times, most recently from a771840 to 8d007ff Compare May 9, 2025 14:28
@codope
Contributor Author

codope commented May 9, 2025

@ebyhr All comments addressed. Could you help me with fixing maven-checks 25-ea failure? I don't think it is related to the changes. Logs just show:

The JAVA_HOME environment variable is not defined correctly,
this environment variable is needed to run this program.
Error: Process completed with exit code 1.

@codope codope force-pushed the hudi-mor-snapshot branch from 8d007ff to fe98198 Compare May 10, 2025 01:39
@codope
Contributor Author

codope commented May 10, 2025

@ebyhr All comments addressed. Could you help me with fixing maven-checks 25-ea failure? I don't think it is related to the changes. Logs just show:

The JAVA_HOME environment variable is not defined correctly,
this environment variable is needed to run this program.
Error: Process completed with exit code 1.

Never mind, I saw that this CI check has been disabled recently - #25759. I've rebased.

@codope codope force-pushed the hudi-mor-snapshot branch from fe98198 to 278ad2d Compare May 12, 2025 12:15
@codope
Contributor Author

codope commented May 12, 2025

@ebyhr gentle ping for review.

@codope codope force-pushed the hudi-mor-snapshot branch from 278ad2d to 88f1a4c Compare May 14, 2025 07:13
@codope codope force-pushed the hudi-mor-snapshot branch 3 times, most recently from 042a4dd to 20a13ac Compare May 21, 2025 23:24
@codope
Contributor Author

codope commented May 22, 2025

@ebyhr @yihua @mosabua @wendigo Please help merge this PR.

@wendigo
Contributor

wendigo commented May 22, 2025

@codope I'd need to go through it in full. @ebyhr did you review it already?

@codope
Contributor Author

codope commented May 22, 2025

I would appreciate it if this could be reviewed and merged sooner. I have another PR that adds data skipping support stacked on these changes: #25601

@ebyhr
Member

ebyhr commented May 23, 2025

I don't think we can merge this PR soon as the change isn't so simple. Please be patient.

@codope
Contributor Author

codope commented May 27, 2025

> I don't think we can merge this PR soon as the change isn't so simple. Please be patient.

I understand. Let me know how I can help accelerate the review. This PR is crucial for the Hudi community, as it restores metadata table support and adds MOR snapshot queries.

@ebyhr
Member

ebyhr commented May 29, 2025

Is it possible to separate into 2 PRs? Why do we handle both MoR and metadata-tables in a commit?

@codope
Contributor Author

codope commented Jun 9, 2025

> Is it possible to separate into 2 PRs? Why do we handle both MoR and metadata-tables in a commit?

@ebyhr Hudi's metadata table is itself a MoR table, so we need to handle both.

Read log files using filegroup reader and fix tests

Add test table and a test with metadata enabled

Fix SqlDate type not able to cast to Number error
@codope codope force-pushed the hudi-mor-snapshot branch from 20a13ac to 4f2a8f8 Compare June 11, 2025 03:10
@github-actions

github-actions Bot commented Jul 2, 2025

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Jul 2, 2025
@liujinhui1994

We need this PR. When can it be merged into Trino? @codope @ebyhr
We switched from Presto to Trino, and all of our tables are Hudi lakehouse tables.

@github-actions github-actions Bot removed the stale label Jul 4, 2025
@voonhous

Hello all, I can try picking this up. @ebyhr what are the pending items remaining here?

@github-actions

github-actions Bot commented Aug 1, 2025

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions Bot closed this Aug 25, 2025
Labels

cla-signed docs hive Hive connector hudi Hudi connector stale

6 participants