fix: Update hadoop and hive download url #64

Open
unidevel wants to merge 1 commit into prestodb:master from unix280:fix-url

Conversation

@unidevel
Contributor

No description provided.

@sourcery-ai

sourcery-ai Bot commented Apr 24, 2026

Reviewer's Guide

This PR updates Hadoop and Hive download URLs in the Hive-related Dockerfiles to use stable Apache archive locations and bumps the overall project snapshot version in the Makefile.

Flow diagram for updated Hadoop and Hive installation in hive3.1-hive Dockerfile

flowchart TB
    A[Start hive3.1-hive Docker build] --> B[Set HADOOP_VERSION and HIVE_VERSION build args]
    B --> C[Set HADOOP_BINARY_PATH to archive.apache.org URL]
    C --> D[curl download Hadoop tar.gz to /tmp/hadoop.tar.gz]
    D --> E[Extract Hadoop into /opt]
    E --> F[Move /opt/hadoop-HADOOP_VERSION to /opt/hadoop]
    F --> G[Set HIVE_BINARY_PATH to archive.apache.org URL]
    G --> H[curl download Hive tar.gz to /tmp/hive.tar.gz]
    H --> I[Extract Hive into /opt]
    I --> J[Move /opt/apache-hive-HIVE_VERSION-bin to /opt/hive]
    J --> K[Continue Docker build]
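
The Hadoop half of this flow (steps C to F) appears in the diff quoted in the review, but the Hive half does not. As a rough sketch only, assuming the Hive step mirrors the Hadoop one and that the archive uses a `dist/hive/hive-<version>/` layout, steps G to J might look like the fragment below. It assumes a `FROM` line and `ARG HIVE_VERSION` are declared earlier in the Dockerfile.

```dockerfile
# Fragment: assumes FROM and ARG HIVE_VERSION appear above (not shown in the PR diff).

# Steps G-J: set the archive URL inline, download, extract, and move into place.
RUN HIVE_BINARY_PATH=https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/apache-hive-${HIVE_VERSION}-bin.tar.gz && \
    curl -o /tmp/hive.tar.gz --url $HIVE_BINARY_PATH && \
    tar xzf /tmp/hive.tar.gz --directory /opt && mv /opt/apache-hive-${HIVE_VERSION}-bin /opt/hive
```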

File-Level Changes

Change: Update Hadoop and Hive download URL handling in the Hive 3.1 Docker image build.
Details:
  • Remove build ARGs for Hadoop and Hive binary URLs and instead define them inline within their respective RUN commands.
  • Set Hadoop download URL to the Apache archive location using the HADOOP_VERSION build argument.
  • Set Hive download URL to the Apache archive location using the HIVE_VERSION build argument.
  • Keep the existing extraction and installation steps for Hadoop and Hive unchanged after download.
Files: prestodb/hive3.1-hive/Dockerfile

Change: Bump project snapshot version number.
Details:
  • Change VERSION variable from 1-SNAPSHOT to 12-SNAPSHOT to reflect a new snapshot series.
Files: Makefile

Change: Update Hive download URL in the Hive 4.0 Docker image build to use Apache archive.
Details:
  • Change HIVE_BINARY_PATH to point to archive.apache.org instead of dlcdn.apache.org for Hive binaries while preserving variable structure and downstream commands.
Files: prestodb/hive4.0-hive/Dockerfile

@sourcery-ai sourcery-ai Bot left a comment

Hey, I've found 1 issue and left some high-level feedback:

  • By inlining HADOOP_BINARY_PATH and HIVE_BINARY_PATH inside the RUN commands, you’ve removed the ability to override these via build arguments at build time; consider keeping them as ARG defaults so callers can still customize the download URLs when needed.
  • The Makefile VERSION bump to 12-SNAPSHOT is unrelated to the Hadoop/Hive URL fix; consider splitting this into a separate PR to keep versioning changes logically isolated from functional changes.
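
One way to act on the first point, sketched as a Dockerfile fragment rather than the PR's actual change: keep the two URL `ARG`s and only point their defaults at archive.apache.org. The Hadoop URL and the Hadoop `RUN` step are taken from the diff in this PR. The Hive archive URL and the Hive `RUN` step are assumptions modeled on the removed mirror URL and the flow diagram, and `HADOOP_VERSION` and `HIVE_VERSION` are assumed to be declared earlier in the file.

```dockerfile
# Fragment: assumes FROM, ARG HADOOP_VERSION and ARG HIVE_VERSION appear above.

# Defaults point at the Apache archive but remain overridable at build time.
ARG HADOOP_BINARY_PATH=https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
# Assumed Hive archive layout; this URL is not shown in the PR diff.
ARG HIVE_BINARY_PATH=https://archive.apache.org/dist/hive/hive-$HIVE_VERSION/apache-hive-$HIVE_VERSION-bin.tar.gz

# Download and install Hadoop using the (possibly overridden) URL.
RUN curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \
    tar xzf /tmp/hadoop.tar.gz --directory /opt && mv /opt/hadoop-$HADOOP_VERSION /opt/hadoop

# Download and install Hive the same way (assumed to mirror the Hadoop step).
RUN curl -o /tmp/hive.tar.gz --url $HIVE_BINARY_PATH && \
    tar xzf /tmp/hive.tar.gz --directory /opt && mv /opt/apache-hive-$HIVE_VERSION-bin /opt/hive
```

With this shape, a caller could pass `--build-arg HADOOP_BINARY_PATH=<mirror URL>` to `docker build` to use a different mirror without editing the Dockerfile.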
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- By inlining `HADOOP_BINARY_PATH` and `HIVE_BINARY_PATH` inside the `RUN` commands, you’ve removed the ability to override these via build arguments at build time; consider keeping them as `ARG` defaults so callers can still customize the download URLs when needed.
- The Makefile `VERSION` bump to `12-SNAPSHOT` is unrelated to the Hadoop/Hive URL fix; consider splitting this into a separate PR to keep versioning changes logically isolated from functional changes.

## Individual Comments

### Comment 1
<location path="prestodb/hive3.1-hive/Dockerfile" line_range="29-30" />
<code_context>
-ARG HIVE_BINARY_PATH=https://apache.mivzakim.net/hive/hive-$HIVE_VERSION/apache-hive-$HIVE_VERSION-bin.tar.gz
-
-RUN curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \
+RUN HADOOP_BINARY_PATH=https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \
+    curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \
     tar xzf /tmp/hadoop.tar.gz --directory /opt && mv /opt/hadoop-$HADOOP_VERSION /opt/hadoop

</code_context>
<issue_to_address>
**issue:** Switching from build ARGs to local RUN variables removes configurability and weakens Docker caching.

Defining `HADOOP_BINARY_PATH` only inside `RUN` removes the ability to override it at build time and stops it from contributing to the image’s cache key. To keep configurability and caching while changing the default, consider retaining `ARG HADOOP_BINARY_PATH=...` and just updating its default value instead of redefining it in the `RUN` step.
</issue_to_address>


