fix: Update hadoop and hive download url#64
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideThis PR updates Hadoop and Hive download URLs in the Hive-related Dockerfiles to use stable Apache archive locations and bumps the overall project snapshot version in the Makefile. Flow diagram for updated Hadoop and Hive installation in hive3.1-hive Dockerfileflowchart TB
A[Start hive3.1-hive Docker build] --> B[Set HADOOP_VERSION and HIVE_VERSION build args]
B --> C[Set HADOOP_BINARY_PATH to archive.apache.org URL]
C --> D[curl download Hadoop tar.gz to /tmp/hadoop.tar.gz]
D --> E[Extract Hadoop into /opt]
E --> F[Move /opt/hadoop-HADOOP_VERSION to /opt/hadoop]
F --> G[Set HIVE_BINARY_PATH to archive.apache.org URL]
G --> H[curl download Hive tar.gz to /tmp/hive.tar.gz]
H --> I[Extract Hive into /opt]
I --> J[Move /opt/apache-hive-HIVE_VERSION-bin to /opt/hive]
J --> K[Continue Docker build]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- By inlining
HADOOP_BINARY_PATHandHIVE_BINARY_PATHinside theRUNcommands, you’ve removed the ability to override these via build arguments at build time; consider keeping them asARGdefaults so callers can still customize the download URLs when needed. - The Makefile
VERSIONbump to12-SNAPSHOTis unrelated to the Hadoop/Hive URL fix; consider splitting this into a separate PR to keep versioning changes logically isolated from functional changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- By inlining `HADOOP_BINARY_PATH` and `HIVE_BINARY_PATH` inside the `RUN` commands, you’ve removed the ability to override these via build arguments at build time; consider keeping them as `ARG` defaults so callers can still customize the download URLs when needed.
- The Makefile `VERSION` bump to `12-SNAPSHOT` is unrelated to the Hadoop/Hive URL fix; consider splitting this into a separate PR to keep versioning changes logically isolated from functional changes.
## Individual Comments
### Comment 1
<location path="prestodb/hive3.1-hive/Dockerfile" line_range="29-30" />
<code_context>
-ARG HIVE_BINARY_PATH=https://apache.mivzakim.net/hive/hive-$HIVE_VERSION/apache-hive-$HIVE_VERSION-bin.tar.gz
-
-RUN curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \
+RUN HADOOP_BINARY_PATH=https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \
+ curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \
tar xzf /tmp/hadoop.tar.gz --directory /opt && mv /opt/hadoop-$HADOOP_VERSION /opt/hadoop
</code_context>
<issue_to_address>
**issue:** Switching from build ARGs to local RUN variables removes configurability and weakens Docker caching.
Defining `HADOOP_BINARY_PATH` only inside `RUN` removes the ability to override it at build time and stops it from contributing to the image’s cache key. To keep configurability and caching while changing the default, consider retaining `ARG HADOOP_BINARY_PATH=...` and just updating its default value instead of redefining it in the `RUN` step.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| RUN HADOOP_BINARY_PATH=https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \ | ||
| curl -o /tmp/hadoop.tar.gz --url $HADOOP_BINARY_PATH && \ |
There was a problem hiding this comment.
issue: Switching from build ARGs to local RUN variables removes configurability and weakens Docker caching.
Defining HADOOP_BINARY_PATH only inside RUN removes the ability to override it at build time and stops it from contributing to the image’s cache key. To keep configurability and caching while changing the default, consider retaining ARG HADOOP_BINARY_PATH=... and just updating its default value instead of redefining it in the RUN step.
No description provided.