build(docker): split Geant4 source build and physics data installation#253
build(docker): split Geant4 source build and physics data installation#253
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5259a61829
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Dockerfile
Outdated
| ARG OPTIX_VERSION=9.0.0 | ||
| ARG GEANT4_VERSION=11.3.2 | ||
| ARG CMAKE_VERSION=4.2.1 | ||
| ARG GEANT4_INSTALL_DATA=OFF |
There was a problem hiding this comment.
Restore Geant4 data installation by default
Changing GEANT4_INSTALL_DATA to default OFF makes default Docker builds ship without any Geant4 physics datasets: the new download block is skipped when this arg is unset, and the Geant4 build itself is also configured with -DGEANT4_INSTALL_DATA=OFF. I checked .github/workflows/release.yaml (the build-args section only passes OS, CUDA_VERSION, OPTIX_VERSION, GEANT4_VERSION, and CMAKE_VERSION) and the README Docker build commands, and neither sets GEANT4_INSTALL_DATA=ON, so published and local default images lose required G4*DATA content and can fail at runtime during Geant4 physics initialization.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Updates the container build to decouple building/installing Geant4 from downloading/installing Geant4 physics datasets, enabling optional dataset installation.
Changes:
- Add build arguments to control Geant4 dataset installation and specify dataset sources.
- Build/install Geant4 with
GEANT4_INSTALL_DATA=OFFand a fixed install data directory. - Add a conditional step to download and extract Geant4 datasets when enabled.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| RUN if [ "${GEANT4_INSTALL_DATA}" = "ON" ]; then \ | ||
| mkdir -p /usr/local/share/Geant4/data; \ | ||
| for dataset in ${GEANT4_DATASETS}; do \ | ||
| curl -fsSL "${GEANT4_DATA_URL}/${dataset}" | tar -xz -C /usr/local/share/Geant4/data; \ |
There was a problem hiding this comment.
The dataset download/extract loop uses curl | tar without any integrity verification (hash/signature) for the tarballs. If a download is corrupted or the endpoint is compromised, this can silently introduce a supply-chain risk during image builds. Prefer downloading to a file and verifying an expected SHA256 (Geant4 publishes checksums), then extracting with safer tar flags (e.g., avoid preserving ownership/permissions) before deleting the archive.
| curl -fsSL "${GEANT4_DATA_URL}/${dataset}" | tar -xz -C /usr/local/share/Geant4/data; \ | |
| dataset_url="${GEANT4_DATA_URL}/${dataset}"; \ | |
| tmp_archive="/tmp/${dataset}"; \ | |
| curl -fsSL "${dataset_url}" -o "${tmp_archive}"; \ | |
| checksum_var_name="GEANT4_SHA256_$(echo "${dataset}" | tr '.-' '_' | tr '[:lower:]' '[:upper:]')"; \ | |
| expected_checksum=$(eval "printf '%s' \"\${${checksum_var_name}:-}\""); \ | |
| if [ -n "${expected_checksum}" ]; then \ | |
| echo "${expected_checksum} ${tmp_archive}" | sha256sum -c -; \ | |
| fi; \ | |
| tar -xzf "${tmp_archive}" --no-same-owner --no-same-permissions -C /usr/local/share/Geant4/data; \ | |
| rm -f "${tmp_archive}"; \ |
| ARG GEANT4_DATASETS="\ | ||
| G4NDL.4.7.1.tar.gz \ | ||
| G4EMLOW.8.6.1.tar.gz \ | ||
| G4PhotonEvaporation.6.1.tar.gz \ | ||
| G4RadioactiveDecay.6.1.2.tar.gz \ | ||
| G4PARTICLEXS.4.1.tar.gz \ | ||
| G4PII.1.3.tar.gz \ | ||
| G4RealSurface.2.2.tar.gz \ | ||
| G4SAIDDATA.2.0.tar.gz \ | ||
| G4ABLA.3.3.tar.gz \ | ||
| G4INCL.1.2.tar.gz \ | ||
| G4ENSDFSTATE.3.0.tar.gz \ | ||
| G4CHANNELING.1.0.tar.gz \ | ||
| G4TENDL.1.4.tar.gz \ | ||
| G4NUDEXLIB.1.0.tar.gz \ | ||
| G4URRPT.1.1.tar.gz" |
There was a problem hiding this comment.
GEANT4_DATASETS is pinned to specific dataset versions while GEANT4_VERSION remains configurable. If someone overrides GEANT4_VERSION at build time, this list can become inconsistent with the chosen Geant4 release. Consider either tying the dataset list to GEANT4_VERSION (version-specific defaults) or documenting that overriding GEANT4_VERSION requires also overriding GEANT4_DATASETS.
Dockerfile
Outdated
| ARG GEANT4_INSTALL_DATA=OFF | ||
| ARG GEANT4_DATA_URL=https://geant4-data.web.cern.ch/datasets |
There was a problem hiding this comment.
Defaulting GEANT4_INSTALL_DATA to OFF changes the default image behavior vs the previous Dockerfile (which always installed Geant4 datasets). This repo builds/runs Geant4 apps (e.g., src/simg4ox.cpp initializes FTFP_BERT), which generally require the physics datasets at runtime; building the image with defaults may yield a container that fails when running Geant4 due to missing data. Consider keeping the default as ON, or add a clear fail-fast check / runtime documentation so docker build ... without build-args still produces a usable image.
Dockerfile
Outdated
| && cmake -S /opt/geant4/src -B /opt/geant4/build -DGEANT4_USE_OPENGL_X11=ON -DGEANT4_USE_QT=ON -DGEANT4_USE_GDML=ON -DGEANT4_INSTALL_DATA=OFF -DGEANT4_INSTALL_DATADIR=share/Geant4/data -DGEANT4_BUILD_MULTITHREADED=ON \ | ||
| && cmake --build /opt/geant4/build --parallel --target install \ |
There was a problem hiding this comment.
GEANT4_INSTALL_DATA is now used as a Docker build-arg controlling the custom dataset download step, but the Geant4 CMake configure line hard-codes -DGEANT4_INSTALL_DATA=OFF. Reusing the same name for two different concepts (Geant4's CMake option vs this Docker-level toggle) is confusing and easy to misconfigure. Rename the Docker arg (e.g., GEANT4_DOWNLOAD_DATA / GEANT4_INSTALL_PHYSICS_DATA) or plumb the arg into the CMake flag so the naming stays consistent.
70fe350 to
5995998
Compare
No description provided.