Refresh water module data sources #513

Open
Wegatriespython wants to merge 18 commits into iiasa:main from Wegatriespython:wr/pr-a-data-refresh

@Wegatriespython

@Wegatriespython Wegatriespython commented May 20, 2026

Contributor

This PR refreshes the input data sources for the water module, drops R11 support, and removes the legacy impact representation for the cooling module in lieu of PR #479.

  • Water availability:
    • New set of ISIMIP3b sources for qtot, qr, and eflow from CWatM across GCMs.
    • Percentile reliability is recalculated.
  • Water demands:
    • Withdrawal data for all SSPs come from the Khan 2022 source. Returns use the SSP2 return ratio from the previous data (added as a ratio file), applied uniformly across SSPs.
  • Water access rates:
    • Uses two sources. projections_people_UR_income_10_25.csv has an urban/rural split and is used
      for AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU.
    • projections_people_merge_countries_10_25.csv has no split and is used for
      NAM and CHN, with the same rate assigned to both settings.
    • PAO had anomalous values with the new data, so it is hard-coded to a rate of 0.99.
  • Desalination:
    • Only SSP1, SSP3, and SSP5 are available in the source; SSP2 reuses the SSP1 data and SSP4 reuses the SSP3 data.
    • Downscales country-level data to basins using a template derived from the previous projected and historical desalination files.
    • Adds a utility file, basin_allocation.py.

Additionally, old files in pre-processing have been removed.

How to review

Go through the generator files to see the methodology and assumptions for data imputation.

  • Run the nexus module before and after the update (SSP2).
  • Run the nexus module for the other SSPs.

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update doc/whatsnew.

@khaeru added the "water" label (MESSAGEix-Nexus (water) variant) on May 22, 2026
@Wegatriespython added the "safe to test" label (PRs from forks that do not pose security risks) on May 28, 2026
@codecov

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.1%. Comparing base (b6511ef) to head (bad95e3).

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #513     +/-   ##
=======================================
- Coverage   74.2%   74.1%   -0.1%     
=======================================
  Files        320     322      +2     
  Lines      25655   25707     +52     
=======================================
+ Hits       19047   19069     +22     
- Misses      6608    6638     +30     

| Files with missing lines | Coverage Δ |
|---|---|
| message_ix_models/model/water/cli.py | 33.0% <ø> (ø) |
| message_ix_models/model/water/config.py | 100.0% <ø> (ø) |
| message_ix_models/model/water/data/demands.py | 69.5% <ø> (-1.1%) ⬇️ |
| ...ssage_ix_models/model/water/data/infrastructure.py | 100.0% <ø> (ø) |
| ...odel/water/data/pre_processing/basin_allocation.py | 88.4% <ø> (ø) |
| ...essage_ix_models/model/water/data/water_for_ppl.py | 91.2% <ø> (-0.4%) ⬇️ |
| message_ix_models/model/water/report.py | 15.9% <ø> (-0.9%) ⬇️ |
| message_ix_models/model/water/utils.py | 78.2% <ø> (+1.1%) ⬆️ |
| ...ls/tests/model/water/data/test_basin_allocation.py | 100.0% <ø> (ø) |
| ...e_ix_models/tests/model/water/data/test_demands.py | 100.0% <ø> (ø) |

... and 4 more

Remove the urban/rural connected-disconnected split from the optimizer.
Demand routes the full municipal volume through urban_mw/rural_mw;
connected/unconnected attribution is reconstructed at reporting time
as ACT_t_d * connection_rate.
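
The reporting-time reconstruction described above can be sketched as follows. This is a minimal illustration, not the code in report.py: the function name `split_by_connection` and the example numbers are assumptions made for this sketch.

```python
def split_by_connection(activity: float, connection_rate: float) -> dict[str, float]:
    """Attribute one municipal water volume to connected and unconnected users.

    `activity` stands in for the solved activity level (ACT) of the municipal
    water technology; `connection_rate` is the share of users with a connection.
    Both names are hypothetical and chosen only for this sketch.
    """
    if not 0.0 <= connection_rate <= 1.0:
        raise ValueError("connection_rate must lie in [0, 1]")
    connected = activity * connection_rate
    # The unconnected part is the remainder, so the two always sum to `activity`.
    return {"connected": connected, "unconnected": activity - connected}


shares = split_by_connection(activity=120.0, connection_rate=0.75)
```

Because the split is computed after the solve, the optimizer only ever sees the full municipal volume, which is the simplification this commit makes.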

Validated against the prior split formulation: OBJ delta <0.005%,
electricity price delta <0.01% across SSP2/SSP3.

The CLI passes sdgs as the string "baseline" or "SDG"; the previous
`if sdgs` truthiness check sent every baseline call down the SDG path
and returned an empty rates DataFrame. Check explicitly for True or
"SDG".
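
The truthiness pitfall described above is easy to reproduce in isolation. The two helper functions below are hypothetical and exist only to contrast the checks; they are not the functions in the water module.

```python
def takes_sdg_path_truthy(sdgs) -> bool:
    # Any non-empty string is truthy in Python, so "baseline" passes this check
    # and would be routed down the SDG path by mistake.
    return bool(sdgs)


def takes_sdg_path_explicit(sdgs) -> bool:
    # Only the boolean True or the exact string "SDG" selects the SDG path.
    return sdgs is True or sdgs == "SDG"


for value in ("baseline", "SDG", True, False):
    print(value, takes_sdg_path_truthy(value), takes_sdg_path_explicit(value))
```

The two checks disagree only for "baseline", which is exactly the value the CLI sends for a baseline run.
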
Switch scaind from 1 (auto-scale) to -1 (no scaling) for both solve
paths. Auto-scaling produced `LP status (5): optimal with unscaled
infeasibilities` on production baselines.

Shared helpers for distributing country- or region-level water inputs
across MESSAGE basins. Loads country-basin overlap, derives stable
basin shares from existing basin-level time series, and joins totals
to shares for downstream use by the per-domain generators that follow.
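
As a rough sketch of the share-based allocation idea, assuming shares are proportional to an existing basin-level value: the function names and numbers below are hypothetical and do not reproduce the API of basin_allocation.py, which is not shown in this conversation.

```python
def basin_shares(basin_values: dict[str, float]) -> dict[str, float]:
    """Derive each basin's share of a region total from basin-level values."""
    total = sum(basin_values.values())
    if total <= 0:
        raise ValueError("basin values must sum to a positive total")
    return {basin: value / total for basin, value in basin_values.items()}


def allocate(region_total: float, shares: dict[str, float]) -> dict[str, float]:
    """Join a region- or country-level total to the shares."""
    return {basin: region_total * share for basin, share in shares.items()}


# Hypothetical basin-level series values for two basins in one region.
shares = basin_shares({"B1": 30.0, "B2": 10.0})
allocated = allocate(200.0, shares)
```

Keeping the share derivation separate from the join lets several generators reuse the same shares with different totals.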

Refs #535.

Generate the R12 x SSP x (urban, rural) connection-rate baseline CSVs
from "Improved water services" rows in two files under
data/water/demands/drinking_water_access/:
projections_people_UR_income_10_25.csv (with urban/rural split, for
AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU) and
projections_people_merge_countries_10_25(in).csv (no split, for NAM
and CHN, broadcast to both settings). PAO is hard-coded to 0.99.

R12 rates are population-weighted, carried backward to fill early
target years and capped forward at 2090 for 2100/2110, then broadcast
uniformly to basin columns. The basin column order comes from
connection_rate_basins_R12.csv.
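
The weighting and year-filling steps above could look roughly like this. It is a sketch under assumptions: the function names, the country labels, and the sample years are invented for illustration, and the generator in the PR may handle edge cases differently.

```python
def population_weighted_rate(rates: dict[str, float], population: dict[str, float]) -> float:
    """Aggregate country rates to one regional rate, weighting by population."""
    total_population = sum(population[c] for c in rates)
    return sum(rates[c] * population[c] for c in rates) / total_population


def fill_target_years(series: dict[int, float], target_years: list[int], cap_year: int = 2090) -> dict[int, float]:
    """Carry values backward to early years and hold the cap-year value afterwards.

    Assumes `cap_year` itself is present in `series`.
    """
    known_years = sorted(series)
    filled = {}
    for year in target_years:
        lookup = min(year, cap_year)  # years past the cap reuse the cap-year value
        candidates = [y for y in known_years if y >= lookup]
        if not candidates:
            raise ValueError(f"no source value at or after {lookup}")
        filled[year] = series[candidates[0]]  # earliest known year at or after lookup
    return filled


regional = population_weighted_rate({"A": 0.5, "B": 1.0}, {"A": 1.0, "B": 3.0})
filled = fill_target_years({2030: 0.8, 2090: 0.95}, [2020, 2030, 2090, 2100, 2110])
```
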

SSP2 connection-rate CSVs are regenerated from the same pipeline.
SSP1, SSP3, SSP4, SSP5 connection-rate CSVs are net-new.

Refs #535.

Regenerate R12 x SSP x {urban, rural, manufacturing} x {withdrawal,
return} demand CSVs from Khan 2022 basin-level withdrawal projections
(/mnt/p/ene.model/NEST/water_demands/Khan2022,
doi:10.1038/s41597-023-02086-2). Withdrawals come directly from the
source; returns are withdrawal x per-basin return/withdrawal ratio,
read from new input files return_ratio_{urban_domestic, rural,
manufacturing}.csv. urban_withdrawal / urban_return combine domestic
and manufacturing.
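
A minimal sketch of the return calculation and the urban aggregation, assuming per-basin dictionaries. The function names, basin labels, and values are hypothetical; the real generator reads the ratio CSVs named above.

```python
def returns_from_withdrawals(withdrawal: dict[str, float], ratio: dict[str, float]) -> dict[str, float]:
    """Return flow per basin = withdrawal x return/withdrawal ratio."""
    return {basin: withdrawal[basin] * ratio[basin] for basin in withdrawal}


def add_sectors(first: dict[str, float], second: dict[str, float]) -> dict[str, float]:
    """Sum two sectors basin by basin (both must cover the same basins)."""
    return {basin: first[basin] + second[basin] for basin in first}


domestic_withdrawal = {"B1": 100.0, "B2": 40.0}
manufacturing_withdrawal = {"B1": 20.0, "B2": 8.0}

domestic_return = returns_from_withdrawals(domestic_withdrawal, {"B1": 0.5, "B2": 0.25})
manufacturing_return = returns_from_withdrawals(manufacturing_withdrawal, {"B1": 0.5, "B2": 0.5})

# urban_* combines the domestic and manufacturing components.
urban_withdrawal = add_sectors(domestic_withdrawal, manufacturing_withdrawal)
urban_return = add_sectors(domestic_return, manufacturing_return)
```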

SSP1, SSP3, SSP4, SSP5 demand CSVs are net-new.

Generator at pre_processing/generate_sectoral_demands.py.

Rename urban_withdrawal2 / urban_return2 -> urban_*_domestic across
R11, R12, ZMB, and old_R11 harmonized demand CSVs.

Refs #535.

Desalination capacity is socio-economic, not climate-conditioned.
The projected potential CSVs (R11, R12, ZMB) now carry an `ssp`
column derived from Marina's country-level source data, and
`add_desalination` filters on `context.ssp` instead of `cfg.RCP`.
SSP1, SSP3, SSP5 are taken directly from the source; SSP2 inherits
SSP1 and SSP4 inherits SSP3 by assignment.
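
The inheritance rule can be written as a small lookup table. This is an illustrative sketch only: `SSP_SOURCE`, `expand_to_all_ssps`, and the capacity numbers are assumptions, not names or data from generate_desalination.py.

```python
# Which source scenario supplies the data for each target SSP.
SSP_SOURCE = {
    "SSP1": "SSP1",
    "SSP2": "SSP1",  # SSP2 inherits SSP1
    "SSP3": "SSP3",
    "SSP4": "SSP3",  # SSP4 inherits SSP3
    "SSP5": "SSP5",
}


def expand_to_all_ssps(source: dict[str, float]) -> dict[str, float]:
    """Fill all five SSPs from the three scenarios present in the source."""
    return {target: source[origin] for target, origin in SSP_SOURCE.items()}


# Hypothetical capacity values for the three scenarios available in the source.
by_ssp = expand_to_all_ssps({"SSP1": 10.0, "SSP3": 5.0, "SSP5": 20.0})
```

Keeping the mapping in one table makes the assignment explicit and easy to revise if SSP2 or SSP4 source data become available.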

R12 historical capacity is refreshed from the same source. R12 ships
a basin-allocation template
(desalination_basin_allocation_template_R12.csv) used by the new
generator at pre_processing/generate_desalination.py.

`test_infrastructure.py` parametrizations gain an explicit `ssp=SSP2`
so the SSP-keyed filter resolves.

Refs #535.

R12 hydro availability CSVs (qtot, qr, e-flow, 5-yr and monthly
variants) for 2p6, 7p0, and 8p5 are regenerated from the CWatM
SSP-keyed percentile pipeline. Source data lives at
/mnt/p/watxene/ISIMIP_postprocessed/data_for_vignesh/message_nexus_input_2026/
(5 GCMs x 3 SSPs of qr_monthly + qtot_daily futures). Generator added
at pre_processing/generate_hydro_availability.py.

Three basins (30|FSU, 51|FSU, 154|FSU) are 100% NaN across all 5 GCMs
in the refreshed source; `filter_basins_by_region` excludes them
unconditionally so the water build never sees zero-availability rows
for them.
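
A sketch of how all-NaN basins might be detected and excluded, assuming a mapping from basin label to its values across GCMs. The helper names and the sample data are hypothetical; only the three basin labels come from the commit message, and the actual signature of `filter_basins_by_region` is not shown here.

```python
import math

# Basins reported as 100% NaN in the refreshed source.
EXCLUDED_BASINS = frozenset({"30|FSU", "51|FSU", "154|FSU"})


def all_nan_basins(values_by_basin: dict[str, list[float]]) -> set[str]:
    """Find basins whose values are NaN in every entry (lists assumed non-empty)."""
    return {
        basin
        for basin, values in values_by_basin.items()
        if all(math.isnan(v) for v in values)
    }


def drop_excluded(basins: list[str]) -> list[str]:
    """Remove the excluded basins unconditionally, preserving order."""
    return [basin for basin in basins if basin not in EXCLUDED_BASINS]


detected = all_nan_basins({"30|FSU": [math.nan, math.nan], "1|AFR": [1.0, math.nan]})
kept = drop_excluded(["1|AFR", "30|FSU", "51|FSU", "2|AFR"])
```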

`compute_basin_demand_ratio` switches its supply baseline from
qtot/qr_5y_no_climate_low to qtot/qr_5y_2p6_low so the reduced-basin
ranking remains stable across the run's RCP choice. The same function
reads the renamed urban_*_domestic demand file introduced earlier in
this PR.

Refs #535.

The R12 input data shipped earlier in this PR is now produced by the
Python generators under pre_processing/ (generate_access_rates,
generate_sectoral_demands, generate_desalination,
generate_hydro_availability) on top of basin_allocation. The legacy
scripts (desalination.R, generate_water_constraints.R, hydro_agg_basin.py,
hydro_agg_raster.py, hydro_agg_spatial.R) have no remaining role and
are removed.

doc/water/index.rst is updated to drop the entries for the retired
scripts, add entries for basin_allocation and the four new generators,
and remove the stale "Deprecated R Code" section pointing at a removed
directory.

Refs #535.

CWatM source data (refreshed earlier in this PR) covers 2p6, 7p0, and
8p5 only. The synthetic no_climate baseline has no source equivalent
and no defensible RCP analogue, so it is dropped from the CLI choices
and from the R12 availability data set.

`_RCPS` in cli.py is now ["2p6", "7p0", "8p5"]; the `--rcps` defaults
and help strings for `nexus` and `cooling` flip to 2p6. `Config.RCP`
default flips to "2p6". R12 no_climate availability CSVs are removed.

test_water_data.py drops no_climate from the R11-only legacy exclusion
list so the R11<->R12 parity check still passes. test_build.py picks
7p0 as the build-test RCP in place of the retired 6p0.

R11 and ZMB availability data are untouched; their legacy RCPs (6p0,
no_climate) remain available for R11 regression coverage.

Refs #535.

The cooling-impact dataset
(power_plant_cooling_impact_MESSAGE_*_{2p6,6p0,7p0}.csv) covered only
the legacy RCPs and was gated behind `cfg.RCP == "no_climate"`. With
no_climate now removed and 8p5 added (neither has impact coverage),
the gated branch in `_make_capacity_factor` is unreachable for any
defensible RCP choice. The branch is removed; the function now
returns the dimensionless capacity_factor unchanged. The eleven
cooling-impact CSVs are deleted.

`test_water_for_ppl.py` drops its no_climate parametrizations.

Refs #535.

add_sectoral_demands now resolves ssp = context.ssp.lower() and reads
withdrawal / return CSVs matching {ssp}_regional_*.csv, picking up the
SSP1, SSP3, SSP4, SSP5 demand data shipped earlier in this PR instead
of the SSP2 fallback. Treatment-rate and recycling-rate CSVs are read
from ssp2_regional_*_rate_baseline.csv regardless of the current SSP,
since rate data is not differentiated across SSPs. The monthly path
({ssp}_m_water_demands.csv) is keyed by current SSP the same way.
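
The file-name convention described above can be summarised in a few lines. The function `demand_file_patterns` and its dictionary keys are hypothetical; only the file-name patterns themselves are taken from the commit message.

```python
def demand_file_patterns(ssp: str) -> dict[str, str]:
    """Build the file-name patterns read for one SSP."""
    key = ssp.lower()
    return {
        # Withdrawal and return CSVs follow the current SSP.
        "demands": f"{key}_regional_*.csv",
        # Treatment and recycling rates always come from the SSP2 files.
        "rates": "ssp2_regional_*_rate_baseline.csv",
        # The monthly path is keyed by the current SSP as well.
        "monthly": f"{key}_m_water_demands.csv",
    }


patterns = demand_file_patterns("SSP3")
```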

The in-memory variable strings (urban_withdrawal2_baseline,
urban_return2_baseline) are renamed to urban_*_domestic_baseline to
match the CSV-side rename that landed earlier; the prior strings no
longer matched any variable in the loaded data and were silently
returning empty slices.

Known gap: ssp{n}_m_water_demands.csv exists only for SSP2/ZMB. A
non-SSP2 run with sub-annual time slices will raise FileNotFoundError
on the monthly path. R12 has no monthly file even for SSP2, so the
sub-annual + R12 path was already broken; this commit makes the
failure mode explicit per-SSP rather than introducing it.

Refs #535.

@adrivinca adrivinca left a comment

Contributor

Just a few points:

  1. get_rates_data fallback misses treatment/recycling rates for SSP≠2 (report.py:425). When all_rates_{ssp}.csv doesn't exist (which is every baseline run, since demands.py only writes it inside the if cfg.SDG != "baseline" branch), the fallback globs {ssp}_regional_*_rate_*.csv. For SSP1/3/4/5 that finds only the connection-rate files; the treatment/recycling rates exist only as ssp2_* files. The connected/unconnected reconstruction works, but the water-access/sanitation reporting (the other get_rates_data call at line 705) silently loses treatment rates for non-SSP2 runs. The fallback should mirror demands.py's "rates always come from ssp2" convention.

  2. Sub-annual demands now break for SSP≠2. This is a known gap; let's track it as a next step.

  3. Stray no_climate file: e-flow_5y_m_no_climate_R12.csv was modified (+217/−217) even though no_climate was removed from _RCPS and the generator doesn't produce it; the qtot/qr no_climate files were deleted. It looks like a leftover. Is there any reason to keep it?

The get_rates_data fallback globbed only {ssp}_regional_*_rate_*.csv,
but treatment-rate and recycling-rate data ship only as ssp2 files.
Non-SSP2 baseline runs therefore dropped treatment and recycling rates
from water-access and sanitation reporting. Source connection rates from
the run's SSP and treatment/recycling rates from SSP2 for any SSP,
matching the convention in demands.py.

no_climate is no longer a supported RCP: the e-flow availability data is
read only as e-flow_5y_m_{RCP} and the generator emits e-flow only for the
configured RCPs, so this file is unreachable. It was the lone no_climate
availability file the R12 refresh left behind after removing the rest.

Treatment and recycling rates ship only as ssp2 files and are read from
them for every SSP; the variable name is the filename stem after
<ssp>_regional_. Extract both into water.utils (ssp2_rate_files,
variable_from_stem, SSP2_RATES) and use them in add_sectoral_demands and
get_rates_data.
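
To make the convention concrete, here is a hedged sketch of the two ideas: choosing which rate files to read, and deriving a variable name from a file stem. The functions `select_rate_files` and `stem_to_variable` and the example file names are hypothetical; they are not the helpers added to water.utils, whose signatures are not shown in this conversation.

```python
from pathlib import Path


def select_rate_files(filenames: list[str], ssp: str) -> list[str]:
    """Connection rates from the run's SSP; treatment/recycling rates from SSP2."""
    key = ssp.lower()
    selected = []
    for name in filenames:
        shared = "treatment_rate" in name or "recycling_rate" in name
        wanted = "ssp2" if shared else key
        if name.startswith(f"{wanted}_regional_") and "_rate_" in name:
            selected.append(name)
    return sorted(selected)


def stem_to_variable(path: str) -> str:
    """Variable name = file stem after '<ssp>_regional_'."""
    _, separator, variable = Path(path).stem.partition("_regional_")
    if not separator:
        raise ValueError(f"unexpected file name: {path}")
    return variable


# Hypothetical file names following the pattern discussed in the review.
files = [
    "ssp2_regional_urban_connection_rate_baseline.csv",
    "ssp3_regional_urban_connection_rate_baseline.csv",
    "ssp2_regional_urban_treatment_rate_baseline.csv",
    "ssp2_regional_urban_recycling_rate_baseline.csv",
]
chosen = select_rate_files(files, "SSP3")
variable = stem_to_variable(chosen[0])
```
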
@Wegatriespython

Wegatriespython commented Jun 22, 2026

Contributor Author

@adrivinca Addressed two issues:

  1. The SSP2 treatment/recycling logic is refactored into a helper so that demands.py and report.py use it consistently. The glob is fixed to take connection rates from the run's SSP and treatment/recycling rates from the helper.
  2. The stray no_climate e-flow file (e-flow_5y_m_no_climate_R12.csv) has been deleted.

Sub-annual support is left for a future PR.

Bring the refreshed pre-processing generators into ruff-format
compliance and wrap four over-length lines flagged by the pinned ruff
(E501).
