Refresh water module data sources by Wegatriespython · Pull Request #513 · iiasa/message-ix-models

Wegatriespython · 2026-05-20T14:45:16Z

This PR refreshes the input data sources for the water module, drops R11 support, and removes the legacy impact representation for the cooling module in lieu of PR #479.

Water availability:
- New set of ISIMIP3b sources for qtot, qr, and eflow from CWatM across GCMs.
- Percentile reliability recalculated.
Water demands:
- Withdrawal data for the SSPs comes from the Khan 2022 source. Returns use the SSP2 return ratio from the previous data (added as a ratio file), applied uniformly across SSPs.
Water access rates:
- Uses two sources. projections_people_UR_income_10_25.csv has an urban/rural split and is used
  for AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU.
- projections_people_merge_countries_10_25.csv has no split and is used for
  NAM and CHN, with the same rate assigned to both settings.
- PAO had anomalous values with the new data, so it is hard-coded to a 0.99 rate.
Desalination:
- Only SSP1, SSP3, and SSP5 are available; the other two use the mapping SSP1->SSP2, SSP3->SSP4.
- Downscales country-level data to basins using a template derived from the previous projected and historical desalination files.
- Adds a utility file, basin_allocation.py.

Additionally, old files in pre-processing have been removed.

How to review

Go through the generator files to see the methodology and assumptions for data imputation.

Run the nexus module before and after the update (SSP2).
Run the nexus module for the other SSPs.

PR checklist

Continuous integration checks all ✅
Add or expand tests; coverage checks both ✅
Add, expand, or update documentation.
Update doc/whatsnew.

codecov · 2026-05-28T12:27:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.1%. Comparing base (b6511ef) to head (bad95e3).

Additional details and impacted files

@@           Coverage Diff           @@
##            main    #513     +/-   ##
=======================================
- Coverage   74.2%   74.1%   -0.1%     
=======================================
  Files        320     322      +2     
  Lines      25655   25707     +52     
=======================================
+ Hits       19047   19069     +22     
- Misses      6608    6638     +30

Files with missing lines	Coverage Δ
message_ix_models/model/water/cli.py	`33.0% <ø> (ø)`
message_ix_models/model/water/config.py	`100.0% <ø> (ø)`
message_ix_models/model/water/data/demands.py	`69.5% <ø> (-1.1%)`	⬇️
...ssage_ix_models/model/water/data/infrastructure.py	`100.0% <ø> (ø)`
...odel/water/data/pre_processing/basin_allocation.py	`88.4% <ø> (ø)`
...essage_ix_models/model/water/data/water_for_ppl.py	`91.2% <ø> (-0.4%)`	⬇️
message_ix_models/model/water/report.py	`15.9% <ø> (-0.9%)`	⬇️
message_ix_models/model/water/utils.py	`78.2% <ø> (+1.1%)`	⬆️
...ls/tests/model/water/data/test_basin_allocation.py	`100.0% <ø> (ø)`
...e_ix_models/tests/model/water/data/test_demands.py	`100.0% <ø> (ø)`
... and 4 more

Remove the urban/rural connected-disconnected split from the optimizer. Demand routes the full municipal volume through urban_mw/rural_mw; connected/unconnected attribution is reconstructed at reporting time as ACT_t_d * connection_rate. Validated against the prior split formulation: OBJ delta <0.005%, electricity price delta <0.01% across SSP2/SSP3.
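The reporting-time reconstruction described above can be sketched as follows. This is a minimal illustration, assuming activity and connection rate are plain floats for one basin and period; the function name and the numbers are hypothetical and are not taken from report.py.

```python
def split_connected(activity: float, connection_rate: float) -> tuple[float, float]:
    """Split a municipal activity volume into connected and unconnected parts.

    Mirrors the attribution ACT_t_d * connection_rate; the remainder is
    treated as unconnected.
    """
    if not 0.0 <= connection_rate <= 1.0:
        raise ValueError("connection_rate must lie in [0, 1]")
    connected = activity * connection_rate
    return connected, activity - connected


# Made-up values: 120 units of activity, 75% connection rate.
connected, unconnected = split_connected(activity=120.0, connection_rate=0.75)
```

Because the two parts always sum to the routed volume, the optimizer only needs the total; the split is pure post-processing.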

The CLI passes sdgs as the string "baseline" or "SDG"; the previous `if sdgs` truthiness check sent every baseline call down the SDG path and returned an empty rates DataFrame. Check explicitly for True or "SDG".
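A minimal sketch of why the truthiness check misfired, and of the explicit check the commit message describes. The helper name `wants_sdg` is hypothetical; only the accepted values (True or "SDG") come from the message above.

```python
def wants_sdg(sdgs) -> bool:
    """Return True only for an explicit SDG request (hypothetical helper)."""
    return sdgs is True or sdgs == "SDG"


# Any non-empty string is truthy, so `if sdgs` also fires for "baseline".
baseline_is_truthy = bool("baseline")

# The explicit check separates the two CLI strings.
baseline_takes_sdg_path = wants_sdg("baseline")
sdg_takes_sdg_path = wants_sdg("SDG")
```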

Switch scaind from 1 (auto-scale) to -1 (no scaling) for both solve paths. Auto-scaling produced `LP status (5): optimal with unscaled infeasibilities` on production baselines.

Shared helpers for distributing country- or region-level water inputs across MESSAGE basins. Loads country-basin overlap, derives stable basin shares from existing basin-level time series, and joins totals to shares for downstream use by the per-domain generators that follow. Refs #535.
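The share-then-join idea can be illustrated with a small, dependency-free sketch. It assumes that a "stable" share is the mean of a basin's share of its region total over the available years; that averaging rule, the function names, and the sample data are assumptions for illustration, not the contents of basin_allocation.py.

```python
from collections import defaultdict


def basin_shares(series):
    """Average each basin's share of its region total over all years.

    `series` maps (region, basin, year) -> value from an existing basin-level file.
    """
    totals = defaultdict(float)
    for (region, _basin, year), value in series.items():
        totals[(region, year)] += value
    yearly = defaultdict(list)
    for (region, basin, year), value in series.items():
        if totals[(region, year)] > 0:
            yearly[(region, basin)].append(value / totals[(region, year)])
    return {key: sum(vals) / len(vals) for key, vals in yearly.items()}


def allocate(region_totals, shares):
    """Join (region, year) totals to basin shares, giving (basin, year) values."""
    return {
        (basin, year): total * share
        for (region, year), total in region_totals.items()
        for (share_region, basin), share in shares.items()
        if share_region == region
    }


# Made-up history for two basins in one region.
history = {
    ("R12_AFR", "1|AFR", 2010): 30.0, ("R12_AFR", "2|AFR", 2010): 10.0,
    ("R12_AFR", "1|AFR", 2015): 60.0, ("R12_AFR", "2|AFR", 2015): 20.0,
}
shares = basin_shares(history)
allocated = allocate({("R12_AFR", 2030): 100.0}, shares)
```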

Generate the R12 x SSP x (urban, rural) connection-rate baseline CSVs from "Improved water services" rows in two files under data/water/demands/drinking_water_access/: projections_people_UR_income_10_25.csv (with urban/rural split, for AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU) and projections_people_merge_countries_10_25(in).csv (no split, for NAM and CHN, broadcast to both settings). PAO is hard-coded to 0.99. R12 rates are population-weighted, carried backward to fill early target years and capped forward at 2090 for 2100/2110, then broadcast uniformly to basin columns. The basin column order comes from connection_rate_basins_R12.csv. SSP2 connection-rate CSVs are regenerated from the same pipeline. SSP1, SSP3, SSP4, SSP5 connection-rate CSVs are net-new. Refs #535.
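The aggregation and year-filling rules above can be sketched as below. This assumes country inputs arrive as (population, rate) pairs and that every target year between the first data year and 2090 is present in the source; the function names and numbers are illustrative and do not reproduce generate_access_rates.

```python
def population_weighted_rate(countries):
    """Aggregate (population, rate) pairs to one regional rate."""
    total_pop = sum(pop for pop, _rate in countries)
    return sum(pop * rate for pop, rate in countries) / total_pop


def fill_target_years(rates, target_years, cap_year=2090):
    """Carry the first value backward and hold the cap-year value afterwards.

    Assumes every target year from the first data year to `cap_year` is a key
    of `rates`.
    """
    first = min(rates)
    filled = {}
    for year in target_years:
        if year < first:
            filled[year] = rates[first]      # early years reuse the first value
        elif year > cap_year:
            filled[year] = rates[cap_year]   # 2100/2110 reuse the 2090 value
        else:
            filled[year] = rates[year]
    return filled


# Made-up inputs.
regional = population_weighted_rate([(30.0, 0.9), (10.0, 0.5)])
filled = fill_target_years(
    {2020: 0.6, 2050: 0.8, 2090: 0.95},
    [2010, 2020, 2050, 2090, 2100, 2110],
)
```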

Regenerate R12 x SSP x {urban, rural, manufacturing} x {withdrawal, return} demand CSVs from Khan 2022 basin-level withdrawal projections (/mnt/p/ene.model/NEST/water_demands/Khan2022, doi:10.1038/s41597-023-02086-2). Withdrawals come directly from the source; returns are withdrawal x per-basin return/withdrawal ratio, read from new input files return_ratio_{urban_domestic, rural, manufacturing}.csv. urban_withdrawal / urban_return combine domestic and manufacturing. SSP1, SSP3, SSP4, SSP5 demand CSVs are net-new. Generator at pre_processing/generate_sectoral_demands.py. Rename urban_withdrawal2 / urban_return2 -> urban_*_domestic across R11, R12, ZMB, and old_R11 harmonized demand CSVs. Refs #535.
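As a small worked example of the return calculation, the sketch below multiplies each withdrawal by a per-basin return/withdrawal ratio. The basin IDs, volumes, and ratios are made up, and the function name is hypothetical; the real generator reads the ratios from the return_ratio_*.csv files.

```python
def compute_returns(withdrawals, return_ratio):
    """Return flow = withdrawal x per-basin return/withdrawal ratio."""
    return {
        (basin, year): volume * return_ratio[basin]
        for (basin, year), volume in withdrawals.items()
    }


# Made-up withdrawals keyed by (basin, year) and time-constant ratios per basin.
withdrawals = {("1|AFR", 2030): 200.0, ("1|AFR", 2040): 240.0, ("2|AFR", 2030): 50.0}
return_ratio = {"1|AFR": 0.5, "2|AFR": 0.25}
returns = compute_returns(withdrawals, return_ratio)
```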

Desalination capacity is socio-economic, not climate-conditioned. The projected potential CSVs (R11, R12, ZMB) now carry an `ssp` column derived from Marina's country-level source data, and `add_desalination` filters on `context.ssp` instead of `cfg.RCP`. SSP1, SSP3, SSP5 are taken directly from the source; SSP2 inherits SSP1 and SSP4 inherits SSP3 by assignment. R12 historical capacity is refreshed from the same source. R12 ships a basin-allocation template (desalination_basin_allocation_template_R12.csv) used by the new generator at pre_processing/generate_desalination.py. `test_infrastructure.py` parametrizations gain an explicit `ssp=SSP2` so the SSP-keyed filter resolves. Refs #535.
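The SSP inheritance and the SSP-keyed filter can be sketched as follows. Rows are plain dicts with an `ssp` key standing in for the CSV's `ssp` column; the helper names and the capacity values are illustrative assumptions, not the code in generate_desalination.py or `add_desalination`.

```python
# SSP2 inherits SSP1 and SSP4 inherits SSP3, as stated above.
INHERITS = {"SSP2": "SSP1", "SSP4": "SSP3"}


def expand_ssps(rows):
    """Copy source rows so that all five SSPs are present."""
    out = list(rows)
    for target, source in INHERITS.items():
        out.extend({**row, "ssp": target} for row in rows if row["ssp"] == source)
    return out


def select_ssp(rows, ssp):
    """Keep only rows for the scenario's SSP (stand-in for filtering on context.ssp)."""
    return [row for row in rows if row["ssp"] == ssp]


# Made-up capacities for the three SSPs available in the source.
source_rows = [
    {"ssp": "SSP1", "basin": "1|AFR", "cap": 1.0},
    {"ssp": "SSP3", "basin": "1|AFR", "cap": 3.0},
    {"ssp": "SSP5", "basin": "1|AFR", "cap": 5.0},
]
expanded = expand_ssps(source_rows)
ssp2_rows = select_ssp(expanded, "SSP2")
```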

R12 hydro availability CSVs (qtot, qr, e-flow, 5-yr and monthly variants) for 2p6, 7p0, and 8p5 are regenerated from the CWaTM SSP-keyed percentile pipeline. Source data lives at /mnt/p/watxene/ISIMIP_postprocessed/data_for_vignesh/message_nexus_input_2026/ (5 GCMs x 3 SSPs of qr_monthly + qtot_daily futures). Generator added at pre_processing/generate_hydro_availability.py. Three basins (30|FSU, 51|FSU, 154|FSU) are 100% NaN across all 5 GCMs in the refreshed source; `filter_basins_by_region` excludes them unconditionally so the water build never sees zero-availability rows for them. `compute_basin_demand_ratio` switches its supply baseline from qtot/qr_5y_no_climate_low to qtot/qr_5y_2p6_low so the reduced-basin ranking remains stable across the run's RCP choice. The same function reads the renamed urban_*_domestic demand file introduced earlier in this PR. Refs #535.
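One way to detect basins like the three above is to look for IDs that are NaN in every GCM. The sketch below is a simplification that uses one value per basin and GCM instead of full time series; the GCM labels, the basin "12|FSU", and both function names are made up, and the actual `filter_basins_by_region` may hard-code the list instead.

```python
import math


def all_nan_basins(values_by_gcm):
    """Return the basins whose value is NaN (or missing) in every GCM."""
    basins = set()
    for per_basin in values_by_gcm.values():
        basins.update(per_basin)
    return sorted(
        basin
        for basin in basins
        if all(
            math.isnan(per_basin.get(basin, math.nan))
            for per_basin in values_by_gcm.values()
        )
    )


def drop_basins(rows, excluded):
    """Remove rows for excluded basins before they reach the water build."""
    return [row for row in rows if row["basin"] not in excluded]


# Made-up source: one basin is NaN everywhere, the other only in one GCM.
source = {
    "gcm_a": {"30|FSU": math.nan, "12|FSU": 4.2},
    "gcm_b": {"30|FSU": math.nan, "12|FSU": math.nan},
}
excluded = all_nan_basins(source)
kept = drop_basins([{"basin": "30|FSU"}, {"basin": "12|FSU"}], excluded)
```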

The R12 input data shipped earlier in this PR is now produced by the Python generators under pre_processing/ (generate_access_rates, generate_sectoral_demands, generate_desalination, generate_hydro_availability) on top of basin_allocation. The legacy scripts (desalination.R, generate_water_constraints.R, hydro_agg_basin.py, hydro_agg_raster.py, hydro_agg_spatial.R) have no remaining role and are removed. doc/water/index.rst is updated to drop the entries for the retired scripts, add entries for basin_allocation and the four new generators, and remove the stale "Deprecated R Code" section pointing at a removed directory. Refs #535.

CWaTM source data (refreshed earlier in this PR) covers 2p6, 7p0, and 8p5 only. The synthetic no_climate baseline has no source equivalent and no defensible RCP analogue, so it is dropped from the CLI choices and from the R12 availability data set. `_RCPS` in cli.py is now ["2p6", "7p0", "8p5"]; the `--rcps` defaults and help strings for `nexus` and `cooling` flip to 2p6. `Config.RCP` default flips to "2p6". R12 no_climate availability CSVs are removed. test_water_data.py drops no_climate from the R11-only legacy exclusion list so the R11<->R12 parity check still passes. test_build.py picks 7p0 as the build-test RCP in place of the retired 6p0. R11 and ZMB availability data are untouched; their legacy RCPs (6p0, no_climate) remain available for R11 regression coverage. Refs #535.

The cooling-impact dataset (power_plant_cooling_impact_MESSAGE_*_{2p6,6p0,7p0}.csv) covered only the legacy RCPs and was gated behind `cfg.RCP == "no_climate"`. With no_climate now removed and 8p5 added (neither has impact coverage), the gated branch in `_make_capacity_factor` is unreachable for any defensible RCP choice. The branch is removed; the function now returns the dimensionless capacity_factor unchanged. The eleven cooling-impact CSVs are deleted. `test_water_for_ppl.py` drops its no_climate parametrizations. Refs #535.

add_sectoral_demands now resolves ssp = context.ssp.lower() and reads withdrawal / return CSVs matching {ssp}_regional_*.csv, picking up the SSP1, SSP3, SSP4, SSP5 demand data shipped earlier in this PR instead of the SSP2 fallback. Treatment-rate and recycling-rate CSVs are read from ssp2_regional_*_rate_baseline.csv regardless of the current SSP, since rate data is not differentiated across SSPs. The monthly path ({ssp}_m_water_demands.csv) is keyed by current SSP the same way. The in-memory variable strings (urban_withdrawal2_baseline, urban_return2_baseline) are renamed to urban_*_domestic_baseline to match the CSV-side rename that landed earlier; the prior strings no longer matched any variable in the loaded data and were silently returning empty slices. Known gap: ssp{n}_m_water_demands.csv exists only for SSP2/ZMB. A non-SSP2 run with sub-annual time slices will raise FileNotFoundError on the monthly path. R12 has no monthly file even for SSP2, so the sub-annual + R12 path was already broken; this commit makes the failure mode explicit per-SSP rather than introducing it. Refs #535.

adrivinca

Just a few points:

- get_rates_data fallback misses treatment/recycling rates for SSP≠2 (report.py:425). When all_rates_{ssp}.csv doesn't exist (which is every baseline run, since demands.py only writes it inside the `if cfg.SDG != "baseline"` branch), the fallback globs {ssp}_regional_*_rate_*.csv. For SSP1/3/4/5 that only finds the connection-rate files; the treatment/recycling rates exist only as ssp2_* files. The connected/unconnected reconstruction works, but the water-access/sanitation reporting (the other get_rates_data call at line 705) silently loses treatment rates for non-SSP2 runs. The fallback should mirror demands.py's "rates always come from ssp2" convention.
- Sub-annual demands now break for SSP≠2: we know. Let's keep track of this as a next step.
- Stray no_climate file: e-flow_5y_m_no_climate_R12.csv was modified (+217/−217) even though no_climate was removed from _RCPS and the generator doesn't produce it, and the qtot/qr no_climate files were deleted. It looks like a leftover; is there any reason for it?

The get_rates_data fallback globbed only {ssp}_regional_*_rate_*.csv, but treatment-rate and recycling-rate data ship only as ssp2 files. Non-SSP2 baseline runs therefore dropped treatment and recycling rates from water-access and sanitation reporting. Source connection rates from the run's SSP and treatment/recycling rates from SSP2 for any SSP, matching the convention in demands.py.
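The corrected selection rule can be sketched without touching the file system. The example assumes that treatment and recycling files can be recognised by those words in the file name; that marker rule, the function name, and the sample file names are assumptions for illustration, while the glob pattern is the one quoted above.

```python
from fnmatch import fnmatchcase

# Assumed markers for rate files that ship only as ssp2_* files.
SSP2_ONLY = ("treatment", "recycling")


def rate_files(names, ssp):
    """Pick connection rates from the run's SSP and treatment/recycling from SSP2."""
    ssp = ssp.lower()
    picked = []
    for name in sorted(names):
        shared = any(marker in name for marker in SSP2_ONLY)
        prefix = "ssp2" if shared else ssp
        if fnmatchcase(name, f"{prefix}_regional_*_rate_*.csv"):
            picked.append(name)
    return picked


# Hypothetical directory listing.
available = [
    "ssp2_regional_urban_connection_rate_baseline.csv",
    "ssp2_regional_urban_treatment_rate_baseline.csv",
    "ssp2_regional_urban_recycling_rate_baseline.csv",
    "ssp3_regional_urban_connection_rate_baseline.csv",
]
selected = rate_files(available, "SSP3")
```

For an SSP3 run this keeps the SSP3 connection rates plus the SSP2 treatment and recycling rates, and leaves out the SSP2 connection rates.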

no_climate is no longer a supported RCP: the e-flow availability data is read only as e-flow_5y_m_{RCP} and the generator emits e-flow only for the configured RCPs, so this file is unreachable. It was the lone no_climate availability file the R12 refresh left behind after removing the rest.

Treatment and recycling rates ship only as ssp2 files and are read from them for every SSP; the variable name is the filename stem after <ssp>_regional_. Extract both into water.utils (ssp2_rate_files, variable_from_stem, SSP2_RATES) and use them in add_sectoral_demands and get_rates_data.
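The stem-to-variable rule can be illustrated as below. The name `variable_from_stem` comes from the commit message, but the signature, the error handling, and the sample file name are guesses and may differ from what water.utils actually ships.

```python
from pathlib import Path


def variable_from_stem(path, ssp):
    """Return the part of the file stem that follows '<ssp>_regional_'.

    Sketch only: the real helper in water.utils may take different arguments.
    """
    stem = Path(path).stem
    prefix = f"{ssp.lower()}_regional_"
    if not stem.startswith(prefix):
        raise ValueError(f"{stem!r} does not start with {prefix!r}")
    return stem[len(prefix):]


# Illustrative file name following the {ssp}_regional_*.csv pattern.
variable = variable_from_stem(
    "ssp2_regional_urban_withdrawal_domestic_baseline.csv", "SSP2"
)
```

Keeping this rule in one helper means demands.py and report.py cannot drift apart on how a file name maps to a variable.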

Wegatriespython · 2026-06-22T13:04:07Z

@adrivinca Addressed two issues:

- The SSP2 treatment/recycling logic is refactored into a helper so it is used consistently by demands.py and report.py, and the glob is fixed to use the run's SSP for connection rates and the helper for treatment/recycling rates.
- The stray 'eflow_noclimate' file has been deleted.

Sub-annual handling is left for a future PR.

Bring the refreshed pre-processing generators into ruff-format compliance and wrap four over-length lines flagged by the pinned ruff (E501).

Wegatriespython requested review from adrivinca and awais307 as code owners May 20, 2026 14:45

Wegatriespython temporarily deployed to publish May 20, 2026 14:45 — with GitHub Actions Inactive

khaeru added the water MESSAGEix-Nexus (water) variant label May 22, 2026

Wegatriespython added the safe to test PRs from forks that do not pose security risks label May 28, 2026

Wegatriespython temporarily deployed to publish June 9, 2026 18:48 — with GitHub Actions Inactive

Wegatriespython mentioned this pull request Jun 9, 2026

Water/Nexus : Desalination & Historical Activity Fixes #522

Open

5 tasks

Wegatriespython added 14 commits June 10, 2026 15:15

Handle CLI string sdgs in water report get_rates_data

f8d477c

Disable CPLEX scaling in water-ix nexus and cooling solves

00c7775

Add PR 510 whatsnew entry

b41cfe3

Add PR 513 whatsnew entry

e2c35c0

Wegatriespython force-pushed the wr/pr-a-data-refresh branch from 409a4ae to e2c35c0 Compare June 10, 2026 13:17

Wegatriespython temporarily deployed to publish June 10, 2026 13:17 — with GitHub Actions Inactive

adrivinca reviewed Jun 11, 2026

Wegatriespython added 3 commits June 22, 2026 14:54

Wegatriespython temporarily deployed to publish June 22, 2026 12:58 — with GitHub Actions Inactive

Satisfy ruff

bad95e3

Wegatriespython temporarily deployed to publish June 22, 2026 13:31 — with GitHub Actions Inactive

Refresh water module data sources #513
Wegatriespython wants to merge 18 commits into iiasa:main from Wegatriespython:wr/pr-a-data-refresh

3 participants
