Refresh water module data sources #513
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #513 +/- ##
=======================================
- Coverage 74.2% 74.1% -0.1%
=======================================
Files 320 322 +2
Lines 25655 25707 +52
=======================================
+ Hits 19047 19069 +22
- Misses 6608 6638 +30
Remove the urban/rural connected-disconnected split from the optimizer. Demand routes the full municipal volume through urban_mw/rural_mw; connected/unconnected attribution is reconstructed at reporting time as ACT_t_d * connection_rate. Validated against the prior split formulation: OBJ delta <0.005%, electricity price delta <0.01% across SSP2/SSP3.
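The reporting-time reconstruction can be sketched as below. This is a minimal illustration rather than the module's code: the function name `split_by_connection` and the sample numbers are hypothetical, and it assumes the unconnected part is simply the remainder of the solved activity.

```python
def split_by_connection(activity: float, connection_rate: float) -> tuple[float, float]:
    """Split a solved municipal volume into (connected, unconnected) parts.

    Assumption: unconnected volume is whatever is left after applying
    the connection rate to the full activity.
    """
    if not 0.0 <= connection_rate <= 1.0:
        raise ValueError("connection_rate must lie in [0, 1]")
    connected = activity * connection_rate
    unconnected = activity - connected
    return connected, unconnected


# Hypothetical values: 200 units routed through urban_mw, 75% connected.
connected, unconnected = split_by_connection(200.0, 0.75)
```

Because the two parts always sum back to the solved activity, the optimizer only needs the single municipal flow.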
The CLI passes sdgs as the string "baseline" or "SDG"; the previous `if sdgs` truthiness check sent every baseline call down the SDG path and returned an empty rates DataFrame. Check explicitly for True or "SDG".
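A small sketch of why the truthiness check misfires and what the explicit check looks like. The helper name `uses_sdg_path` is hypothetical; only the accepted values (True or "SDG") come from the commit message.

```python
def uses_sdg_path(sdgs) -> bool:
    """Return True only for an explicit SDG request (True or the string "SDG")."""
    return sdgs is True or sdgs == "SDG"


# Any non-empty string is truthy, so `if sdgs:` also fires for "baseline".
old_check_for_baseline = bool("baseline")
new_check_for_baseline = uses_sdg_path("baseline")
```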
Switch scaind from 1 (auto-scale) to -1 (no scaling) for both solve paths. Auto-scaling produced `LP status (5): optimal with unscaled infeasibilities` on production baselines.
Shared helpers for distributing country- or region-level water inputs across MESSAGE basins. Loads country-basin overlap, derives stable basin shares from existing basin-level time series, and joins totals to shares for downstream use by the per-domain generators that follow. Refs #535.
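The share-and-join idea can be sketched as follows. This is not the basin_allocation code: the function names are hypothetical, and it assumes a "stable" share means each basin's mean over the series divided by the regional sum of those means, with positive regional totals.

```python
from collections import defaultdict


def basin_shares(series: dict[tuple[str, str], list[float]]) -> dict[tuple[str, str], float]:
    """Map (region, basin) -> share of the region, from basin-level time series."""
    means = {key: sum(vals) / len(vals) for key, vals in series.items()}
    region_sum: dict[str, float] = defaultdict(float)
    for (region, _basin), mean in means.items():
        region_sum[region] += mean
    # Assumption: every region has a positive total, so the division is safe.
    return {key: mean / region_sum[key[0]] for key, mean in means.items()}


def allocate(totals: dict[str, float], shares: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Join regional totals to basin shares to get basin-level values."""
    return {key: totals[key[0]] * share for key, share in shares.items()}


# Hypothetical two-basin region.
shares = basin_shares({("R12_AFR", "1|AFR"): [10.0, 30.0], ("R12_AFR", "2|AFR"): [30.0, 90.0]})
basin_values = allocate({"R12_AFR": 200.0}, shares)
```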
Generate the R12 x SSP x (urban, rural) connection-rate baseline CSVs from "Improved water services" rows in two files under data/water/demands/drinking_water_access/: projections_people_UR_income_10_25.csv (with urban/rural split, for AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU) and projections_people_merge_countries_10_25(in).csv (no split, for NAM and CHN, broadcast to both settings). PAO is hard-coded to 0.99. R12 rates are population-weighted, carried backward to fill early target years and capped forward at 2090 for 2100/2110, then broadcast uniformly to basin columns. The basin column order comes from connection_rate_basins_R12.csv. SSP2 connection-rate CSVs are regenerated from the same pipeline. SSP1, SSP3, SSP4, SSP5 connection-rate CSVs are net-new. Refs #535.
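A minimal sketch of the three steps named above (population weighting, backward carry, forward cap at 2090). Function names and sample values are hypothetical, and the fill helper assumes the source already has a value for every target year between the first known year and the cap year.

```python
def weighted_rate(rates: list[float], populations: list[float]) -> float:
    """Population-weighted mean of country-level rates."""
    return sum(r * p for r, p in zip(rates, populations)) / sum(populations)


def fill_years(known: dict[int, float], target_years: list[int], cap_year: int = 2090) -> dict[int, float]:
    """Carry the first known value backward and hold the cap-year value forward."""
    first = min(known)
    filled = {}
    for year in target_years:
        if year < first:
            filled[year] = known[first]      # early years: carry backward
        elif year > cap_year:
            filled[year] = known[cap_year]   # 2100/2110: hold the 2090 value
        else:
            filled[year] = known[year]
    return filled


# Hypothetical inputs.
r12_rate = weighted_rate([0.5, 1.0], [3.0, 1.0])
rates_by_year = fill_years({2030: 0.5, 2090: 0.9}, [2020, 2030, 2090, 2100, 2110])
```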
Regenerate R12 x SSP x {urban, rural, manufacturing} x {withdrawal,
return} demand CSVs from Khan 2022 basin-level withdrawal projections
(/mnt/p/ene.model/NEST/water_demands/Khan2022,
doi:10.1038/s41597-023-02086-2). Withdrawals come directly from the
source; returns are withdrawal x per-basin return/withdrawal ratio,
read from new input files return_ratio_{urban_domestic, rural,
manufacturing}.csv. urban_withdrawal / urban_return combine domestic
and manufacturing.
SSP1, SSP3, SSP4, SSP5 demand CSVs are net-new.
Generator at pre_processing/generate_sectoral_demands.py.
Rename urban_withdrawal2 / urban_return2 -> urban_*_domestic across
R11, R12, ZMB, and old_R11 harmonized demand CSVs.
Refs #535.
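The return calculation described in this commit can be sketched as below. The helper names are hypothetical, the numbers are made up, and "combine" is assumed to mean a per-basin sum of the domestic and manufacturing components.

```python
def compute_return(withdrawal: dict[str, float], ratio: dict[str, float]) -> dict[str, float]:
    """Return flow per basin = withdrawal x return/withdrawal ratio."""
    return {basin: withdrawal[basin] * ratio[basin] for basin in withdrawal}


def combine(*components: dict[str, float]) -> dict[str, float]:
    """Sum several per-basin components (assumed meaning of 'combine')."""
    basins = components[0].keys()
    return {basin: sum(c[basin] for c in components) for basin in basins}


# Hypothetical single-basin inputs.
domestic_w = {"1|AFR": 10.0}
manufacturing_w = {"1|AFR": 4.0}
domestic_r = compute_return(domestic_w, {"1|AFR": 0.5})
manufacturing_r = compute_return(manufacturing_w, {"1|AFR": 0.25})

urban_withdrawal = combine(domestic_w, manufacturing_w)
urban_return = combine(domestic_r, manufacturing_r)
```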
Desalination capacity is socio-economic, not climate-conditioned. The projected potential CSVs (R11, R12, ZMB) now carry an `ssp` column derived from Marina's country-level source data, and `add_desalination` filters on `context.ssp` instead of `cfg.RCP`. SSP1, SSP3, SSP5 are taken directly from the source; SSP2 inherits SSP1 and SSP4 inherits SSP3 by assignment. R12 historical capacity is refreshed from the same source. R12 ships a basin-allocation template (desalination_basin_allocation_template_R12.csv) used by the new generator at pre_processing/generate_desalination.py. `test_infrastructure.py` parametrizations gain an explicit `ssp=SSP2` so the SSP-keyed filter resolves. Refs #535.
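The SSP assignment can be sketched as a small lookup. The constant and function names are hypothetical, and the row layout is illustrative only; just the mapping (SSP2 from SSP1, SSP4 from SSP3) and the `ssp` column name come from the commit message.

```python
SOURCE_SSPS = {"SSP1", "SSP3", "SSP5"}          # taken directly from the source data
INHERITED = {"SSP2": "SSP1", "SSP4": "SSP3"}    # filled by assignment


def source_ssp(ssp: str) -> str:
    """Return the source SSP whose desalination data a given SSP uses."""
    ssp = ssp.upper()
    resolved = INHERITED.get(ssp, ssp)
    if resolved not in SOURCE_SSPS:
        raise ValueError(f"no desalination data for {ssp!r}")
    return resolved


def filter_by_ssp(rows: list[dict], ssp: str) -> list[dict]:
    """Keep rows whose `ssp` column matches the requested SSP."""
    return [row for row in rows if row["ssp"] == ssp.upper()]


# Hypothetical rows; the `cap` field is illustrative only.
rows = [{"ssp": "SSP2", "cap": 1.0}, {"ssp": "SSP3", "cap": 2.0}]
ssp2_rows = filter_by_ssp(rows, "ssp2")
```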
R12 hydro availability CSVs (qtot, qr, e-flow, 5-yr and monthly variants) for 2p6, 7p0, and 8p5 are regenerated from the CWaTM SSP-keyed percentile pipeline. Source data lives at /mnt/p/watxene/ISIMIP_postprocessed/data_for_vignesh/message_nexus_input_2026/ (5 GCMs x 3 SSPs of qr_monthly + qtot_daily futures). Generator added at pre_processing/generate_hydro_availability.py. Three basins (30|FSU, 51|FSU, 154|FSU) are 100% NaN across all 5 GCMs in the refreshed source; `filter_basins_by_region` excludes them unconditionally so the water build never sees zero-availability rows for them. `compute_basin_demand_ratio` switches its supply baseline from qtot/qr_5y_no_climate_low to qtot/qr_5y_2p6_low so the reduced-basin ranking remains stable across the run's RCP choice. The same function reads the renamed urban_*_domestic demand file introduced earlier in this PR. Refs #535.
The R12 input data shipped earlier in this PR is now produced by the Python generators under pre_processing/ (generate_access_rates, generate_sectoral_demands, generate_desalination, generate_hydro_availability) on top of basin_allocation. The legacy scripts (desalination.R, generate_water_constraints.R, hydro_agg_basin.py, hydro_agg_raster.py, hydro_agg_spatial.R) have no remaining role and are removed. doc/water/index.rst is updated to drop the entries for the retired scripts, add entries for basin_allocation and the four new generators, and remove the stale "Deprecated R Code" section pointing at a removed directory. Refs #535.
CWaTM source data (refreshed earlier in this PR) covers 2p6, 7p0, and 8p5 only. The synthetic no_climate baseline has no source equivalent and no defensible RCP analogue, so it is dropped from the CLI choices and from the R12 availability data set. `_RCPS` in cli.py is now ["2p6", "7p0", "8p5"]; the `--rcps` defaults and help strings for `nexus` and `cooling` flip to 2p6. `Config.RCP` default flips to "2p6". R12 no_climate availability CSVs are removed. test_water_data.py drops no_climate from the R11-only legacy exclusion list so the R11<->R12 parity check still passes. test_build.py picks 7p0 as the build-test RCP in place of the retired 6p0. R11 and ZMB availability data are untouched; their legacy RCPs (6p0, no_climate) remain available for R11 regression coverage. Refs #535.
The cooling-impact dataset
(power_plant_cooling_impact_MESSAGE_*_{2p6,6p0,7p0}.csv) covered only
the legacy RCPs and was gated behind `cfg.RCP == "no_climate"`. With
no_climate now removed and 8p5 added (neither has impact coverage),
the gated branch in `_make_capacity_factor` is unreachable for any
defensible RCP choice. The branch is removed; the function now
returns the dimensionless capacity_factor unchanged. The eleven
cooling-impact CSVs are deleted.
`test_water_for_ppl.py` drops its no_climate parametrizations.
Refs #535.
add_sectoral_demands now resolves ssp = context.ssp.lower() and reads
withdrawal / return CSVs matching {ssp}_regional_*.csv, picking up the
SSP1, SSP3, SSP4, SSP5 demand data shipped earlier in this PR instead
of the SSP2 fallback. Treatment-rate and recycling-rate CSVs are read
from ssp2_regional_*_rate_baseline.csv regardless of the current SSP,
since rate data is not differentiated across SSPs. The monthly path
({ssp}_m_water_demands.csv) is keyed by current SSP the same way.
The in-memory variable strings (urban_withdrawal2_baseline,
urban_return2_baseline) are renamed to urban_*_domestic_baseline to
match the CSV-side rename that landed earlier; the prior strings no
longer matched any variable in the loaded data and were silently
returning empty slices.
Known gap: ssp{n}_m_water_demands.csv exists only for SSP2/ZMB. A
non-SSP2 run with sub-annual time slices will raise FileNotFoundError
on the monthly path. R12 has no monthly file even for SSP2, so the
sub-annual + R12 path was already broken; this commit makes the
failure mode explicit per-SSP rather than introducing it.
Refs #535.
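The file-selection convention in this commit can be sketched as below. The function names are hypothetical and the directory layout is an assumption; only the filename patterns come from the commit message.

```python
from pathlib import Path


def demand_pattern(ssp: str) -> str:
    """Glob pattern for withdrawal/return CSVs, keyed by the run's SSP."""
    return f"{ssp.lower()}_regional_*.csv"


def rate_pattern() -> str:
    """Treatment/recycling rates are always read from the ssp2 files."""
    return "ssp2_regional_*_rate_baseline.csv"


def monthly_demand_path(data_dir: Path, ssp: str) -> Path:
    """Resolve the monthly demand file and fail loudly when it is missing."""
    path = data_dir / f"{ssp.lower()}_m_water_demands.csv"
    if not path.exists():
        raise FileNotFoundError(f"no monthly demand file for {ssp}: {path}")
    return path
```

Under this sketch, a sub-annual SSP3 run fails at `monthly_demand_path` with a message naming the missing file, which is the explicit per-SSP failure mode described above.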
409a4ae to e2c35c0
adrivinca
left a comment
just a few points:
- get_rates_data fallback misses treatment/recycling rates for SSP≠2 (report.py:425). When all_rates_{ssp}.csv doesn't exist (which is every baseline run, since demands.py only writes it inside the `if cfg.SDG != "baseline"` branch), the fallback globs {ssp}_regional_*_rate_*.csv. For SSP1/3/4/5 that only finds the connection-rate files; the treatment/recycling rates exist only as ssp2_* files. The connected/unconnected reconstruction works, but the water-access/sanitation reporting (the other get_rates_data call at line 705) silently loses treatment rates for non-SSP2 runs. The fallback should mirror demands.py's "rates always come from ssp2" convention.
- Sub-annual demands now break for SSP≠2: we know. Let's maybe keep track of this as a next step.
- Stray no_climate file: e-flow_5y_m_no_climate_R12.csv was modified (+217/-217) even though no_climate was removed from _RCPS and the generator doesn't produce it, and the qtot/qr no_climate files were deleted. Looks like a leftover; any reason for it?
The get_rates_data fallback globbed only {ssp}_regional_*_rate_*.csv,
but treatment-rate and recycling-rate data ship only as ssp2 files.
Non-SSP2 baseline runs therefore dropped treatment and recycling rates
from water-access and sanitation reporting. Source connection rates from
the run's SSP and treatment/recycling rates from SSP2 for any SSP,
matching the convention in demands.py.
no_climate is no longer a supported RCP: the e-flow availability data is
read only as e-flow_5y_m_{RCP} and the generator emits e-flow only for the
configured RCPs, so this file is unreachable. It was the lone no_climate
availability file the R12 refresh left behind after removing the rest.
Treatment and recycling rates ship only as ssp2 files and are read from them for every SSP; the variable name is the filename stem after <ssp>_regional_. Extract both into water.utils (ssp2_rate_files, variable_from_stem, SSP2_RATES) and use them in add_sectoral_demands and get_rates_data.
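One way the stem-to-variable rule could look. This is a guess at the behaviour, not the water.utils implementation: the function is deliberately named `variable_from_stem_sketch`, the regular expression is an assumption, and the example filename is illustrative.

```python
import re
from pathlib import Path

# Assumed prefix shape: "ssp<digits>_regional_".
_SSP_PREFIX = re.compile(r"^ssp\d+_regional_", re.IGNORECASE)


def variable_from_stem_sketch(path: str) -> str:
    """Return the filename stem with the leading '<ssp>_regional_' removed."""
    stem = Path(path).stem
    name, n_subs = _SSP_PREFIX.subn("", stem)
    if n_subs == 0:
        raise ValueError(f"unexpected filename: {path}")
    return name


variable = variable_from_stem_sketch("ssp2_regional_urban_treatment_rate_baseline.csv")
```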
@adrivinca Addressed two issues. Sub-annual demands are left for a future PR.
Bring the refreshed pre-processing generators into ruff-format compliance and wrap four over-length lines flagged by the pinned ruff (E501).
This PR refreshes the input data sources for the water module, drops R11 support, and removes the legacy impact representation for the cooling module in lieu of PR #479.
projections_people_UR_income_10_25.csv: with urban/rural split, used for AFR, EEU, FSU, LAM, MEA, PAS, RCPA, SAS, WEU.
projections_people_merge_countries_10_25.csv: no split, used for NAM and CHN with the same rate assigned to both settings.
basin_allocation.py
Additionally, old files in pre-processing have been removed.
How to review
Go through the generator files to see the methodology and assumptions for data imputation.
PR checklist