Add unit tests for benchmarking metrics and document metric_R2 fallback#3888
Add unit tests for benchmarking metrics and document metric_R2 fallback#3888tanmaydimriGSOC wants to merge 2 commits into
Conversation
|
@dlebauer I've reviewed this PR in the context of my GSoC workplan. This decoupled, functional approach to calculating metrics (specifically metric_R2.R) fits perfectly with the architecture we are building for Phase 2 |
| PEcAn.logger <- new.env() | ||
| PEcAn.logger$logger.info <- function(...) invisible(NULL) |
There was a problem hiding this comment.
Can you explain what's happening here and what it's trying to accomplish? It looks like you may be trying to mask the logger functions, which I do not think will work as expected (they live in their own namespace) and also isn't generally a good idea.
There was a problem hiding this comment.
@infotroph I focused my review on the metric functions and the R-squared fallback documentation (which align perfectly with the Phase 2 goals of my GSoC workplan) and I admittedly glossed over that mocking workaround at the top of the test file.
Looking at the PR description, it seems the author added that block to bypass a local environment issue where they couldn't install PEcAn.logger. You are absolutely right that we shouldn't be trying to mask the namespace like this, especially since the package will be available in the standard CI/testing environment anyway.
There was a problem hiding this comment.
@tanmaydimriGSOC could you go ahead and remove lines 1-5 in test-metrics.R? The GitHub CI will have the necessary dependencies installed so the tests will run fine without that workaround. Once that is removed the core logic of this PR still looks good to me!!
divine7022
left a comment
There was a problem hiding this comment.
solid first pass on tests; just needs few cleanups
| ##' If this formula returns \code{NA} (e.g. when model output is constant | ||
| ##' across all observations), the function silently falls back to an | ||
| ##' \code{lm()}-based R-squared via \code{summary(lm())$r.squared}. | ||
| ##' This fallback may produce unreliable results and triggers a warning |
There was a problem hiding this comment.
here "silently falls back" and next sentence says it triggers warning. can't both be true
drop "silently"; rest of the paragraph correctly describes the warning behavior
| @@ -1,23 +1,39 @@ | |||
| ##' @name metric_R2 | |||
| ##' @title Coefficient of Determination (R2) | |||
| test_that("metric_R2 silent NA fallback produces valid output", { | ||
| dat <- data.frame(model = c(2, 2, 2), obvs = c(1, 2, 3)) | ||
| expect_warning( | ||
| result <- metric_R2(dat), | ||
| "essentially perfect fit" | ||
| ) | ||
| expect_true(is.numeric(result)) | ||
| }) No newline at end of file |
There was a problem hiding this comment.
test name says "silent NA fallback" but the input has no NAs, constant model triggers fallback via 0/0 returning NaN. assertion is.numeric(result) also passes on NaN, which is what the function actually returns here. worth tightening both
| ##' @author Betsy Cowdery | ||
|
|
||
| metric_R2 <- function(metric_dat, ...) { | ||
| PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)") |
There was a problem hiding this comment.
also comment here -- https://github.com/PecanProject/pecan/pull/3888/changes#r3397678145
| PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)") | |
| PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R2)") |
| test_that("metric_RMSE handles NA values", { | ||
| dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3)) | ||
| expect_true(is.numeric(metric_RMSE(dat))) | ||
| }) |
There was a problem hiding this comment.
is.numeric() passes on NA, NaN, Inf, and 0 indifferently, this test doesn't actually constrain the return value. since metric_RMSE uses na.rm = TRUE, expected value here is 0. worth checking that directly
| test_that("metric_RMSE handles NA values", { | |
| dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3)) | |
| expect_true(is.numeric(metric_RMSE(dat))) | |
| }) | |
| test_that("metric_RMSE handles NA values", { | |
| dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3)) | |
| expect_equal(metric_RMSE(dat), 0) | |
| }) |
There was a problem hiding this comment.
NA in model triggers lm() fallback, which by default just drops NA row via na.action = na.omit and fits perfect line through whatever's left. so single NA in model series silently returns R2 = 1. no test pins this. wondering if the ryt move here is to handle NAs same way metric_cor and metric_PPMC do, squaring that gives same r2 and lm fallback can go away entirely, since cor() already returns NA for degenerate constant input case; so constant input case returns NA, instead of NaN
something like:
metric_R2 <- function(metric_dat, ...) {
PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R²)")
stats::cor(metric_dat$model, metric_dat$obvs, use = "pairwise.complete.obs")^2
}
heads up: if this lands, the last test needs to update too , constant model case would return NA (with "standard deviation is zero" warning) instead of NaN (with "essentially perfect fit" warning)
|
filed follow up to track this - #4027 |
Summary
While exploring the benchmarking module for my GSoC 2026 proposal (validation framework project), I noticed that the core metric functions (metric_RMSE, metric_MAE, metric_cor, metric_R2) had no unit test coverage. This PR adds tests for all four functions and documents a previously undocumented behavior discovered while writing them.
Changes
tests/testthat/test-metrics.R (new file)
R/metric_R2.R (modified)
when model output is constant, the correlation formula returns
NA and the function switches to lm()-based R-squared without notifying the user
for multi-site validation use cases where silent inconsistency across sites could produce misleading results
Notes
The lm() fallback in metric_R2 triggers an "essentially perfect fit: summary may be unreliable" warning from stats::summary.lm
in edge cases. This is now documented but not changed — a follow-up could add an explicit warning or input check for
constant model output.
Tests were verified locally by sourcing metric functions directly due to dependency installation constraints in local environment.