Skip to content

Add unit tests for benchmarking metrics and document metric_R2 fallback#3888

Open
tanmaydimriGSOC wants to merge 2 commits into
PecanProject:developfrom
tanmaydimriGSOC:add-metric-tests
Open

Add unit tests for benchmarking metrics and document metric_R2 fallback#3888
tanmaydimriGSOC wants to merge 2 commits into
PecanProject:developfrom
tanmaydimriGSOC:add-metric-tests

Conversation

@tanmaydimriGSOC

Copy link
Copy Markdown

Summary

While exploring the benchmarking module for my GSoC 2026 proposal (validation framework project), I noticed that the core metric functions (metric_RMSE, metric_MAE, metric_cor, metric_R2) had no unit test coverage. This PR adds tests for all four functions and documents a previously undocumented behavior discovered while writing them.

Changes

tests/testthat/test-metrics.R (new file)

  • 9 unit tests covering metric_RMSE, metric_MAE, metric_cor, and metric_R2
  • Tests cover: perfect predictions, known values, NA handling, and edge cases
  • Includes a test that explicitly captures the warning triggered by metric_R2's lm() fallback path

R/metric_R2.R (modified)

  • Added @details section documenting the silent fallback behavior:
    when model output is constant, the correlation formula returns
    NA and the function switches to lm()-based R-squared without notifying the user
  • Added inline comment on the fallback block for future developers
  • This behavior was discovered through the tests and is relevant
    for multi-site validation use cases where silent inconsistency across sites could produce misleading results

Notes

The lm() fallback in metric_R2 triggers an "essentially perfect fit: summary may be unreliable" warning from stats::summary.lm
in edge cases. This is now documented but not changed — a follow-up could add an explicit warning or input check for
constant model output.

Tests were verified locally by sourcing metric functions directly due to dependency installation constraints in local environment.

@tanmaydimriGSOC tanmaydimriGSOC marked this pull request as ready for review April 7, 2026 19:26
@dlebauer dlebauer moved this from Todo to Review in GSOC Benchmarking and Validation May 28, 2026
@ayushman1210

Copy link
Copy Markdown
Contributor

@dlebauer I've reviewed this PR in the context of my GSoC workplan. This decoupled, functional approach to calculating metrics (specifically metric_R2.R) fits perfectly with the architecture we are building for Phase 2
I suggest we merge this PR as-is.
Once it merged I will use this exact pattern as the template to generalize the remaining metrics (RMSE, MAE, correlation) so they can operate independently of the legacy BETYdb infrastructure. Let's get this merged!

Comment on lines +3 to +4
PEcAn.logger <- new.env()
PEcAn.logger$logger.info <- function(...) invisible(NULL)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what's happening here and what it's trying to accomplish? It looks like you may be trying to mask the logger functions, which I do not think will work as expected (they live in their own namespace) and also isn't generally a good idea.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@infotroph I focused my review on the metric functions and the R-squared fallback documentation (which align perfectly with the Phase 2 goals of my GSoC workplan) and I admittedly glossed over that mocking workaround at the top of the test file.
Looking at the PR description, it seems the author added that block to bypass a local environment issue where they couldn't install PEcAn.logger. You are absolutely right that we shouldn't be trying to mask the namespace like this, especially since the package will be available in the standard CI/testing environment anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanmaydimriGSOC could you go ahead and remove lines 1-5 in test-metrics.R? The GitHub CI will have the necessary dependencies installed so the tests will run fine without that workaround. Once that is removed the core logic of this PR still looks good to me!!

@divine7022 divine7022 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solid first pass on tests; just needs few cleanups

Comment on lines +12 to +15
##' If this formula returns \code{NA} (e.g. when model output is constant
##' across all observations), the function silently falls back to an
##' \code{lm()}-based R-squared via \code{summary(lm())$r.squared}.
##' This fallback may produce unreliable results and triggers a warning

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here "silently falls back" and next sentence says it triggers warning. can't both be true
drop "silently"; rest of the paragraph correctly describes the warning behavior

@@ -1,23 +1,39 @@
##' @name metric_R2
##' @title Coefficient of Determination (R2)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

title still says "Coefficient of Determination" but formula in @details squared pearson correlation; formula in @details is correct.
change title

Comment on lines +42 to +49
test_that("metric_R2 silent NA fallback produces valid output", {
dat <- data.frame(model = c(2, 2, 2), obvs = c(1, 2, 3))
expect_warning(
result <- metric_R2(dat),
"essentially perfect fit"
)
expect_true(is.numeric(result))
}) No newline at end of file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test name says "silent NA fallback" but the input has no NAs, constant model triggers fallback via 0/0 returning NaN. assertion is.numeric(result) also passes on NaN, which is what the function actually returns here. worth tightening both

##' @author Betsy Cowdery

metric_R2 <- function(metric_dat, ...) {
PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also comment here -- https://github.com/PecanProject/pecan/pull/3888/changes#r3397678145

Suggested change
PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)")
PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R2)")

Comment on lines +12 to +15
test_that("metric_RMSE handles NA values", {
dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))
expect_true(is.numeric(metric_RMSE(dat)))
})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is.numeric() passes on NA, NaN, Inf, and 0 indifferently, this test doesn't actually constrain the return value. since metric_RMSE uses na.rm = TRUE, expected value here is 0. worth checking that directly

Suggested change
test_that("metric_RMSE handles NA values", {
dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))
expect_true(is.numeric(metric_RMSE(dat)))
})
test_that("metric_RMSE handles NA values", {
dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))
expect_equal(metric_RMSE(dat), 0)
})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NA in model triggers lm() fallback, which by default just drops NA row via na.action = na.omit and fits perfect line through whatever's left. so single NA in model series silently returns R2 = 1. no test pins this. wondering if the ryt move here is to handle NAs same way metric_cor and metric_PPMC do, squaring that gives same r2 and lm fallback can go away entirely, since cor() already returns NA for degenerate constant input case; so constant input case returns NA, instead of NaN

something like:

metric_R2 <- function(metric_dat, ...) {
  PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R²)")
  stats::cor(metric_dat$model, metric_dat$obvs, use = "pairwise.complete.obs")^2
}

heads up: if this lands, the last test needs to update too , constant model case would return NA (with "standard deviation is zero" warning) instead of NaN (with "essentially perfect fit" warning)

@divine7022

Copy link
Copy Markdown
Member

filed follow up to track this - #4027

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Review

Development

Successfully merging this pull request may close these issues.

5 participants