Add unit tests for benchmarking metrics and document metric_R2 fallback by tanmaydimriGSOC · Pull Request #3888 · PecanProject/pecan

tanmaydimriGSOC · 2026-03-19T19:17:03Z

Summary

While exploring the benchmarking module for my GSoC 2026 proposal (validation framework project), I noticed that the core metric functions (metric_RMSE, metric_MAE, metric_cor, metric_R2) had no unit test coverage. This PR adds tests for all four functions and documents a previously undocumented behavior discovered while writing them.

Changes

tests/testthat/test-metrics.R (new file)

9 unit tests covering metric_RMSE, metric_MAE, metric_cor, and metric_R2
Tests cover: perfect predictions, known values, NA handling, and edge cases
Includes a test that explicitly captures the warning triggered by metric_R2's lm() fallback path

R/metric_R2.R (modified)

Added @details section documenting the silent fallback behavior:
when model output is constant, the correlation formula returns
NA and the function switches to lm()-based R-squared without notifying the user
Added inline comment on the fallback block for future developers
This behavior was discovered through the tests and is relevant
for multi-site validation use cases where silent inconsistency across sites could produce misleading results

Notes

The lm() fallback in metric_R2 triggers an "essentially perfect fit: summary may be unreliable" warning from stats::summary.lm
in edge cases. This is now documented but not changed — a follow-up could add an explicit warning or input check for
constant model output.

Tests were verified locally by sourcing metric functions directly due to dependency installation constraints in local environment.

ayushman1210 · 2026-05-31T09:25:09Z

@dlebauer I've reviewed this PR in the context of my GSoC workplan. This decoupled, functional approach to calculating metrics (specifically metric_R2.R) fits perfectly with the architecture we are building for Phase 2
I suggest we merge this PR as-is.
Once it merged I will use this exact pattern as the template to generalize the remaining metrics (RMSE, MAE, correlation) so they can operate independently of the legacy BETYdb infrastructure. Let's get this merged!

infotroph · 2026-06-01T18:00:00Z

+  PEcAn.logger <- new.env()
+  PEcAn.logger$logger.info <- function(...) invisible(NULL)


Can you explain what's happening here and what it's trying to accomplish? It looks like you may be trying to mask the logger functions, which I do not think will work as expected (they live in their own namespace) and also isn't generally a good idea.

@infotroph I focused my review on the metric functions and the R-squared fallback documentation (which align perfectly with the Phase 2 goals of my GSoC workplan) and I admittedly glossed over that mocking workaround at the top of the test file.
Looking at the PR description, it seems the author added that block to bypass a local environment issue where they couldn't install PEcAn.logger. You are absolutely right that we shouldn't be trying to mask the namespace like this, especially since the package will be available in the standard CI/testing environment anyway.

@tanmaydimriGSOC could you go ahead and remove lines 1-5 in test-metrics.R? The GitHub CI will have the necessary dependencies installed so the tests will run fine without that workaround. Once that is removed the core logic of this PR still looks good to me!!

divine7022

solid first pass on tests; just needs few cleanups

divine7022 · 2026-06-11T16:32:20Z

+##' If this formula returns \code{NA} (e.g. when model output is constant
+##' across all observations), the function silently falls back to an
+##' \code{lm()}-based R-squared via \code{summary(lm())$r.squared}.
+##' This fallback may produce unreliable results and triggers a warning


here "silently falls back" and next sentence says it triggers warning. can't both be true
drop "silently"; rest of the paragraph correctly describes the warning behavior

divine7022 · 2026-06-11T16:54:20Z

@@ -1,23 +1,39 @@
 ##' @name metric_R2
 ##' @title Coefficient of Determination (R2)


title still says "Coefficient of Determination" but formula in @details squared pearson correlation; formula in @details is correct.
change title

divine7022 · 2026-06-11T17:00:17Z

+test_that("metric_R2 silent NA fallback produces valid output", {
+  dat <- data.frame(model = c(2, 2, 2), obvs = c(1, 2, 3))
+  expect_warning(
+    result <- metric_R2(dat),
+    "essentially perfect fit"
+  )
+  expect_true(is.numeric(result))
+})


test name says "silent NA fallback" but the input has no NAs, constant model triggers fallback via 0/0 returning NaN. assertion is.numeric(result) also passes on NaN, which is what the function actually returns here. worth tightening both

divine7022 · 2026-06-11T17:07:18Z

 ##' @author Betsy Cowdery
-
 metric_R2 <- function(metric_dat, ...) {
  PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)")


also comment here -- https://github.com/PecanProject/pecan/pull/3888/changes#r3397678145

Suggested change

PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)")

PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R2)")

divine7022 · 2026-06-11T17:14:05Z

+test_that("metric_RMSE handles NA values", {
+  dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))
+  expect_true(is.numeric(metric_RMSE(dat)))
+})


is.numeric() passes on NA, NaN, Inf, and 0 indifferently, this test doesn't actually constrain the return value. since metric_RMSE uses na.rm = TRUE, expected value here is 0. worth checking that directly

Suggested change

test_that("metric_RMSE handles NA values", {

dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))

expect_true(is.numeric(metric_RMSE(dat)))

})

test_that("metric_RMSE handles NA values", {

dat <- data.frame(model = c(1, NA, 3), obvs = c(1, 2, 3))

expect_equal(metric_RMSE(dat), 0)

})

divine7022 · 2026-06-11T17:24:58Z

NA in model triggers lm() fallback, which by default just drops NA row via na.action = na.omit and fits perfect line through whatever's left. so single NA in model series silently returns R2 = 1. no test pins this. wondering if the ryt move here is to handle NAs same way metric_cor and metric_PPMC do, squaring that gives same r2 and lm fallback can go away entirely, since cor() already returns NA for degenerate constant input case; so constant input case returns NA, instead of NaN

something like:

metric_R2 <- function(metric_dat, ...) { PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R²)") stats::cor(metric_dat$model, metric_dat$obvs, use = "pairwise.complete.obs")^2 }

heads up: if this lands, the last test needs to update too , constant model case would return NA (with "standard deviation is zero" warning) instead of NaN (with "essentially perfect fit" warning)

divine7022 · 2026-06-11T17:33:58Z

filed follow up to track this - #4027

Add unit tests for benchmarking metrics and document metric_R2 fallback

2559a3a

github-actions Bot added tests modules labels Mar 19, 2026

Update metric_R2.Rd to match roxygen documentation

08e9e7b

tanmaydimriGSOC marked this pull request as ready for review April 7, 2026 19:26

dlebauer moved this to Todo in GSOC Benchmarking and Validation May 28, 2026

dlebauer added this to GSOC Benchmarking and Validation May 28, 2026

dlebauer moved this from Todo to Review in GSOC Benchmarking and Validation May 28, 2026

infotroph reviewed Jun 1, 2026

View reviewed changes

divine7022 requested changes Jun 11, 2026

View reviewed changes

divine7022 mentioned this pull request Jun 11, 2026

add testthat coverage for remaining numeric metrics in PEcAn.benchmark #4027

Open

ayushman1210 mentioned this pull request Jun 23, 2026

Add testthat coverage for remaining numeric metrics #4032

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add unit tests for benchmarking metrics and document metric_R2 fallback#3888

Add unit tests for benchmarking metrics and document metric_R2 fallback#3888
tanmaydimriGSOC wants to merge 2 commits into
PecanProject:developfrom
tanmaydimriGSOC:add-metric-tests

tanmaydimriGSOC commented Mar 19, 2026

Uh oh!

ayushman1210 commented May 31, 2026

Uh oh!

infotroph Jun 1, 2026

Uh oh!

ayushman1210 Jun 1, 2026

Uh oh!

ayushman1210 Jun 1, 2026

Uh oh!

divine7022 left a comment

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 Jun 11, 2026

Uh oh!

divine7022 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		PEcAn.logger <- new.env()
		PEcAn.logger$logger.info <- function(...) invisible(NULL)

		@@ -1,23 +1,39 @@
		##' @name metric_R2
		##' @title Coefficient of Determination (R2)

	PEcAn.logger::logger.info("Metric: Coefficient of Determination (R2)")
	PEcAn.logger::logger.info("Metric: Squared Pearson Correlation (R2)")

Uh oh!

Conversation

tanmaydimriGSOC commented Mar 19, 2026

Summary

Changes

Notes

Uh oh!

ayushman1210 commented May 31, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divine7022 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divine7022 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants