Using method = "auto" fails if there is missing data due to the type checking#365
Using method = "auto" fails if there is missing data due to the type checking#365jhorzek wants to merge 2 commits intoeasystats:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request fixes a bug where using method = "auto" fails when the dataset contains missing values. The issue occurred in the .vartype() function which performs type checking to determine the appropriate correlation method.
Changes:
- Updated the is_count check to properly handle NA values by adding
any(!is.na(x))guard andna.rm = TRUEparameter to theall()function
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (any(!is.na(x)) && all(x %% 1 == 0, na.rm = TRUE)) { | ||
| out$is_count <- TRUE |
There was a problem hiding this comment.
There appears to be trailing whitespace at the end of these lines. Please remove the trailing spaces to maintain code consistency.
| if (any(!is.na(x)) && all(x %% 1 == 0, na.rm = TRUE)) { | |
| out$is_count <- TRUE | |
| if (any(!is.na(x)) && all(x %% 1 == 0, na.rm = TRUE)) { | |
| out$is_count <- TRUE |
There was a problem hiding this comment.
I reformatted the code with air.
| if (any(!is.na(x)) && all(x %% 1 == 0, na.rm = TRUE)) { | ||
| out$is_count <- TRUE | ||
| } |
There was a problem hiding this comment.
Consider adding a test case for this specific bug fix to ensure that method = "auto" works correctly with missing data. For example, test the scenario mentioned in the PR description where correlation is called with method = "auto" and missing = "keep_pairwise" on data with NA values.
There was a problem hiding this comment.
I've added a very basic test for missing data combined with method = "auto" on continuous data.
The following currently fails:
The issue arises because the utils_find_correlationtype function assumes no missing data in all(x %% 1 == 0). I've replaced this with a basic check that ensures that NAs are ignored.