
Add RefChecker logic for reference validation #15478

Open

NishantDG-SST wants to merge 5 commits into JabRef:main from NishantDG-SST:13604-refchecker

Conversation

@NishantDG-SST
Contributor

@NishantDG-SST NishantDG-SST commented Apr 1, 2026

Related issues and pull requests

Refs #13604

PR Description

This PR contains only the logic for the RefCheck functionality.

Steps to test

Run

./gradlew :jablib:fetcherTest --tests "org.jabref.logic.refcheck.RefCheckerTest"

Checklist

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • [/] I added screenshots in the PR description (if change is visible to the user)
  • [/] I added a screenshot in the PR description showing a library with a single entry with me as author and as title the issue number
  • I described the change in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)
  • [/] I checked the user documentation for up to dateness and submitted a pull request to our user documentation repository

Copilot AI review requested due to automatic review settings April 1, 2026 22:22
@qodo-free-for-open-source-projects
Contributor

Review Summary by Qodo

Add RefChecker logic for reference validation

✨ Enhancement


Walkthroughs

Description
• Add RefChecker logic for validating bibliographic entries against online sources
• Implement entry comparison using DOI, CrossRef, and arXiv fetchers
• Add compareEntries method to DuplicateCheck for similarity scoring
• Introduce RefValidity enum with REAL, UNSURE, and FAKE classifications
Diagram
flowchart LR
  Entry["BibEntry to validate"]
  DOI["DOI Lookup"]
  CrossRef["CrossRef Discovery"]
  ArXiv["ArXiv Lookup"]
  Compare["compareEntries Similarity"]
  Result["RefCheckResult with validity"]
  
  Entry --> DOI
  Entry --> CrossRef
  Entry --> ArXiv
  DOI --> Compare
  CrossRef --> Compare
  ArXiv --> Compare
  Compare --> Result


File Changes

1. jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java ✨ Enhancement +43/-0

Add entry comparison method with similarity scoring

• Add COMPARE_ENTRIES_THRESHOLD constant (0.8) for similarity scoring
• Implement compareEntries static method to compute similarity between entries
• Filter out internal fields from comparison to avoid citation key bias
• Return normalized similarity score in range [0.0, 1.0]



2. jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java ✨ Enhancement +225/-0

Implement reference validation against online sources

• Create RefChecker class for validating entries against online sources
• Implement three-stage validation: DOI lookup, CrossRef discovery, arXiv lookup
• Add check method that returns best result from all sources
• Implement classification logic using similarity thresholds
• Add error handling with logging for fetcher failures



3. jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java ✨ Enhancement +7/-0

Define reference validity classification enum

• Create RefValidity enum with three states: REAL, UNSURE, FAKE
• Represents classification of bibliographic entry validity



4. jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java ✨ Enhancement +8/-0

Define reference check result record

• Create RefCheckResult record to encapsulate validation results
• Include validity classification, matched entry, and similarity score



5. jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java 🧪 Tests +58/-0

Add tests for entry comparison method

• Add five test cases for compareEntries method
• Test scenarios: no shared fields, identical entries, citation key ignoring, identical titles,
 different fields
• Verify similarity scoring and threshold behavior



6. jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java 🧪 Tests +71/-0

Add integration tests for RefChecker

• Create RefCheckerTest with three integration test cases
• Test real paper validation with correct DOI
• Test rejection of entries with correct DOI but wrong metadata
• Test classification of nonexistent entries as fake



7. CHANGELOG.md 📝 Documentation +1/-1

Document RefChecker feature in changelog

• Add entry documenting RefChecker logic addition
• Reference issue #13604
• Describe validation against DOI, CrossRef, and arXiv sources




@qodo-free-for-open-source-projects
Contributor

qodo-free-for-open-source-projects Bot commented Apr 1, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (7) 📎 Requirement gaps (0)



Action required

1. LOGGER.warn drops stack traces📘 Rule violation ≡ Correctness
Description
The new exception logging uses e.getMessage() and does not pass the Throwable to the logger,
losing stack traces and structured exception data. This makes diagnosing fetch failures
significantly harder and violates the project’s exception logging requirement.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R100-165]

+        } catch (FetcherException e) {
+            LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (found.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        return classify(entry, found.get());
+    }
+
+    /// Tries to validate the entry by first discovering a DOI through CrossRef,
+    /// then fetching the authoritative entry using that DOI.
+    /// Returns a FAKE result with no matched entry if CrossRef finds no DOI
+    private RefCheckResult checkByCrossRef(BibEntry entry) {
+        Optional<DOI> foundDoi;
+        try {
+            foundDoi = crossRef.findIdentifier(entry);
+        } catch (FetcherException e) {
+            LOGGER.warn("CrossRef lookup failed: {}", e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (foundDoi.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        Optional<BibEntry> found;
+        try {
+            found = doiFetcher.performSearchById(foundDoi.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("DOI fetch after CrossRef discovery failed for {}: {}", foundDoi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (found.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        return classify(entry, found.get());
+    }
+
+    /// Tries to validate the entry using arXiv.
+    /// First looks for an arXiv identifier, then fetches the full entry by that identifier.
+    /// Returns a FAKE result with no matched entry if no arXiv identifier is found
+    private RefCheckResult checkByArXiv(BibEntry entry) {
+        Optional<ArXivIdentifier> identifier;
+        try {
+            identifier = arXivFetcher.findIdentifier(entry);
+        } catch (FetcherException e) {
+            LOGGER.warn("arXiv identifier lookup failed: {}", e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        if (identifier.isEmpty()) {
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
+
+        Optional<BibEntry> found;
+        try {
+            found = arXivFetcher.performSearchById(identifier.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("arXiv fetch failed for {}: {}", identifier.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
Evidence
PR Compliance ID 16 requires logging the exception object as the last argument; the added
LOGGER.warn(...) calls only log e.getMessage() and never pass e, so stack traces are not
recorded.

AGENTS.md
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[100-165]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefChecker` catches `FetcherException` but logs only `e.getMessage()` instead of logging the actual `Throwable`, which drops stack traces.
## Issue Context
Per logging rules, exceptions must be passed to the logger as the last argument (e.g., `LOGGER.warn("...", value, e)`).
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[100-165]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
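The corrected pattern can be sketched as follows. The sketch uses java.util.logging so it runs without JabRef's SLF4J dependency; with SLF4J, the equivalent fix is to pass the exception as the final argument (e.g. `LOGGER.warn("DOI lookup failed for {}", doi.get().asString(), e)`), which SLF4J recognizes as a Throwable and logs with the full stack trace.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(LoggingSketch.class.getName());

    public static void main(String[] args) {
        String doi = "10.1000/example"; // hypothetical DOI value
        try {
            throw new IllegalStateException("simulated fetch failure");
        } catch (IllegalStateException e) {
            // Pass the exception object itself, not just e.getMessage(),
            // so the stack trace is preserved in the log output.
            LOGGER.log(Level.WARNING, "DOI lookup failed for " + doi, e);
        }
    }
}
```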


2. compareEntries uses orElse("")📘 Rule violation ⚙ Maintainability
Description
DuplicateCheck.compareEntries uses Optional.orElse(""), which is explicitly discouraged and can
hide missing-value behavior. Since commonFields already filters for fields present in both
entries, the orElse defaults are unnecessary and non-idiomatic.
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[R366-382]

+        List<Field> commonFields = one.getFields().stream()
+                                      .filter(field -> !(field instanceof InternalField))
+                                      .filter(field -> two.getField(field).isPresent())
+                                      .toList();
+
+        if (commonFields.isEmpty()) {
+            return 0.0;
+        }
+
+        return commonFields.stream()
+                           .mapToDouble(field -> {
+                               String firstValue = one.getField(field).orElse("");
+                               String secondValue = two.getField(field).orElse("");
+                               return stringSimilarity.similarity(firstValue, secondValue);
+                           })
+                           .average()
+                           .orElse(0.0);
Evidence
PR Compliance ID 10 forbids the Optional.orElse("") anti-pattern; the new code adds two
orElse("") defaults when reading fields for similarity scoring.

AGENTS.md
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[366-382]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuplicateCheck.compareEntries` uses `Optional.orElse("")`, which is disallowed and unnecessary here because `commonFields` already ensures both entries contain the field.
## Issue Context
Replace the `orElse("")` calls with direct retrieval that reflects the guaranteed presence (or restructure using `Optional` APIs) to comply with Optional handling conventions.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[366-382]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
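The restructuring the review suggests can be sketched as below. A plain `Map<String, String>` stands in for `BibEntry`, and an equality-based toy similarity stands in for JabRef's Levenshtein-based `StringSimilarity`; because `commonFields` only keeps keys present in both maps, the later lookup is guaranteed to succeed and no `orElse("")` default is needed.

```java
import java.util.List;
import java.util.Map;

public class CompareEntriesSketch {

    // Toy similarity: 1.0 for equal strings, 0.0 otherwise.
    static double similarity(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    public static double compareEntries(Map<String, String> one, Map<String, String> two) {
        // Keep only fields present in both entries, so plain retrieval below
        // cannot miss: no empty-string default is required.
        List<String> commonFields = one.keySet().stream()
                                       .filter(two::containsKey)
                                       .toList();
        if (commonFields.isEmpty()) {
            return 0.0;
        }
        return commonFields.stream()
                           .mapToDouble(field -> similarity(one.get(field), two.get(field)))
                           .average()
                           .orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("title", "X", "author", "A");
        Map<String, String> b = Map.of("title", "X", "year", "2020");
        // Only "title" is shared, and it is identical.
        System.out.println(compareEntries(a, b)); // prints 1.0
    }
}
```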


3. Weak threshold asserts in tests 📘 Rule violation ☼ Reliability
Description
New tests use predicate assertions (assertTrue(score >= threshold) / < threshold) instead of
asserting exact expected values, weakening regression detection. This violates the unit test
requirement to assert exact values/outputs where possible.
Code

jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[R665-687]

+    @Test
+    void entriesWithIdenticalTitles() {
+        BibEntry one = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
+        BibEntry two = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
+
+        double score = DuplicateCheck.compareEntries(one, two);
+
+        assertTrue(score >= DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
+    }
+
+    @Test
+    void entriesWithCompletelyDifferentFields() {
+        BibEntry one = new BibEntry()
+                .withField(StandardField.TITLE, "Performance on a Signal")
+                .withField(StandardField.AUTHOR, "Richard Atkinson");
+        BibEntry two = new BibEntry()
+                .withField(StandardField.TITLE, "Rest in Treatment")
+                .withField(StandardField.AUTHOR, "Elizabeth Ballard");
+
+        double score = DuplicateCheck.compareEntries(one, two);
+
+        assertTrue(score < DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
+    }
Evidence
PR Compliance ID 29 requires tests to assert exact expected values and avoid weak predicate checks;
the added tests compare with assertTrue(...) rather than assertEquals(...) to expected numeric
results (or another exact structured expectation).

jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The added `DuplicateCheck.compareEntries` tests use threshold-based `assertTrue` predicates, which are considered weak checks.
## Issue Context
Update the tests to assert exact expected values (or exact expected structures) to strengthen regression detection.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
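The difference between the weak and the exact assertion style can be illustrated with a toy equality-based similarity function (JabRef's real `StringSimilarity` is Levenshtein-based, so the exact expected value in the actual tests would be whatever that implementation yields for the chosen inputs):

```java
public class ExactAssertSketch {

    // Toy similarity: identical strings score exactly 1.0, others 0.0.
    static double similarity(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double score = similarity("Reinforcement learning: An introduction",
                                  "Reinforcement learning: An introduction");
        // Weak predicate: passes for any score in [0.8, 1.0], so a scoring
        // regression from 1.0 down to 0.85 would go unnoticed.
        if (score < 0.8) {
            throw new AssertionError("threshold check failed");
        }
        // Exact assertion: pins the behavior, so any scoring change is caught.
        if (score != 1.0) {
            throw new AssertionError("expected exact score 1.0 but was " + score);
        }
        System.out.println(score);
    }
}
```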


4. assertNotEquals weakens RefChecker test 📘 Rule violation ☼ Reliability
Description
The test only asserts the result is "not REAL" via assertNotEquals, which is a weak predicate and
can pass for multiple unintended outcomes. The test should assert the exact expected RefValidity
(or a complete expected result shape) to meet unit test strength requirements.
Code

jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[R47-58]

+    @Test
+    void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
+        BibEntry entry = new BibEntry(StandardEntryType.Article)
+                .withField(StandardField.TITLE, "Not a Real Paper")
+                .withField(StandardField.AUTHOR, "Random Author")
+                .withField(StandardField.YEAR, "2099")
+                .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");
+
+        RefCheckResult result = refChecker.check(entry);
+
+        assertNotEquals(RefValidity.REAL, result.validity());
+    }
Evidence
PR Compliance ID 29 requires exact expected-value assertions and discourages weak predicates;
assertNotEquals(RefValidity.REAL, ...) does not assert the intended classification outcome.

jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefCheckerTest.entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal` uses `assertNotEquals(REAL, ...)`, which is a weak predicate check.
## Issue Context
Change the assertion to an exact expected validity (or assert the full expected `RefCheckResult` properties) so the test fails on near-miss behavior changes.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. AI/LLM comment in code📘 Rule violation ⚙ Maintainability
Description
A new source-code comment references LLM usage (No LLM is used.), which is meta-process/AI
disclosure content that should not be embedded in code. This violates the rule prohibiting
AI-disclosure comments inside source files.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R30-31]

+/// No LLM is used. Classification is based entirely on
+/// [org.jabref.logic.database.DuplicateCheck#compareEntries].
Evidence
PR Compliance ID 26 prohibits AI/LLM disclosure comments inside source code; the added class
Javadoc-style comment explicitly mentions LLM usage.

AGENTS.md
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[30-31]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `RefChecker` class comment includes AI/LLM disclosure text (`No LLM is used.`), which is not allowed in source code comments.
## Issue Context
Remove or rephrase the comment to describe behavior without referencing AI/LLM usage.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[30-31]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Unnormalized field similarity🐞 Bug ≡ Correctness
Description
DuplicateCheck.compareEntries compares raw field values via StringSimilarity.similarity, bypassing
JabRef’s existing LaTeX-free and field-specific normalization (e.g., author/page handling), which
can lower similarity for semantically identical entries and misclassify RefChecker results.
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[R363-382]

+    public static double compareEntries(BibEntry one, BibEntry two) {
+        StringSimilarity stringSimilarity = new StringSimilarity();
+
+        List<Field> commonFields = one.getFields().stream()
+                                      .filter(field -> !(field instanceof InternalField))
+                                      .filter(field -> two.getField(field).isPresent())
+                                      .toList();
+
+        if (commonFields.isEmpty()) {
+            return 0.0;
+        }
+
+        return commonFields.stream()
+                           .mapToDouble(field -> {
+                               String firstValue = one.getField(field).orElse("");
+                               String secondValue = two.getField(field).orElse("");
+                               return stringSimilarity.similarity(firstValue, secondValue);
+                           })
+                           .average()
+                           .orElse(0.0);
Evidence
The new compareEntries uses BibEntry.getField (raw stored strings) and Levenshtein-based similarity
on those raw values. Existing duplicate logic intentionally uses getFieldLatexFree plus special
handling for PERSON_NAMES/PAGES/JOURNAL/CHAPTER, so the new scoring will diverge and can under-score
entries that differ only in formatting/LaTeX/author formatting.

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[345-383]
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[163-210]
jablib/src/main/java/org/jabref/logic/util/strings/StringSimilarity.java[30-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DuplicateCheck.compareEntries` currently computes similarity using raw field strings (`getField`) and generic Levenshtein similarity. This bypasses JabRef’s existing LaTeX-free extraction and field-specific comparison rules (authors/pages/journal, etc.), leading to systematically wrong similarity scores and therefore wrong `RefChecker` classifications.
### Issue Context
Existing duplicate detection already contains proven normalization and comparison logic in `compareSingleField` (LaTeX-free, PERSON_NAMES handling, page normalization, etc.). `compareEntries` should reuse these normalization steps (at least) so that formatting differences don’t dominate the score.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[345-383]
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[163-210]
### Implementation notes
- Use `getFieldLatexFree` instead of `getField` (or normalize via the same helper methods used in `compareSingleField`).
- Consider applying field-specific preprocessing: PERSON_NAMES -> correlateByWords-like handling; pages/journal normalization similar to existing methods.
- Keep internal fields excluded as today.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
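A minimal sketch of the kind of normalization the review has in mind is below. The `normalize` helper is hypothetical and far cruder than JabRef's real LaTeX-free machinery; it only illustrates why comparing normalized rather than raw stored values keeps formatting differences from dragging the similarity score down.

```java
import java.util.Locale;

public class NormalizeSketch {

    // Hypothetical, simplified normalization: strip LaTeX commands and braces,
    // collapse whitespace, lowercase. JabRef's getFieldLatexFree is more thorough.
    static String normalize(String raw) {
        return raw.replaceAll("\\\\[a-zA-Z]+", "") // drop commands like \textbf
                  .replace("{", "")
                  .replace("}", "")
                  .replaceAll("\\s+", " ")
                  .trim()
                  .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String stored = "{Reinforcement} Learning: An {\\textbf{Introduction}}";
        String fetched = "Reinforcement learning: An introduction";
        // After normalization the two titles compare equal, so the same entry
        // stored with LaTeX markup no longer scores as a mismatch.
        System.out.println(normalize(stored).equals(normalize(fetched))); // prints true
    }
}
```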


7. Network errors labeled FAKE🐞 Bug ☼ Reliability
Description
RefChecker catches FetcherException (network/server/parse failures) and returns RefValidity.FAKE,
incorrectly classifying “could not validate due to error” as “reference is fake”.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R98-103]

+        try {
+            found = doiFetcher.performSearchById(doi.get().asString());
+        } catch (FetcherException e) {
+            LOGGER.warn("DOI lookup failed for {}: {}", doi.get().asString(), e.getMessage());
+            return new RefCheckResult(RefValidity.FAKE, null, 0.0);
+        }
Evidence
FetcherException is thrown by fetcher infrastructure specifically for I/O/network and parser
failures. RefChecker converts these exceptions into FAKE results with score 0.0, which is
semantically indistinguishable from a real negative validation outcome and can produce false
negatives when services are temporarily unavailable.

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[97-103]
jablib/src/main/java/org/jabref/logic/importer/EntryBasedParserFetcher.java[31-55]
jablib/src/main/java/org/jabref/logic/importer/IdParserFetcher.java[43-72]
jablib/src/main/java/org/jabref/logic/importer/FetcherException.java[16-91]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`RefChecker` currently maps any `FetcherException` (network/server/parse errors) to `RefValidity.FAKE`. This produces false negatives: entries can be marked FAKE due to transient outages or rate limiting.
### Issue Context
Fetcher infrastructure throws `FetcherException` for network and parsing failures. These are not evidence the reference is fake; they indicate the check is inconclusive.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[89-172]
- jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java[1-7]
### Implementation notes
- Introduce an explicit state (e.g., `UNKNOWN`/`ERROR`) in `RefValidity`, or return `UNSURE` with a dedicated flag/error payload.
- Ensure `bestOf` ranking accounts for the new state.
- Update/add tests to cover error paths (e.g., mock fetchers throwing `FetcherException`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
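One way to model the suggested inconclusive state is sketched below with hypothetical stand-in types (`Validity`, `CheckResult`); the point is that a failed lookup maps to `UNKNOWN` rather than `FAKE`, so consumers can distinguish "could not check" from "checked and rejected".

```java
public class ValiditySketch {

    // Hypothetical extension of RefValidity: UNKNOWN marks checks that were
    // inconclusive (network/parse failure), not negative evidence.
    enum Validity { REAL, UNSURE, FAKE, UNKNOWN }

    record CheckResult(Validity validity, double similarityScore) { }

    // Stand-in for one lookup path; fetchSucceeded models whether the fetcher
    // call completed without throwing (i.e. no FetcherException).
    static CheckResult checkBySource(boolean fetchSucceeded, boolean entryFound, double score) {
        if (!fetchSucceeded) {
            // A transient outage or rate limit is not evidence the reference is fake.
            return new CheckResult(Validity.UNKNOWN, 0.0);
        }
        if (!entryFound) {
            return new CheckResult(Validity.FAKE, 0.0);
        }
        return new CheckResult(score >= 0.8 ? Validity.REAL : Validity.UNSURE, score);
    }

    public static void main(String[] args) {
        // A network failure is reported as UNKNOWN, not FAKE.
        System.out.println(checkBySource(false, false, 0.0).validity()); // prints UNKNOWN
    }
}
```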



Remediation recommended

8. Trivial threshold inline comment 📘 Rule violation ⚙ Maintainability
Description
The new inline comment on COMPARE_ENTRIES_THRESHOLD restates what the constant name already
expresses and does not explain additional rationale. This adds noise and violates the requirement
that comments explain "why" rather than "what".
Code

jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38]

+    public static final double COMPARE_ENTRIES_THRESHOLD = 0.8; // The threshold that determines if entries are likely to be of the same publication
Evidence
PR Compliance ID 7 disallows trivial comments that merely restate code; the added inline comment
repeats the meaning of COMPARE_ENTRIES_THRESHOLD without providing extra intent/rationale.

AGENTS.md
jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38-38]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The inline comment on `COMPARE_ENTRIES_THRESHOLD` is trivial and restates the identifier.
## Issue Context
Either remove the comment or replace it with a short rationale explaining *why* the specific value (`0.8`) was chosen.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java[38-38]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


9. Changelog entry too technical📘 Rule violation ⚙ Maintainability
Description
The new CHANGELOG entry is programmer-focused (RefChecker logic) and lacks the end-user
framing/style expected for release notes. This reduces clarity for average users and may not meet
changelog quality requirements.
Code

CHANGELOG.md[13]

+- We added RefChecker logic to validate entries against DOI, CrossRef, and arXiv sources [#13604](https://github.com/JabRef/jabref/issues/13604)
Evidence
PR Compliance ID 25 requires end-user understandable changelog entries, and PR Compliance ID 32
requires professional, precise wording; the added entry uses internal implementation naming rather
than user-facing wording.

AGENTS.md
CHANGELOG.md[13-13]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new changelog bullet references internal implementation details ("RefChecker logic") instead of describing the user-visible feature.
## Issue Context
Rewrite the entry to describe what users can do/what improved, using consistent changelog style.
## Fix Focus Areas
- CHANGELOG.md[13-13]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


10. Tie loses matched entry 🐞 Bug ≡ Correctness
Description
RefChecker.bestOf breaks ties only by validity rank and similarityScore, so when those are equal it
can keep a result with otherEntry=null even if another attempt found an authoritative entry,
dropping useful diagnostics.
Code

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[R195-209]

+    private static RefCheckResult bestOf(RefCheckResult doiResult, RefCheckResult crossRefResult,
+                                         RefCheckResult arXivResult) {
+        RefCheckResult best = doiResult;
+
+        if (rank(crossRefResult) > rank(best)
+                || (rank(crossRefResult) == rank(best)
+                && crossRefResult.similarityScore() > best.similarityScore())) {
+            best = crossRefResult;
+        }
+
+        if (rank(arXivResult) > rank(best)
+                || (rank(arXivResult) == rank(best)
+                && arXivResult.similarityScore() > best.similarityScore())) {
+            best = arXivResult;
+        }
Evidence
bestOf only replaces the current best if rank is greater or similarityScore is strictly greater.
Since some paths return FAKE with otherEntry=null (e.g., missing DOI/identifier) and others can
return FAKE/UNSURE with otherEntry present, an equal-rank/equal-score tie will retain the earlier
result and may discard a found candidate entry.

jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[189-212]
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[89-107]
jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[174-187]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`bestOf` ignores whether `otherEntry` is present when breaking ties. This can return a result with `otherEntry == null` even though another attempt found a candidate authoritative entry.
### Issue Context
The UI/consumer likely benefits from seeing the closest authoritative entry even when validity is FAKE/UNSURE. The tie-breaker should preserve richer results.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java[189-212]
### Implementation notes
- When `rank(a) == rank(b)` and `similarityScore` is equal, prefer the result with non-null `otherEntry`.
- Optionally compute `rank(...)` once per result to simplify logic.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
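The suggested tie-breaker can be sketched as a comparator chain over hypothetical stand-in types: rank first, then score, then presence of a matched entry, so an equal-rank/equal-score tie keeps the result that still carries a candidate entry.

```java
import java.util.Comparator;
import java.util.List;

public class BestOfSketch {

    // Declaration order gives FAKE the lowest ordinal, so ordinal() works as rank.
    enum Validity { FAKE, UNSURE, REAL }

    // otherEntry stands in for the matched BibEntry; null means none was found.
    record Result(Validity validity, String otherEntry, double score) { }

    static Result bestOf(List<Result> results) {
        return results.stream()
                      .max(Comparator.comparingInt((Result r) -> r.validity().ordinal())
                                     .thenComparingDouble(Result::score)
                                     // On a full tie, prefer the result with a matched entry
                                     // (true sorts above false under natural Boolean order).
                                     .thenComparing((Result r) -> r.otherEntry() != null))
                      .orElseThrow();
    }

    public static void main(String[] args) {
        Result withoutEntry = new Result(Validity.FAKE, null, 0.0);
        Result withEntry = new Result(Validity.FAKE, "candidate authoritative entry", 0.0);
        // Same rank and score, but the candidate entry is preserved.
        System.out.println(bestOf(List.of(withoutEntry, withEntry)).otherEntry());
    }
}
```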



Comment thread jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java
Comment thread jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java Outdated
Comment on lines +665 to +687
@Test
void entriesWithIdenticalTitles() {
    BibEntry one = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
    BibEntry two = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");

    double score = DuplicateCheck.compareEntries(one, two);

    assertTrue(score >= DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
}

@Test
void entriesWithCompletelyDifferentFields() {
    BibEntry one = new BibEntry()
            .withField(StandardField.TITLE, "Performance on a Signal")
            .withField(StandardField.AUTHOR, "Richard Atkinson");
    BibEntry two = new BibEntry()
            .withField(StandardField.TITLE, "Rest in Treatment")
            .withField(StandardField.AUTHOR, "Elizabeth Ballard");

    double score = DuplicateCheck.compareEntries(one, two);

    assertTrue(score < DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
}
Contributor


Action required

3. Weak threshold asserts in tests 📘 Rule violation ☼ Reliability

New tests use predicate assertions (assertTrue(score >= threshold) / < threshold) instead of
asserting exact expected values, weakening regression detection. This violates the unit test
requirement to assert exact values/outputs where possible.
Agent Prompt
## Issue description
The added `DuplicateCheck.compareEntries` tests use threshold-based `assertTrue` predicates, which are considered weak checks.

## Issue Context
Update the tests to assert exact expected values (or exact expected structures) to strengthen regression detection.

## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +47 to +58
@Test
void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Not a Real Paper")
            .withField(StandardField.AUTHOR, "Random Author")
            .withField(StandardField.YEAR, "2099")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertNotEquals(RefValidity.REAL, result.validity());
}
Contributor


Action required

4. assertNotEquals weakens RefChecker test 📘 Rule violation ☼ Reliability

The test only asserts the result is "not REAL" via assertNotEquals, which is a weak predicate and
can pass for multiple unintended outcomes. The test should assert the exact expected RefValidity
(or a complete expected result shape) to meet unit test strength requirements.
Agent Prompt
## Issue description
`RefCheckerTest.entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal` uses `assertNotEquals(REAL, ...)`, which is a weak predicate check.

## Issue Context
Change the assertion to an exact expected validity (or assert the full expected `RefCheckResult` properties) so the test fails on near-miss behavior changes.

## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment thread jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java Outdated
Comment thread jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java Outdated
Comment thread jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces initial “RefChecker” logic in jablib to validate bibliographic entries by resolving them via DOI/CrossRef/arXiv and classifying them based on similarity to fetched authoritative metadata.

Changes:

  • Added new refcheck domain types (RefChecker, RefCheckResult, RefValidity) and online-validation flow.
  • Added DuplicateCheck.compareEntries(...) plus a shared threshold constant to support similarity-based validation.
  • Added initial integration-style tests for RefChecker and extended DuplicateCheckTest; updated CHANGELOG.md.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java | Implements the online lookup + classification flow (DOI → CrossRef → arXiv) and picks the best result. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java | Adds a result record carrying validity, optional matched entry, and similarity score. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java | Defines the classification enum (REAL/UNSURE/FAKE). |
| jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java | Adds similarity scoring (compareEntries) and a threshold constant used by refcheck. |
| jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java | Adds initial fetcher-backed tests covering “real”, “not real”, and “nonexistent” cases. |
| jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java | Adds unit tests for compareEntries behavior (self-compare, internal field ignore, etc.). |
| CHANGELOG.md | Documents the addition of RefChecker logic. |

Comment thread jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java
Comment thread jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java
Comment on lines +34 to +70
@Test
void realPaperWithCorrectDoiIsClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Attention Is All You Need")
            .withField(StandardField.AUTHOR, "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and others")
            .withField(StandardField.YEAR, "2017")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.REAL, result.validity());
}

@Test
void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Not a Real Paper")
            .withField(StandardField.AUTHOR, "Random Author")
            .withField(StandardField.YEAR, "2099")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertNotEquals(RefValidity.REAL, result.validity());
}

@Test
void entryThatDoesNotExistAnywhereIsClassifiedAsFake() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Nonexistent Paper with no Database")
            .withField(StandardField.AUTHOR, "No Author")
            .withField(StandardField.YEAR, "1800");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.FAKE, result.validity());
}
Copilot AI Apr 1, 2026

These tests are integration-style and depend on live responses from external services (CrossRef/DOI/arXiv). That makes them prone to flakiness when metadata formatting or search results change (especially the “does not exist anywhere” case, where CrossRef could still return a fuzzy match). Prefer a deterministic unit test by injecting mocked DoiFetcher/ArXivFetcher/CrossRef via the 3-arg RefChecker constructor and asserting on controlled responses.
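One way to make such a test deterministic is to inject the lookup as a plain function. The sketch below uses a hand-rolled stub and simplified stand-in types instead of Mockito and JabRef's actual BibEntry/fetcher interfaces, so every name here is an illustrative assumption, not the PR's API:

```java
import java.util.Optional;
import java.util.function.Function;

// Deterministic, network-free sketch: the DOI lookup is injected as a plain
// function, so the test fully controls the "fetched" entry. Entry and the
// check(...) signature are simplified stand-ins for BibEntry and the real
// fetcher interfaces, purely to illustrate the injection pattern.
public class StubbedLookupSketch {
    record Entry(String title) {}

    // Classifies purely from the injected lookup result.
    static String check(Entry local, Function<String, Optional<Entry>> doiLookup, String doi) {
        Optional<Entry> fetched = doiLookup.apply(doi);
        if (fetched.isEmpty()) {
            return "FAKE";
        }
        return fetched.get().title().equalsIgnoreCase(local.title()) ? "REAL" : "UNSURE";
    }

    public static void main(String[] args) {
        Entry local = new Entry("Attention Is All You Need");
        // Stubbed lookup: always returns a fixed, controlled entry.
        Function<String, Optional<Entry>> stub =
                doi -> Optional.of(new Entry("Attention Is All You Need"));
        System.out.println(check(local, stub, "10.48550/arXiv.1706.03762")); // REAL
    }
}
```

The same shape works with Mockito's when(...).thenReturn(...) on the real fetcher interfaces via the 3-arg constructor mentioned above.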

Comment on lines +377 to +378
String firstValue = one.getField(field).orElse("");
String secondValue = two.getField(field).orElse("");
Copilot AI Apr 1, 2026

compareEntries uses getField(...) (raw field content) which can penalize harmless formatting differences (LaTeX braces/escaping, whitespace, line breaks) and lead to false FAKE/UNSURE classifications. Since DuplicateCheck already normalizes via getFieldLatexFree(...) in its comparison logic, consider using latex-free/normalized values here as well to keep scoring consistent with the rest of the duplicate-checking implementation.

Suggested change
- String firstValue = one.getField(field).orElse("");
- String secondValue = two.getField(field).orElse("");
+ String firstValue = one.getFieldLatexFree(field).orElse("");
+ String secondValue = two.getFieldLatexFree(field).orElse("");

Comment thread jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java Outdated
Comment on lines +86 to +87
return bestOf(doiResult, crossRefResult, arXivResult);
}
Copilot AI Apr 1, 2026

If none of the sources yields any candidate (otherEntry == null / score 0.0), bestOf(...) currently returns FAKE (because each lookup returns FAKE on “not found”). That conflates “not found / could not verify” with “verified mismatch” and can mislabel obscure/older but real publications as fake. Consider returning UNSURE when no authoritative candidate was found from any source, and reserving FAKE for the case where a candidate exists but similarity is low.

Collaborator

If the intended behavior is:

  • no candidate found -> FAKE
  • fetch failure -> FAKE
  • low similarity candidate -> FAKE

then I think it’s consistent as it is now, and we probably don’t need to change “not found / could not verify” to UNSURE.

From the classify(...) comments, it seems like this is already the intended behavior.

@koppor do you think this is fine as-is, or do you see this as being too broad semantically?

@github-actions github-actions Bot added status: changes-required Pull requests that are not yet complete status: no-bot-comments and removed status: changes-required Pull requests that are not yet complete labels Apr 1, 2026
@wanling0000
Collaborator

Hi @NishantDG-SST, I tried this locally and the tests pass on my side.

From a quick look, the logic now covers DOI lookup, CrossRef-based DOI discovery, and arXiv-based validation, which is a good first step 👍

A few scope / test questions after reading it:

  • the original issue description also mentions a search-based fallback via CompositeSearchBasedFetcher, is that planned for later?
  • the current implementation already introduces UNSURE, but the current tests do not seem to exercise this explicitly yet. Worth adding a small test?
  • In the second test, the assertion assertNotEquals(REAL, ...) seems quite broad. Since the implementation distinguishes between UNSURE and FAKE, it might be helpful to assert the expected classification more explicitly.
  • The current tests mainly cover the happy path and a basic negative case. Would it make sense to add a few more small cases to explicitly exercise different branches (e.g. arXiv fallback, CrossRef path)?

Also, if more realistic samples are needed later, the RefChecker test suite might be a useful source of inspiration for real-world citation patterns?

Happy to hear your thoughts on these.

@NishantDG-SST
Contributor Author

Hey @wanling0000 thanks for testing

  1. So, CompositeSearchBasedFetcher is planned as a follow-up.
    It requires ImporterPreferences to know which catalogs the user has
    selected, which belongs in the GUI/CLI layer rather than the logic layer.

  2. UNSURE test: I have added one, but a reliable UNSURE test is hard to write against a live API because I cannot control the exact score. The entryWithSlightlyWrongTitleIsClassifiedAsUnsureOrReal() test asserts not FAKE.
    There are two ways RefChecker can return UNSURE:
    a. Network Error: The fetcher throws an exception (not 404).
    b. Metadata Mismatch: The API finds the paper but the similarity score falls between 0.5 and 0.8.

  3. assertNotEquals: the reason I used assertNotEquals(REAL) is that the
    result could be either UNSURE or FAKE depending on the similarity score,
    which is determined by the string similarity algorithm at runtime. I cannot
    predict which one it will be without hardcoding implementation details.

    But I can assert FAKE specifically when the title and author are so
    wrong that the score falls below 0.5. If you agree with that approach, I'll proceed.

  4. I have added

Test for arXiv fallback: If a paper cannot be validated via its DOI or CrossRef the checker attempts to resolve it using its arXiv identifiers
Test for CrossRef path: an entry with no DOI and no arXiv ID but enough metadata for CrossRef to find it.

  5. Also, I'm going through the RefChecker test suite and I'll update if I find any inspiration for testing.

@wanling0000
Collaborator

Hi @NishantDG-SST thanks for the detailed explanation and for adding the additional tests, this helps a lot 👍

  1. So, CompositeSearchBasedFetcher is planned as a follow-up.
    It requires ImporterPreferences to know which catalogs the user has
    selected, which belongs in the GUI/CLI layer rather than the logic layer.

The fallback plan makes sense to me.

  1. UNSURE test I have added one but, a reliable UNSURE test is hard to write against a live API because I cannot control the exact score. The entryWithSlightlyWrongTitleIsClassifiedAsUnsureOrReal() asserts notFAKE.
    There are two ways RefChecker can return UNSURE:
    a. Network Error: The fetcher throws an exception (not 404).
    b. Metadata Mismatch: The API finds the paper but the similarity score falls between 0.5 and 0.8.

From reading RefChecker, my understanding is that UNSURE only occurs when an authoritative entry was found, while cases like “no result” or fetch failures are treated as FAKE.

Just wanted to confirm if this matches the intended behavior, mainly so I can align on testing.

Maybe it would help to document this a bit more explicitly (e.g. in docs or a small test matrix), so the expected classification is clearer.

  1. assertNotEquals, the reason I used assertNotEquals(REAL) is that the
    result could be either UNSURE or FAKE depending on the similarity score,
    which is determined by the string similarity algorithm at runtime. I cannot
    predict which one it will be without hardcoding implementation details.
    But I can assert it FAKE specifically the title and author are so
    wrong that the score falls below 0.5. If you agree with that implementation then I'll proceed.

Happy for you to continue with your current approach (no need to block on this), I’ll focus on validation/testing on my side :)

@github-actions github-actions Bot added status: changes-required Pull requests that are not yet complete status: no-bot-comments and removed status: no-bot-comments status: changes-required Pull requests that are not yet complete labels Apr 4, 2026
@NishantDG-SST
Contributor Author

From reading RefChecker, my understanding is that UNSURE only occurs when an authoritative entry was found, while cases like “no result” or fetch failures are treated as FAKE.

Just wanted to confirm if this matches the intended behavior, mainly so I can align on testing.

Yes, you are right, that is correct, and I apologize for the confusion.
Network errors will always return FAKE.
UNSURE is returned in exactly one case: an authoritative entry was found
and the similarity score falls in [0.5, 0.8]. Network errors and not-found
cases always return FAKE with null matchedEntry and score 0.0.

I have documented this explicitly in the classify() JavaDoc and added
a note to check() clarifying that fetch failures always produce FAKE.
The matchedEntry field distinguishes the two FAKE cases: null means
nothing was found; non-null means a candidate was found but similarity < 0.5.

@NishantDG-SST
Contributor Author

NishantDG-SST commented Apr 4, 2026

So, I added 3 new test cases
entryWithCompletelyWrongAuthorIsNotClassifiedAsReal() (checks a real title and real DOI with completely wrong authors)
entryWhoseDOIResolvesToDifferentPaperIsNotClassifiedAsReal() (this does the same as entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal(), so if you feel one of them is not needed I can remove one)
bertPaperWithCorrectDoiIsClassifiedAsReal() (a second real-paper test using a different arXiv DOI)

@wanling0000
Collaborator

Thanks for adding these, this looks good to me

entryWhoseDOIResolvesToDifferentPaperIsNotClassifiedAsReal() (this does the same as entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() , So if you feel like one of them is not needed i can remove one)

I’m fine with either keeping both for clarity or removing one if you prefer to avoid duplication.

@NishantDG-SST
Contributor Author

My next steps are to

  • Add method to RefChecker to process a whole .bib file
  • Create the refcheck group tree (real paper / unsure / fake paper) using ExplicitGroup and GroupTreeNode like addSuggestedGroups in GroupTreeViewModel.java

Happy to hear your thoughts and will proceed accordingly.

@pluto-han
Collaborator

pluto-han commented Apr 5, 2026

@NishantDG-SST Thank you for adding a new duplicate checker! The old isDuplicate method does not work well in #15316; compareEntries looks great!

@wanling0000
Collaborator

wanling0000 commented Apr 5, 2026

Sorry, I don't want my delayed response to slow down your development, so if you think you have a reasonably solid solution, just go ahead and implement it.

My next steps are to

  • Add method to RefChecker to process a whole .bib file

Regarding your question, here’s what I think:

It might be helpful to keep the “per-entry classification” logic and the “input orchestration / batch processing” a bit separate, so their behaviors can be tested more independently.

So maybe, instead of adding this directly into RefChecker, it could stay focused as a “single BibEntry checker”, and a small batch/use-case layer could handle processing a whole .bib file?

(This is just a thought from my side, happy to follow whatever direction makes more sense here.)

  • Create the refcheck group tree (real paper / unsure / fake paper) using ExplicitGroup and GroupTreeNode like addSuggestedGroups in GroupTreeViewModel.java

Similarly, it might be cleaner if the grouping / assignment is handled by a separate assembler/applicator layer.

But, if the PR gets a bit large, splitting into a follow-up PR could also make review easier.


From my side, I’ll mainly focus on validating the behavior and testing as things evolve.

So I was thinking: based on the algorithm described by @koppor , maybe we can first align on the expected behavior (via tests), and then implementation can follow more freely.

As a rough draft from a testing perspective, I tried to map the current logic into a small test matrix:

Layer 1: Per-entry classification

DOI path

| Case | Description | Priority | Coverage |
|------|-------------|----------|----------|
| A1 | DOI match -> REAL | Must | yes |
| A2 | DOI strong mismatch -> FAKE | Must | partial (weak assertion) |
| A3 | DOI partial mismatch -> UNSURE | Should | partial (non-deterministic) |
| A4 | DOI not found | Should | no |
| A5 | DOI exception | Should | no |

CrossRef path

| Case | Description | Priority | Coverage |
|------|-------------|----------|----------|
| B1 | CrossRef -> DOI -> REAL | Must | yes |
| B2 | CrossRef mismatch -> FAKE | Should | no |
| B3 | CrossRef partial mismatch -> UNSURE | Should | no |
| B4 | CrossRef not found | Should | no |
| B5 | CrossRef exception | Should | no |

arXiv path

| Case | Description | Priority | Coverage |
|------|-------------|----------|----------|
| C1 | arXiv match -> REAL | Must | yes |
| C2 | arXiv mismatch -> FAKE | Should | no |
| C3 | arXiv partial mismatch -> UNSURE | Should | no |
| C4 | no arXiv id | Should | no |
| C5 | arXiv exception | Should | no |

Selection logic (bestOf)

| Case | Description | Priority | Coverage |
|------|-------------|----------|----------|
| D1 | prefer UNSURE over FAKE when no REAL (or should this be defined differently?) | Should | no |
| D2 | among same-level non-REAL results, higher score wins | Should | no |
| D3 | all FAKE fallback | Should | no |

Classification semantics

| Case | Description | Priority | Coverage |
|------|-------------|----------|----------|
| E1 | REAL threshold | Should | no |
| E2 | UNSURE range | Should | no |
| E3 | FAKE threshold | Should | no |
| E4 | "not found" vs "found but mismatch" | Should | no |

One small observation: since current tests rely on live API responses, some similarity-based cases (like UNSURE) might be hard to test deterministically.

@NishantDG-SST
Contributor Author

@wanling0000 Yes definitely I'll keep the layers separate.

So for the tests, will using Mockito to mock the fetchers in a new
RefCheckerUnitTest class separate from the existing @FetcherTest class be better?
This might help cover the deterministic cases.

@NishantDG-SST
Contributor Author

Covered these Tests in UnitTest till now

DOI path

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| A2 | DOI strong mismatch -> FAKE | doiStrongMismatchReturnsFake() |
| A4 | DOI not found | doiNotFoundReturnsFakeWithNullMatch() |
| A5 | DOI exception | doiExceptionReturnsFakeAndTriesNextSource() |

CrossRef path

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| B4 | CrossRef not found | crossRefNotFoundReturnsFake() |
| B5 | CrossRef exception | crossRefExceptionReturnsFake() |

arXiv path

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| C5 | arXiv exception | allSourcesFailReturnsFakeWithNullMatch() |

Selection logic (bestOf)

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| D3 | all FAKE fallback | allSourcesFailReturnsFakeWithNullMatch() |

Classification semantics

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| E4 | "not found" vs "found but mismatch" | notFoundVsFoundButMismatchDistinguishedByOtherEntry() |

I'll cover the rest after this

@NishantDG-SST
Contributor Author

More Coverage

| Case | Description | Unit Test Reference |
|------|-------------|---------------------|
| A3 | DOI partial mismatch -> UNSURE | doiPartialMismatchReturnsUnsure() |
| B2 | CrossRef mismatch -> FAKE | crossRefMismatchReturnsFakeWithOtherEntry() |
| C2 | arXiv mismatch -> FAKE | arXivMismatchReturnsFake() |
| C4 | no arXiv id | entryWithNoArXivIdReturnsFakeViaArXivSelection() |
| D1 | prefer UNSURE over FAKE | bestOfPrefersUnsureOverFake() |

@NishantDG-SST
Contributor Author

In D2 (same-level -> higher score wins):
I tried testing this, but what is happening is that the RefChecker logic picks the first REAL result it finds and stops there, rather than checking whether a following fetcher found an even better match.
So should I change the RefChecker logic?
@wanling0000

@subhramit
Member

we can create a map of weights for each field like :

Map<Field, Double> FIELD_WEIGHTS = Map.of(
    StandardField.DOI, 2.0,
    StandardField.TITLE, 1.5,
    StandardField.AUTHOR, 1.5,
    StandardField.YEAR, 1.0
);

That is a good idea

@wanling0000
Collaborator

I tried the LinkedIn case Oliver shared locally, and it is classified as REAL.

@pluto-han for a change of taste, you can also take a look here as to what's going on, and share any thoughts!

Just wanted to check: is this PR blocking any of your work?

If so maybe some parts could be split out into a smaller PR and merged earlier

@subhramit
Member

Just wanted to check: is this PR blocking any of your work?

If so maybe some parts could be split out into a smaller PR and merged earlier

No no, all fine. Was just thinking of getting some more help here since it's a large PR and you're the only one reviewing it.

@pluto-han was also waiting for the new duplicate checker, but I don't think it is a blocker since his PR is now merged.

@pluto-han
Collaborator

@pluto-han for a change of taste, you can also take a look here as to what's going on, and share any thoughts!

Yep I will help review this PR.

PS: I made some small changes to the existing duplicate checker and it works well now. The new duplicate checker here also looks very good.

@NishantDG-SST
Contributor Author

So I have done some testing and kept the field weights


private static final Map<Field, Double> COMPARE_ENTRIES_FIELD_WEIGHTS = Map.of(
            StandardField.TITLE, 2.0,
            StandardField.AUTHOR, 2.0,
            StandardField.YEAR, 0.5
    );

but this causes other tests to fail, such as the DOI and UNSURE tests (I'll fix them). The main thing is that it still fails the second author mismatch test, so either I have to increase the author weight even more, or we can add something like this inside classify to drop the score for author hallucinations:


if (field.equals(StandardField.AUTHOR) && similarity < 0.8) {
        similarity -= 0.2;
    }

@pluto-han
Collaborator

either i have to increase the author weight even more or we can add something like this inside classify by dropping the score for author hallucinations

Maybe you can try the existing weights in duplicate checker?

    static {
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.AUTHOR, 2.5);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.EDITOR, 2.5);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.TITLE, 3.);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.JOURNAL, 2.);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.NOTE, 0.1);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.COMMENT, 0.1);
        DuplicateCheck.FIELD_WEIGHTS.put(StandardField.DOI, 3.);
    }
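
As a rough illustration of how such per-field weights could feed a weighted similarity score, here is a self-contained sketch. It assumes a simple normalized-Levenshtein similarity and plain string field names instead of JabRef's Field type; the weight values are picked from the duplicate-checker values quoted above, and all class/method names are hypothetical, not the PR's code:

```java
import java.util.Map;

// Sketch of a weighted per-field similarity score. Field names are plain
// strings instead of JabRef's Field type, the similarity measure is a simple
// normalized Levenshtein, and only fields present in the local entry are
// scored -- all of this is illustrative, not the PR's actual code.
public class WeightedCompareSketch {
    static final Map<String, Double> WEIGHTS = Map.of(
            "title", 3.0,
            "author", 2.5,
            "year", 1.0
    );

    // Normalized Levenshtein similarity in [0.0, 1.0].
    static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / max;
    }

    // Weighted average of per-field similarities over the local entry's fields.
    static double compareEntries(Map<String, String> one, Map<String, String> two) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (Map.Entry<String, String> e : one.entrySet()) {
            double w = WEIGHTS.getOrDefault(e.getKey(), 1.0);
            weightedSum += w * similarity(e.getValue(), two.getOrDefault(e.getKey(), ""));
            totalWeight += w;
        }
        return totalWeight == 0 ? 0.0 : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, String> local = Map.of("title", "Attention Is All You Need", "year", "2017");
        System.out.println(compareEntries(local, local)); // 1.0
    }
}
```

With this shape, raising the author weight changes how strongly an author mismatch drags down the overall score without touching the per-field similarity itself.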

@NishantDG-SST
Contributor Author

@pluto-han OK, I'll try them out, thank you.

@NishantDG-SST
Contributor Author

Ok so to implement this functionality :

With "UNSURE", I meant things JabRef cannot check. Example:

@misc{jabref-issue-13604,
title = {Implement RefChecker in JabKit},
author = {Oliver Kopp},
date = {2025-07-29},
url = {https://github.com/JabRef/jabref/issues/13604}
}
Current JabRef has no ways to check whether this is right or wrong.

We can keep a map of StandardEntryTypes from StandardEntryType.java, and if
entry.getType() is not present in that map of StandardEntryTypes we return UNSURE.

Just wanted to recheck again if any one of the fields is non-fetchable (even if like 5 other fields are fetchable, assuming total 6 fields provided) it should return UNSURE right?

@pluto-han
Collaborator

Just wanted to recheck again if any one of the fields is non-fetchable (even if like 5 other fields are fetchable, assuming total 6 fields provided) it should return UNSURE right?

I think what @koppor meant in #13604 (comment) is:

If the reference is not checkable, for example @misc or references created by the URL fetcher, JabRef returns "unsure".

But I am not sure about "REAL-LOW-QUALITY", whether JabRef should return "unsure" or "real". Because in the existing duplicate checker, if two references have the same DOI, JabRef considers them duplicates even if other non-core fields differ; but this PR seems to implement different logic.

private static boolean haveSameIdentifier(final BibEntry one, final BibEntry two) {
    return one.getFields().stream()
            .filter(field -> field.getProperties().contains(FieldProperty.IDENTIFIER))
            .anyMatch(field -> two.getField(field)
                    .map(content -> {
                        String oneValue = one.getField(field).orElseThrow();
                        if (field == StandardField.DOI) {
                            return oneValue.equalsIgnoreCase(content);
                        }
                        return oneValue.equals(content);
                    })
                    .orElse(false));
}

@NishantDG-SST
Contributor Author

| Description | Segregated as |
|-------------|---------------|
| DOI is present but core fields like author/title are wrong | Should be FAKE |
| DOI is present but non-core fields are wrong (no core fields provided) | FAKE or UNSURE? |
| DOI is present with wrong non-core fields; core fields present and matching/close | REAL or maybe UNSURE |

@wanling0000
Collaborator

I think we might be mixing two different questions here:

  1. Is the reference checkable at all by JabRef?
  2. If it is checkable, how should we classify metadata quality?

From this comment, my current understanding is:

  • REAL = checkable and matching
  • FAKE = checkable and contradicting
  • UNSURE = not checkable / not verifiable by current JabRef infrastructure

Because of that, I’m wondering whether we should keep the first step smaller.

If we are still not aligned on the exact meaning of UNSURE (or on what to do with "REAL-LOW-QUALITY" cases), it would be okay to keep the enum for now, but not fully use it in the first version of the logic yet.

From what I see now, there might be two possible ways to move forward:

(1) Keep the scope minimal and just wire up the existing infrastructure so it runs end-to-end (e.g. use isDuplicate to classify checkable cases into REAL / FAKE).

or

(2) Start from that LinkedIn test case, define the semantics more precisely, and possibly adjust the assumptions in DuplicateCheck.

Personally, I lean towards (1) as a first step, and then treat (2) as a follow-up anchor case to refine the behavior.

If you have already tried to adapt DuplicateCheck in this PR, I’m also fine with that. But it might help to first make the intended rules a bit more explicit (e.g., which fields are considered more important), since the examples in this draft suggest that some fields should take priority over others.

@NishantDG-SST
Contributor Author

@wanling0000 I agree we should keep scope small but I think using isDuplicate would actually decrease the quality here as it bypasses metadata validation as soon as an identifier match is found.

This means a hallucinated reference with a real DOI would be classified REAL.

I'd prefer to keep compareEntries as the classifier since it's already written and tested and just keep the UNSURE refinement ongoing.

@NishantDG-SST
Contributor Author

Currently I'm working on comparing author fields positionally, by individual author rather than as raw strings, and using Levenshtein similarity on full names so hallucinated first names are detected even when the family name matches exactly.
But I currently have a blocker, which is this test case from the LinkedIn post:

void detectsHallucinatedSecondNameMismatch() throws FetcherException {
        BibEntry authoritativeEntry = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Interdisciplinary research: Friend or foe to ethical AI?")
                .withField(StandardField.AUTHOR, "Mussgnug, Alexander Martin")
                .withField(StandardField.YEAR, "2026");

        when(doiFetcher.performSearchById("10.1017/cfc.2026.10015"))
                .thenReturn(Optional.of(authoritativeEntry));

        BibEntry entryWithHallucinatedName = new BibEntry(StandardEntryType.Article)
                .withField(StandardField.TITLE, "Interdisciplinary research: Friend or foe to ethical AI?")
                .withField(StandardField.AUTHOR, "Mussgnug, Anna Maria")
                .withField(StandardField.YEAR, "2026")
                .withField(StandardField.DOI, "10.1017/cfc.2026.10015");

        RefCheckResult result = refChecker.check(entryWithHallucinatedName);

        assertEquals(RefValidity.UNSURE, result.validity());
}

Just wanted to be sure again whether this test case should return UNSURE or REAL, as the TITLE and YEAR match exactly, keeping the score high in the current implementation.
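
The positional per-author comparison described above could be sketched roughly like this: split on " and ", score corresponding positions with normalized Levenshtein similarity, and penalize differing author counts. The names and exact scoring here are illustrative assumptions, not the PR's implementation:

```java
// Sketch of positional per-author comparison: each author field is split on
// " and ", corresponding positions are scored with normalized Levenshtein
// similarity, and differing author counts lower the score. All names and
// scoring details are illustrative, not the PR's actual code.
public class AuthorCompareSketch {
    // Normalized Levenshtein similarity in [0.0, 1.0].
    static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / max;
    }

    // Average positional similarity; dividing by the larger author count
    // penalizes lists of different lengths.
    static double compareAuthors(String one, String two) {
        String[] a = one.toLowerCase().split(" and ");
        String[] b = two.toLowerCase().split(" and ");
        int n = Math.min(a.length, b.length);
        if (n == 0) {
            return 0.0;
        }
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += similarity(a[i].trim(), b[i].trim());
        }
        return sum / Math.max(a.length, b.length);
    }
}
```

On the Mussgnug example, "Mussgnug, Alexander Martin" vs "Mussgnug, Anna Maria" shares only the family-name prefix, so this positional score stays well below a full-string match even though the family name is identical.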

@subhramit
Member

Please no force pushes.
Merge main without rebasing.
The commit history is not clean currently.

@subhramit
Member

@NishantDG-SST can you please clean up the git tree?

@NishantDG-SST
Contributor Author

@subhramit should I create a new branch with a single squashed commit and open a new pr?

@subhramit
Member

@subhramit should I create a new branch with a single squashed commit and open a new pr?

you can reset with upstream/main and push a single fresh commit with these changes at the end.
The other way: when merging, we squash anyway, but the only issue would be with the commit authorship, so we would have to manually squash-merge and remove co-authors instead of using the merge queue. I guess the second way would be the least effort overall. Just make sure that the source diff is clean.

Please be mindful of rebasing in future. Read about the perils of rebasing as well as force-pushes.

@NishantDG-SST NishantDG-SST changed the title Add RefChecker logic for reference validation Relates to #13604 Add RefChecker logic for reference validation Apr 21, 2026
@NishantDG-SST
Contributor Author

Thanks @subhramit, I'll try to be more mindful of rebasing next time, sorry for the trouble.

Contributor

@LoayTarek5 LoayTarek5 left a comment

Thanks @NishantDG-SST, clean structure overall, good work.

I have a few concerns from my understanding that I will address:

///
/// FAKE with a non-null matchedEntry means a candidate was found but did not match.
/// FAKE with null matchedEntry means nothing was found at all.
private static RefCheckResult classify(BibEntry local, BibEntry authoritative) {
Contributor

The current mapping:
score >= 0.8 → REAL
score >= 0.5 → UNSURE
score < 0.5 → FAKE

but @koppor clarified in #13604 that UNSURE means "JabRef has no way to verify this at all",
not "partial match found.",
his example was a @misc entry pointing to a GitHub URL that no fetcher can check

This was my understanding, is it correct?

and I think this would also resolve the ongoing confusion about the LinkedIn/Mussgnug case, since "found but author doesn't match well" would be FAKE, not UNSURE
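
For reference, the score-to-validity mapping quoted above can be written as a small pure function; the 0.8/0.5 thresholds are taken from this thread, and the names here are hypothetical, not the PR's actual API:

```java
// Illustrative sketch of the score-to-validity mapping discussed above.
// Thresholds (0.8, 0.5) mirror the values quoted in this thread; class,
// enum, and method names are hypothetical, not the PR's actual API.
public class RefClassifySketch {
    public enum Validity { REAL, UNSURE, FAKE }

    public static Validity classify(double score) {
        if (score >= 0.8) {
            return Validity.REAL;
        }
        if (score >= 0.5) {
            return Validity.UNSURE;
        }
        return Validity.FAKE;
    }

    public static void main(String[] args) {
        System.out.println(classify(0.9)); // REAL
        System.out.println(classify(0.6)); // UNSURE
        System.out.println(classify(0.2)); // FAKE
    }
}
```

Under the alternative reading of UNSURE ("not checkable at all"), this function would only ever be called for checkable entries, and would return just REAL or FAKE.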

Comment on lines +113 to +115
if (doi.isEmpty()) {
    return new RefCheckResult(RefValidity.FAKE, null, 0.0);
}
Contributor

I think that if (doi.isEmpty()) → FAKE is wrong; a missing DOI does not mean a fake paper, it means this source can't help, same for the CrossRef/arXiv "not found" cases.

As a result, if all three sources have nothing to check, bestOf returns FAKE.
As I understand from Dr. Oliver (koppor), this should be UNSURE.

My suggestion is to use Optional for the check methods and return UNSURE when all sources returned empty.

Collaborator

I think that if (doi.isEmpty()) → FAKE is wrong, no DOI not equal fake paper, it means this source can't help, same for CrossRef/arXiv "not found" cases.

Agreed.

as i understand from Dr.Oliver(koppor), this should be UNSURE.

I'm not sure if this should be UNSURE; what if that entry only misses the DOI and all other fields are the same? Then maybe this should return REAL. I think what koppor meant applies to online URLs, not to literature?

Contributor

After looking again into the comments I think you are right @pluto-han, thanks.
Yeah, I think UNSURE is for non-verifiable types (@misc, @online), which are already handled by VERIFIABLE_TYPES; a verifiable @article with no results -> FAKE is correct.

Contributor

The hallucinated-author case ("Mussgnug, Anna Maria" vs "Alexander Martin") was the motivation for adding weights, but it has no locked-in unit test. Without one, threshold changes can silently regress this case.

/// @param one the local entry to check (drives which fields are scored)
/// @param two the authoritative entry fetched from an online source
/// @return weighted similarity score in [0.0, 1.0]
public static double compareEntries(BibEntry one, BibEntry two) {
Contributor

compareEntries only iterates over the local entry's fields, so an entry with just {title} that matches an authoritative entry scores 1.0 and is classified REAL, regardless of mismatching author/year in the authoritative entry.

I think this means a hallucinated reference with just {title, year} pointing at a real DOI could land as REAL, even when the authoritative entry carries author/journal info that would contradict it (because those fields are never compared). Right?

Consider requiring a minimum number of comparable fields or penalizing missing core fields.
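One way to sketch the "minimum comparable fields" idea (the field names, uniform weights, and the 0.5 cap are assumptions for illustration, not the PR's actual scoring):

```java
import java.util.Map;

public class MinFieldsSketch {
    static final int MIN_COMPARED_FIELDS = 2;

    // Weighted average of per-field similarities, capped when too few
    // fields could actually be compared.
    static double scoreWithFloor(Map<String, Double> perFieldSimilarity,
                                 Map<String, Double> weights) {
        double weighted = 0;
        double totalWeight = 0;
        for (var e : perFieldSimilarity.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 1.0);
            weighted += w * e.getValue();
            totalWeight += w;
        }
        double score = totalWeight == 0 ? 0.0 : weighted / totalWeight;
        // A title-only match must not reach the REAL threshold on its own.
        if (perFieldSimilarity.size() < MIN_COMPARED_FIELDS) {
            score = Math.min(score, 0.5);
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(scoreWithFloor(Map.of("title", 1.0), Map.of()));               // 0.5
        System.out.println(scoreWithFloor(Map.of("title", 1.0, "author", 0.5), Map.of())); // 0.75
    }
}
```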

double similarity;
if (field.getProperties().contains(FieldProperty.PERSON_NAMES)) {
List<Author> localAuthors = AuthorList.parse(firstValue).getAuthors().stream()
.filter(a -> !a.getFamilyGiven(false).equalsIgnoreCase("others"))
Contributor

Regarding `getFamilyGiven(false).equalsIgnoreCase("others")`: the Author.OTHERS sentinel may not stringify to "others" with this method. Please verify and add a test.

Contributor

Update: I traced through the source; Author.OTHERS.getFamilyGiven(false) returns "others" exactly, so the filter works.
A small unit test to lock this in would still be nice, though.

Comment on lines +418 to +421
double givenSimilarity = (localGiven.isEmpty() || authGiven.isEmpty())
? 1.0
: stringSimilarity.similarity(localGiven, authGiven);

Contributor

If local has "Smith" (no given name) and authoritative has "Smith, John", given similarity = 1.0. This is reasonable for abbreviation, but the three branches (local-empty vs. authoritative-empty vs. both-empty) have different semantic meanings. I think it's worth adding explicit tests for each branch so future tweaks don't shift behavior silently.
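The branch semantics can be pinned down with a tiny stand-in (a trivial equality-based similarity replaces JabRef's StringSimilarity here, so the numbers are illustrative):

```java
public class GivenNameBranches {
    // Stand-in for JabRef's StringSimilarity: 1.0 on case-insensitive match
    static double similarity(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
    }

    static double givenSimilarity(String localGiven, String authGiven) {
        // A missing given name on either side is treated as
        // "no evidence against a match", never as a penalty.
        if (localGiven.isEmpty() || authGiven.isEmpty()) {
            return 1.0;
        }
        return similarity(localGiven, authGiven);
    }

    public static void main(String[] args) {
        // local "Smith" vs authoritative "Smith, John": local given name missing
        System.out.println(givenSimilarity("", "John"));   // 1.0
        // authoritative given name missing
        System.out.println(givenSimilarity("John", ""));   // 1.0
        // both present and different: penalized
        System.out.println(givenSimilarity("John", "Anna")); // 0.0
    }
}
```

Each println corresponds to one branch a future unit test should lock in.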

@github-actions github-actions Bot added status: changes-required Pull requests that are not yet complete and removed status: no-bot-comments labels Apr 27, 2026