Add RefChecker logic for reference validation #15478
NishantDG-SST wants to merge 5 commits into JabRef:main from
Conversation
Review Summary by Qodo

Walkthrough

Description:
- Add RefChecker logic for validating bibliographic entries against online sources
- Implement entry comparison using DOI, CrossRef, and arXiv fetchers
- Add `compareEntries` method to `DuplicateCheck` for similarity scoring
- Introduce `RefValidity` enum with REAL, UNSURE, and FAKE classifications

Diagram:

```mermaid
flowchart LR
    Entry["BibEntry to validate"]
    DOI["DOI Lookup"]
    CrossRef["CrossRef Discovery"]
    ArXiv["ArXiv Lookup"]
    Compare["compareEntries Similarity"]
    Result["RefCheckResult with validity"]
    Entry --> DOI
    Entry --> CrossRef
    Entry --> ArXiv
    DOI --> Compare
    CrossRef --> Compare
    ArXiv --> Compare
    Compare --> Result
```
File Changes:
1. jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java
Code Review by Qodo
1.
```java
@Test
void entriesWithIdenticalTitles() {
    BibEntry one = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");
    BibEntry two = new BibEntry().withField(StandardField.TITLE, "Reinforcement learning: An introduction");

    double score = DuplicateCheck.compareEntries(one, two);

    assertTrue(score >= DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
}

@Test
void entriesWithCompletelyDifferentFields() {
    BibEntry one = new BibEntry()
            .withField(StandardField.TITLE, "Performance on a Signal")
            .withField(StandardField.AUTHOR, "Richard Atkinson");
    BibEntry two = new BibEntry()
            .withField(StandardField.TITLE, "Rest in Treatment")
            .withField(StandardField.AUTHOR, "Elizabeth Ballard");

    double score = DuplicateCheck.compareEntries(one, two);

    assertTrue(score < DuplicateCheck.COMPARE_ENTRIES_THRESHOLD);
}
```
3. Weak threshold asserts in tests 📘 Rule violation ☼ Reliability
New tests use predicate assertions (assertTrue(score >= threshold) / < threshold) instead of asserting exact expected values, weakening regression detection. This violates the unit test requirement to assert exact values/outputs where possible.
Agent Prompt
## Issue description
The added `DuplicateCheck.compareEntries` tests use threshold-based `assertTrue` predicates, which are considered weak checks.
## Issue Context
Update the tests to assert exact expected values (or exact expected structures) to strengthen regression detection.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java[665-687]
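To make the distinction concrete, here is a minimal, self-contained sketch (plain Java with a hypothetical score value; in JUnit the exact check would be `assertEquals(expected, score, delta)`):

```java
public class ExactAssertionSketch {
    // A threshold predicate passes for a whole range of scores; an exact
    // comparison with a small delta pins the behavior down.
    static boolean thresholdCheck(double score, double threshold) {
        return score >= threshold;
    }

    static boolean exactCheck(double score, double expected, double delta) {
        return Math.abs(score - expected) <= delta;
    }

    public static void main(String[] args) {
        double score = 1.0; // hypothetical: identical titles in a fixed fixture
        System.out.println(thresholdCheck(score, 0.8)); // would also pass for 0.81, 0.9, ...
        System.out.println(exactCheck(score, 1.0, 1e-6)); // fails if scoring changes at all
    }
}
```

An exact assertion fails on any scoring regression, while the threshold predicate only fails once the score drifts below the cutoff.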
```java
@Test
void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Not a Real Paper")
            .withField(StandardField.AUTHOR, "Random Author")
            .withField(StandardField.YEAR, "2099")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertNotEquals(RefValidity.REAL, result.validity());
}
```
4. assertNotEquals weakens RefChecker test 📘 Rule violation ☼ Reliability
The test only asserts the result is "not REAL" via assertNotEquals, which is a weak predicate and can pass for multiple unintended outcomes. The test should assert the exact expected RefValidity (or a complete expected result shape) to meet unit test strength requirements.
Agent Prompt
## Issue description
`RefCheckerTest.entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal` uses `assertNotEquals(REAL, ...)`, which is a weak predicate check.
## Issue Context
Change the assertion to an exact expected validity (or assert the full expected `RefCheckResult` properties) so the test fails on near-miss behavior changes.
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java[47-58]
Pull request overview

This PR introduces initial "RefChecker" logic in jablib to validate bibliographic entries by resolving them via DOI/CrossRef/arXiv and classifying them based on similarity to fetched authoritative metadata.

Changes:
- Added new refcheck domain types (RefChecker, RefCheckResult, RefValidity) and the online-validation flow.
- Added DuplicateCheck.compareEntries(...) plus a shared threshold constant to support similarity-based validation.
- Added initial integration-style tests for RefChecker and extended DuplicateCheckTest; updated CHANGELOG.md.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Summary per file:

| File | Description |
|---|---|
| jablib/src/main/java/org/jabref/logic/refcheck/RefChecker.java | Implements the online lookup + classification flow (DOI → CrossRef → arXiv) and picks the best result. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefCheckResult.java | Adds a result record carrying validity, optional matched entry, and similarity score. |
| jablib/src/main/java/org/jabref/logic/refcheck/RefValidity.java | Defines the classification enum (REAL/UNSURE/FAKE). |
| jablib/src/main/java/org/jabref/logic/database/DuplicateCheck.java | Adds similarity scoring (compareEntries) and a threshold constant used by refcheck. |
| jablib/src/test/java/org/jabref/logic/refcheck/RefCheckerTest.java | Adds initial fetcher-backed tests covering "real", "not real", and "nonexistent" cases. |
| jablib/src/test/java/org/jabref/logic/database/DuplicateCheckTest.java | Adds unit tests for compareEntries behavior (self-compare, internal field ignore, etc.). |
| CHANGELOG.md | Documents the addition of RefChecker logic. |
```java
@Test
void realPaperWithCorrectDoiIsClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Attention Is All You Need")
            .withField(StandardField.AUTHOR, "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and others")
            .withField(StandardField.YEAR, "2017")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.REAL, result.validity());
}

@Test
void entryWithCorrectDoiButWrongMetadataIsNotClassifiedAsReal() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Not a Real Paper")
            .withField(StandardField.AUTHOR, "Random Author")
            .withField(StandardField.YEAR, "2099")
            .withField(StandardField.DOI, "10.48550/arXiv.1706.03762");

    RefCheckResult result = refChecker.check(entry);

    assertNotEquals(RefValidity.REAL, result.validity());
}

@Test
void entryThatDoesNotExistAnywhereIsClassifiedAsFake() {
    BibEntry entry = new BibEntry(StandardEntryType.Article)
            .withField(StandardField.TITLE, "Nonexistent Paper with no Database")
            .withField(StandardField.AUTHOR, "No Author")
            .withField(StandardField.YEAR, "1800");

    RefCheckResult result = refChecker.check(entry);

    assertEquals(RefValidity.FAKE, result.validity());
}
```
These tests are integration-style and depend on live responses from external services (CrossRef/DOI/arXiv). That makes them prone to flakiness when metadata formatting or search results change (especially the “does not exist anywhere” case, where CrossRef could still return a fuzzy match). Prefer a deterministic unit test by injecting mocked DoiFetcher/ArXivFetcher/CrossRef via the 3-arg RefChecker constructor and asserting on controlled responses.
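A deterministic variant could look roughly like the following simplified, self-contained sketch. The `EntryFetcher` interface and `stub` helper are illustrative stand-ins, not JabRef's actual fetcher API; in the real test suite the mocked `DoiFetcher`/`ArXivFetcher`/`CrossRef` instances (e.g. via Mockito) would be passed to the 3-arg `RefChecker` constructor.

```java
import java.util.Optional;

// Illustrative stand-in for a fetcher interface; real JabRef fetchers return BibEntry.
interface EntryFetcher {
    Optional<String> fetchTitle(String id);
}

public class StubFetcherDemo {
    // A deterministic stub that never touches the network: it answers only for
    // the one identifier it was configured with.
    static EntryFetcher stub(String expectedId, String title) {
        return id -> expectedId.equals(id) ? Optional.of(title) : Optional.empty();
    }

    public static void main(String[] args) {
        EntryFetcher doiFetcher = stub("10.48550/arXiv.1706.03762", "Attention Is All You Need");
        // Assertions can now target a controlled response instead of a live API.
        System.out.println(doiFetcher.fetchTitle("10.48550/arXiv.1706.03762").orElse("not found"));
        System.out.println(doiFetcher.fetchTitle("10.1000/unknown").orElse("not found"));
    }
}
```

The "does not exist anywhere" case then becomes a stub that returns empty for every source, which cannot flake on a fuzzy CrossRef match.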
```java
String firstValue = one.getField(field).orElse("");
String secondValue = two.getField(field).orElse("");
```
compareEntries uses getField(...) (raw field content) which can penalize harmless formatting differences (LaTeX braces/escaping, whitespace, line breaks) and lead to false FAKE/UNSURE classifications. Since DuplicateCheck already normalizes via getFieldLatexFree(...) in its comparison logic, consider using latex-free/normalized values here as well to keep scoring consistent with the rest of the duplicate-checking implementation.
Suggested change:

```java
String firstValue = one.getFieldLatexFree(field).orElse("");
String secondValue = two.getFieldLatexFree(field).orElse("");
```
```java
    return bestOf(doiResult, crossRefResult, arXivResult);
}
```
If none of the sources yields any candidate (otherEntry == null / score 0.0), bestOf(...) currently returns FAKE (because each lookup returns FAKE on “not found”). That conflates “not found / could not verify” with “verified mismatch” and can mislabel obscure/older but real publications as fake. Consider returning UNSURE when no authoritative candidate was found from any source, and reserving FAKE for the case where a candidate exists but similarity is low.
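The suggested separation of "no candidate at all" from "candidate with low similarity" could be sketched as follows. This is a hypothetical simplification (the `Result` record and `candidateFound` flag are illustrative, not the PR's actual types):

```java
public class BestOfSketch {
    enum Validity { REAL, UNSURE, FAKE }

    // candidateFound distinguishes "nothing fetched" from "fetched but mismatched".
    record Result(Validity validity, boolean candidateFound, double score) { }

    // Prefer the highest-scoring result; if no source produced a candidate at
    // all, report UNSURE instead of letting per-source FAKE results win.
    static Result bestOf(Result... results) {
        Result best = null;
        boolean anyCandidate = false;
        for (Result r : results) {
            anyCandidate |= r.candidateFound();
            if (best == null || r.score() > best.score()) {
                best = r;
            }
        }
        if (!anyCandidate) {
            return new Result(Validity.UNSURE, false, 0.0);
        }
        return best;
    }

    public static void main(String[] args) {
        Result none = new Result(Validity.FAKE, false, 0.0);
        System.out.println(bestOf(none, none, none).validity());      // no candidate anywhere
        Result lowMatch = new Result(Validity.FAKE, true, 0.3);
        System.out.println(bestOf(none, lowMatch, none).validity());  // candidate found but mismatched
    }
}
```

Under this sketch, FAKE is reserved for the case where at least one authoritative candidate was retrieved and similarity stayed low.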
If the intended behavior is:
- no candidate found -> FAKE
- fetch failure -> FAKE
- low similarity candidate -> FAKE

then I think it's consistent as it is, and we probably don't need to change "not found / could not verify" to UNSURE. From the classify(...) comments, it seems like this is already the intended behavior.
@koppor do you think this is fine as-is, or do you see this as being too broad semantically?
Hi @NishantDG-SST, I tried this locally and the tests pass on my side. From a quick look, the logic now covers DOI lookup, CrossRef-based DOI discovery, and arXiv-based validation, which is a good first step 👍 A few scope / test questions after reading it:
Also, if more realistic samples are needed later, the RefChecker test suite might be a useful source of inspiration for real-world citation patterns? Happy to hear your thoughts on these.
Hey @wanling0000, thanks for testing!
Test for arXiv fallback: if a paper cannot be validated via its DOI or CrossRef, the checker attempts to resolve it using its arXiv identifiers.
Hi @NishantDG-SST, thanks for the detailed explanation and for adding the additional tests, this helps a lot 👍
The fallback plan makes sense to me.
From reading the code, I just wanted to confirm whether this matches the intended behavior, mainly so I can align on testing. Maybe it would help to document this a bit more explicitly (e.g. in docs or a small test matrix), so the expected classification is clearer.
Happy for you to continue with your current approach (no need to block on this), I'll focus on validation/testing on my side :)
Yes, you are right, that is correct, and I apologize for the confusion. I have documented this explicitly in the classify() JavaDoc and added
So, I added 3 new test cases.
Thanks for adding these, this looks good to me.
I'm fine with either keeping both for clarity or removing one if you prefer to avoid duplication.
My next steps are to
Happy to hear your thoughts and will proceed accordingly.
@NishantDG-SST Thank you for adding a new duplicate checker! The old
Sorry, I don't want my delayed response to slow down your development, so if you think you have a reasonably solid solution, just go ahead and implement it.

Regarding your question, here's what I think: it might be helpful to keep the "per-entry classification" logic and the "input orchestration / batch processing" a bit separate, so their behaviors can be tested more independently. So maybe, instead of adding this directly into RefChecker, it could stay focused as a "single BibEntry checker", and a small batch/use-case layer could handle processing a whole .bib file? (This is just a thought from my side, happy to follow whatever direction makes more sense here.)

Similarly, it might be cleaner if the grouping / assignment is handled by a separate assembler/applicator layer. But if the PR gets a bit large, splitting into a follow-up PR could also make review easier. From my side, I'll mainly focus on validating the behavior and testing as things evolve. So I was thinking: based on the algorithm described by @koppor, maybe we can first align on the expected behavior (via tests), and then the implementation can follow more freely.

As a rough draft from a testing perspective, I tried to map the current logic into a small test matrix:

Layer 1: Per-entry classification
- DOI path
- CrossRef path
- arXiv path
- Selection logic (bestOf)
- Classification semantics

One small observation: since the current tests rely on live API responses, some similarity-based cases (like UNSURE) might be hard to test deterministically.
@wanling0000 Yes, definitely, I'll keep the layers separate. So for the tests, would using Mockito to mock the fetchers in a new
Covered these tests in the unit test so far:
- DOI path
- CrossRef path
- arXiv path
- Selection logic (bestOf)
- Classification semantics

I'll cover the rest after this.
More Coverage
In D2, same-level -> higher score wins.
That is a good idea.
Just wanted to check: is this PR blocking any of your work? If so, maybe some parts could be split out into a smaller PR and merged earlier.
No no, all fine. Was just thinking of getting some more help here, since it's a large PR and you're the only one reviewing it. @pluto-han was also waiting for the new duplicate checker, but I don't think it is a blocker since his PR is now merged.
Yep, I will help review this PR. PS: I made some little changes to the existing duplicate checker and it works well now. The new duplicate checker here also looks very good.
So I have done some testing and kept the field weights, but this triggers other tests: the DOI and UNSURE tests are failing (I'll fix them). The main thing is that it's still failing the second author-mismatch test, so either I have to increase the author weight even more, or we can add something like this inside
Maybe you can try the existing weights in
@pluto-han ok, I'll try them out, thank you.
Ok, so to implement this functionality:
We can keep a map of StandardEntryTypes from
Just wanted to recheck again: if any one of the fields is non-fetchable (even if, say, 5 other fields are fetchable, assuming 6 fields provided in total), it should return UNSURE, right?
I think what @koppor meant in #13604 (comment) is: if the reference is not checkable, for example @misc entries, references created by the URL fetcher, etc., JabRef returns "unsure". But I am not sure about "REAL-LOW-QUALITY", whether JabRef should return "unsure" or "real", because in the existing duplicate checker, if two references have the same DOI, JabRef considers them duplicates even if other non-core fields differ; this PR seems to implement different logic.
I think we might be mixing two different questions here:

From this comment, my current understanding is:

Because of that, I'm wondering whether we should keep the first step smaller, since we are still not aligned on the exact meaning of
From what I see now, there might be two possible ways to move forward: (1) keep the scope minimal and just wire up the existing infrastructure so it runs end-to-end (e.g. use isDuplicate to classify checkable cases into REAL / FAKE), or (2) start from that LinkedIn test case, define the semantics more precisely, and possibly adjust the assumptions in DuplicateCheck. Personally, I lean towards (1) as a first step, and then treat (2) as a follow-up anchor case to refine the behavior. If you have already tried to adapt
@wanling0000 I agree we should keep the scope small, but I think using isDuplicate would actually decrease the quality here, as it bypasses metadata validation as soon as an identifier match is found. This means a hallucinated reference with a real DOI would be classified REAL. I'd prefer to keep compareEntries as the classifier, since it's already written and tested, and keep the UNSURE refinement ongoing.
Currently I'm working on comparing author fields positionally, by individual author rather than as raw strings, also using Levenshtein similarity on full names, so hallucinated first names are detected even when the family name matches exactly. Just wanted to be sure again whether this test case should return UNSURE or REAL, as the TITLE and YEAR match exactly, keeping the score high in the current implementation.
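The positional, Levenshtein-based comparison described above could be sketched roughly like this (a self-contained simplification; the `compareAuthors` signature and lowercase normalization are assumptions for illustration, not the PR's actual code, which parses names via `AuthorList`):

```java
public class AuthorSimilaritySketch {
    // Plain Levenshtein edit distance (two-row dynamic programming).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalized similarity in [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Compare authors positionally on full names, so a hallucinated given name
    // lowers the score even when the family name matches exactly.
    static double compareAuthors(String[] local, String[] fetched) {
        int n = Math.min(local.length, fetched.length);
        if (n == 0) return 0.0;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += similarity(local[i].toLowerCase(), fetched[i].toLowerCase());
        }
        // Dividing by the longer list penalizes missing or extra authors.
        return sum / Math.max(local.length, fetched.length);
    }

    public static void main(String[] args) {
        System.out.println(compareAuthors(
                new String[] {"Mussgnug, Anna Maria"},
                new String[] {"Mussgnug, Alexander Martin"}));
    }
}
```

With raw-string comparison the two names above would simply be "not equal"; the positional per-name similarity instead yields a partial score that the classifier can weigh.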
Force-pushed: 447d615 to 1174222 (compare)
Please no force pushes.
@NishantDG-SST can you please clean up the git tree?
@subhramit should I create a new branch with a single squashed commit and open a new PR?
You can reset with upstream/main and push a single fresh commit with these changes at the end. Please be mindful of rebasing in future; read about the perils of rebasing as well as force-pushes.
Force-pushed: f246e80 to 4332569 (compare)
Force-pushed: 4332569 to 52be9eb (compare)
Thanks @subhramit, I'll try to be more mindful of rebasing from next time, sorry for the trouble.
LoayTarek5 left a comment
Thanks @NishantDG-SST, clean structure overall, good work.
I have a few concerns from my understanding that I will address:
```java
///
/// FAKE with a non-null matchedEntry means a candidate was found but did not match.
/// FAKE with null matchedEntry means nothing was found at all.
private static RefCheckResult classify(BibEntry local, BibEntry authoritative) {
```
The current mapping:
- score >= 0.8 → REAL
- score >= 0.5 → UNSURE
- score < 0.5 → FAKE

but @koppor clarified in #13604 that UNSURE means "JabRef has no way to verify this at all", not "partial match found". His example was a @misc entry pointing to a GitHub URL that no fetcher can check.
This was my understanding, is it correct?
I think this would also resolve the ongoing confusion about the LinkedIn/Mussgnug case, since "found but author doesn't match well" would be FAKE, not UNSURE.
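Under that reading, the classification would key on entry-type verifiability rather than a middle score band. A hypothetical sketch (the `VERIFIABLE_TYPES` contents and the 0.8 cutoff are illustrative assumptions, not settled project constants):

```java
import java.util.Set;

public class VerifiabilitySketch {
    enum Validity { REAL, UNSURE, FAKE }

    // Hypothetical set of entry types that online fetchers can verify.
    static final Set<String> VERIFIABLE_TYPES = Set.of("article", "inproceedings", "book");

    // UNSURE is reserved for entries JabRef cannot check at all;
    // verifiable entries are classified by similarity alone.
    static Validity classify(String entryType, double score) {
        if (!VERIFIABLE_TYPES.contains(entryType)) {
            return Validity.UNSURE;
        }
        return score >= 0.8 ? Validity.REAL : Validity.FAKE;
    }

    public static void main(String[] args) {
        System.out.println(classify("misc", 0.0));    // e.g. a GitHub URL in a @misc entry
        System.out.println(classify("article", 0.9)); // verified match
        System.out.println(classify("article", 0.4)); // found, but metadata mismatches
    }
}
```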
```java
if (doi.isEmpty()) {
    return new RefCheckResult(RefValidity.FAKE, null, 0.0);
}
```
I think that if (doi.isEmpty()) → FAKE is wrong; a missing DOI does not mean the paper is fake, it means this source can't help. The same applies to the CrossRef/arXiv "not found" cases.
As a result, if all three sources have nothing to check, bestOf returns FAKE.
As I understand from Dr. Oliver (koppor), this should be UNSURE.
My suggestion is to use Optional for the check methods and return UNSURE when all sources returned empty.
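The Optional-based suggestion could look roughly like this simplified sketch (the `Candidate` record and the 0.8 cutoff are illustrative assumptions; the real methods would return `Optional<BibEntry>` plus a score):

```java
import java.util.List;
import java.util.Optional;

public class OptionalLookupSketch {
    enum Validity { REAL, UNSURE, FAKE }

    record Candidate(String title, double score) { }

    // Each source returns Optional.empty() when it cannot help,
    // instead of eagerly reporting FAKE.
    static Validity check(List<Optional<Candidate>> sourceResults) {
        Optional<Candidate> best = sourceResults.stream()
                .flatMap(Optional::stream)
                .max((a, b) -> Double.compare(a.score(), b.score()));
        if (best.isEmpty()) {
            return Validity.UNSURE; // no source produced a candidate at all
        }
        return best.get().score() >= 0.8 ? Validity.REAL : Validity.FAKE;
    }

    public static void main(String[] args) {
        // All three sources empty -> UNSURE rather than FAKE.
        System.out.println(check(List.of(Optional.empty(), Optional.empty(), Optional.empty())));
        // One strong candidate -> REAL.
        System.out.println(check(List.of(Optional.empty(),
                Optional.of(new Candidate("Attention Is All You Need", 0.95)),
                Optional.empty())));
    }
}
```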
> I think that if (doi.isEmpty()) → FAKE is wrong; a missing DOI does not mean the paper is fake, it means this source can't help. The same applies to the CrossRef/arXiv "not found" cases.

Agreed.

> As I understand from Dr. Oliver (koppor), this should be UNSURE.

I'm not sure if this should be UNSURE: what if that entry only misses the DOI and all other fields are the same? Then maybe this should return REAL. I think what koppor meant applies to online URLs, not to literature?
After looking again into the comments, I think you are right @pluto-han, thanks.
Yeah, I think UNSURE is for non-verifiable types (@misc, @online), already handled by VERIFIABLE_TYPES; a verifiable @article with no results -> FAKE is correct.
The hallucinated-author case ("Mussgnug, Anna Maria" vs "Alexander Martin") was the motivation for adding weights, but has no locked-in unit test. Without it, threshold changes can silently regress this.
```java
/// @param one the local entry to check (drives which fields are scored)
/// @param two the authoritative entry fetched from an online source
/// @return weighted similarity score in [0.0, 1.0]
public static double compareEntries(BibEntry one, BibEntry two) {
```
compareEntries only iterates over local fields, so an entry with just {TITLE} matching an authoritative entry scores 1.0 -> REAL, regardless of mismatching author/year in the authoritative entry.
I think this means a hallucinated reference with just {title, year} faking a real DOI could land as REAL even if the authoritative entry has author/journal info that contradicts it (because those fields aren't compared), right?
Consider requiring a minimum number of comparable fields or penalizing missing core fields.
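A minimum-comparable-fields guard could be sketched as follows (a deliberately crude, self-contained illustration using plain maps and exact-match scoring in place of JabRef's `BibEntry` and string similarity; the constant value 3 is an assumption):

```java
import java.util.Map;

public class MinFieldsSketch {
    // Hypothetical guard: require a minimum number of overlapping fields
    // before trusting the similarity score.
    static final int MIN_COMPARABLE_FIELDS = 3;

    static double compareEntries(Map<String, String> local, Map<String, String> fetched) {
        int compared = 0;
        double sum = 0;
        for (var e : local.entrySet()) {
            String other = fetched.get(e.getKey());
            if (other == null) {
                continue;
            }
            compared++;
            sum += e.getValue().equalsIgnoreCase(other) ? 1.0 : 0.0; // crude stand-in similarity
        }
        if (compared < MIN_COMPARABLE_FIELDS) {
            return 0.0; // too little overlap to claim a confident match
        }
        return sum / compared;
    }

    public static void main(String[] args) {
        Map<String, String> local = Map.of("title", "Attention Is All You Need");
        Map<String, String> fetched = Map.of(
                "title", "Attention Is All You Need",
                "author", "Vaswani, Ashish and others",
                "year", "2017");
        // Only one comparable field, so the guard refuses to report a match.
        System.out.println(compareEntries(local, fetched));
    }
}
```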
```java
double similarity;
if (field.getProperties().contains(FieldProperty.PERSON_NAMES)) {
    List<Author> localAuthors = AuthorList.parse(firstValue).getAuthors().stream()
            .filter(a -> !a.getFamilyGiven(false).equalsIgnoreCase("others"))
```
getFamilyGiven(false).equalsIgnoreCase("others"): the Author.OTHERS sentinel may not stringify to "others" with this method; please verify and add a test.
Update: I traced through the source; Author.OTHERS.getFamilyGiven(false) returns "others" exactly, so the filter works.
A small unit test to lock this in would still be nice, though.
```java
double givenSimilarity = (localGiven.isEmpty() || authGiven.isEmpty())
        ? 1.0
        : stringSimilarity.similarity(localGiven, authGiven);
```
If local has "Smith" (no given name) and authoritative has "Smith, John", given similarity = 1.0. This is reasonable for abbreviation, but the three branches (local-empty vs authoritative-empty vs both-empty) have different semantic meanings; I think it's worth adding explicit tests for each branch so future tweaks don't shift behavior silently.
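The branch semantics worth pinning down could be made explicit in a small sketch like this (an illustrative simplification using exact comparison in place of the PR's string-similarity call; the branches are written out separately even though they currently return the same value, precisely so each can be tested on its own):

```java
public class GivenNameBranchSketch {
    // An empty given name on either side is treated as "no evidence of
    // mismatch" and scores 1.0; with both present, a real comparison runs.
    static double givenSimilarity(String localGiven, String authGiven) {
        if (localGiven.isEmpty() && authGiven.isEmpty()) {
            return 1.0; // neither side carries a given name
        }
        if (localGiven.isEmpty() || authGiven.isEmpty()) {
            return 1.0; // one side abbreviated, e.g. "Smith" vs "Smith, John"
        }
        return localGiven.equalsIgnoreCase(authGiven) ? 1.0 : 0.0; // crude stand-in similarity
    }

    public static void main(String[] args) {
        System.out.println(givenSimilarity("", ""));       // both empty
        System.out.println(givenSimilarity("", "John"));   // local abbreviated
        System.out.println(givenSimilarity("John", ""));   // authoritative abbreviated
        System.out.println(givenSimilarity("John", "Jane")); // both present, mismatch
    }
}
```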
Related issues and pull requests
Refs #13604
PR Description
Only logic for the RefCheck functionality.
Steps to test
Run
Checklist
- Change described in CHANGELOG.md in a way that can be understood by the average user (if change is visible to the user)