JabRef/jabref by Guru6446 · Pull Request #15565 · JabRef/jabref

Guru6446 · 2026-04-16T16:26:04Z

Description

Implements feature request - Allow users to add entries using full URLs instead of just plain identifiers.

Currently, when a user tries to add a new entry using a URL like https://doi.org/10.1145/3544548.3580995, JabRef shows an error. This PR adds support for parsing common URL formats automatically.

Changes

New Files

✅ jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java - URL parser utility
✅ jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java - 16 unit tests

Modified Files

✅ jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java - Use new parser
✅ jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java - Use new parser

Supported URL Formats

DOI

https://doi.org/10.1145/3544548.3580995
https://dx.doi.org/10.1145/3544548.3580995
http://doi.org/10.1145/3544548.3580995
https://dl.acm.org/doi/10.1145/3544548.3580995
https://dl.acm.org/doi/abs/10.1145/3544548.3580995

arXiv

https://arxiv.org/abs/2203.02155
https://arxiv.org/pdf/2203.02155.pdf
http://arxiv.org/abs/2203.02155
Old format: https://arxiv.org/abs/math.GT/0309136

Backward Compatibility

✅ All existing functionality preserved:

Plain DOIs: 10.1145/3544548.3580995 still works
Plain arXiv IDs: 2203.02155 still works

Testing

✅ 16 new unit tests created and all passing
✅ Code compiles successfully (./gradlew jablib:compileJava)
✅ Tests verified with ./gradlew jablib:test --tests UrlIdentifierParserTest

Implementation Details

The UrlIdentifierParser uses regex patterns to:

Detect if input is a URL or plain identifier
Extract the actual identifier from URLs
Pass extracted identifier to existing DOI.parse() or ArXivIdentifier.parse()
Fall back to plain parsing if no URL pattern matches

This minimizes changes to existing code while adding new functionality.
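
The four steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's actual code: the class name `UrlExtractionSketch`, both patterns, and the `Optional<String>` return type are assumptions made for the example. The real utility returns `Optional<DOI>` and delegates to `DOI.parse()`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractionSketch {
    // Hypothetical pattern: doi.org or dx.doi.org followed by something starting with "10."
    private static final Pattern DOI_URL =
            Pattern.compile("^https?://(?:dx\\.)?doi\\.org/(10\\.\\S+)$");

    // Hypothetical stand-in for the plain-identifier check that DOI.parse() would perform
    private static final Pattern PLAIN_DOI = Pattern.compile("^10\\.\\d{4,9}/\\S+$");

    static Optional<String> extractDoi(String input) {
        if (input == null || input.isBlank()) {
            return Optional.empty();
        }
        String trimmed = input.trim();
        Matcher urlMatcher = DOI_URL.matcher(trimmed);
        // Steps 1 and 2: if the input is a known URL form, keep only the identifier part
        String candidate = urlMatcher.matches() ? urlMatcher.group(1) : trimmed;
        // Steps 3 and 4: hand the candidate (or the untouched input) to the plain check
        return PLAIN_DOI.matcher(candidate).matches() ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractDoi("https://doi.org/10.1145/3544548.3580995"));
        System.out.println(extractDoi("10.1145/3544548.3580995"));
        System.out.println(extractDoi("https://example.com"));
    }
}
```

In this sketch a URL and a plain DOI both end up in the same validation path, which is the property the description relies on for backward compatibility.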

Fixes #15411

- Create UrlIdentifierParser utility class
- Extract DOI from various URL formats (doi.org, dx.doi.org, dl.acm.org)
- Extract arXiv ID from URLs (arxiv.org/abs/, arxiv.org/pdf/)
- Add 16 comprehensive unit tests (all passing)
- Maintains backward compatibility with plain IDs

Supports:
- DOI URLs: https://doi.org/10.1145/..., https://dx.doi.org/..., https://dl.acm.org/doi/...
- arXiv URLs: https://arxiv.org/abs/..., https://arxiv.org/pdf/....pdf
- Plain IDs: 10.1145/... (DOI), 2203.02155 (arXiv)

Fixes JabRef#15411

- Use UrlIdentifierParser.parseDOI() instead of DOI.parse()
- Now supports DOI URLs (doi.org, dx.doi.org, dl.acm.org)
- Maintains backward compatibility with plain DOIs

Part of JabRef#15411

- Use UrlIdentifierParser.parseArXiv() instead of ArXivIdentifier.parse()
- Now supports arXiv URLs (arxiv.org/abs/, arxiv.org/pdf/)
- Maintains backward compatibility with plain arXiv IDs

Part of JabRef#15411

qodo-free-for-open-source-projects · 2026-04-16T16:26:29Z

Review Summary by Qodo

Add URL identifier parsing for DOI and arXiv fetchers

✨ Enhancement

Walkthroughs

Description

• Add UrlIdentifierParser utility to extract identifiers from URLs
• Support DOI URLs (doi.org, dx.doi.org, dl.acm.org formats)
• Support arXiv URLs (arxiv.org/abs/, arxiv.org/pdf/ formats)
• Update DoiFetcher and ArXivFetcher to use new parser
• Maintain backward compatibility with plain identifiers

Diagram

flowchart LR
  Input["User Input<br/>URL or Plain ID"]
  Parser["UrlIdentifierParser"]
  DOIParser["parseDOI()"]
  ArXivParser["parseArXiv()"]
  DOIFetcher["DoiFetcher"]
  ArXivFetcher["ArXivFetcher"]
  
  Input --> Parser
  Parser --> DOIParser
  Parser --> ArXivParser
  DOIParser --> DOIFetcher
  ArXivParser --> ArXivFetcher

File Changes

1. jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java ✨ Enhancement +59/-0

New URL identifier parser utility class

• New utility class for parsing identifiers from URLs and plain text
• Implements parseDOI() method with regex patterns for doi.org, dx.doi.org, and dl.acm.org URLs
• Implements parseArXiv() method with regex pattern for arxiv.org URLs
• Falls back to plain identifier parsing if no URL pattern matches

jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java

2. jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java 🧪 Tests +103/-0

Unit tests for URL identifier parser

• 16 comprehensive unit tests for UrlIdentifierParser
• Tests cover DOI parsing from plain IDs and various URL formats
• Tests cover arXiv parsing from plain IDs and various URL formats
• Tests verify null/empty input handling and invalid URL rejection

jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java

3. jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java ✨ Enhancement +3/-2

Update DoiFetcher to use URL parser

• Add import for UrlIdentifierParser
• Replace DOI.parse() with UrlIdentifierParser.parseDOI() in doAPILimiting() method
• Replace DOI.parse() with UrlIdentifierParser.parseDOI() in performSearchById() method
• Enables DOI fetcher to accept full DOI URLs in addition to plain DOIs

jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java

4. jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java ✨ Enhancement +2/-1

Update ArXivFetcher to use URL parser

• Add import for UrlIdentifierParser
• Replace ArXivIdentifier.parse() with UrlIdentifierParser.parseArXiv() in performSearchById() method
• Enables arXiv fetcher to accept full arXiv URLs in addition to plain IDs

jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java

qodo-free-for-open-source-projects · 2026-04-16T16:26:30Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (3) 📎 Requirement gaps (0)

1. Tests only assert presence 📘 Rule violation ≡ Correctness

Description

UrlIdentifierParserTest uses
assertTrue(optional.isPresent())/assertFalse(optional.isPresent()) instead of asserting the
exact parsed DOI/arXiv value. This weakens test precision and can allow incorrect-but-present
parsing results to pass.

Code

jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[R13-102]

+    @Test
+    void parseDOIFromPlainDOI() {
+        String input = "10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIFromDoiOrgURL() {
+        String input = "https://doi.org/10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIFromDxDoiOrgURL() {
+        String input = "https://dx.doi.org/10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIFromHTTPURL() {
+        String input = "http://doi.org/10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIFromACMDigitalLibrary() {
+        String input = "https://dl.acm.org/doi/10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIFromACMAbsURL() {
+        String input = "https://dl.acm.org/doi/abs/10.1145/3544548.3580995";
+        assertTrue(UrlIdentifierParser.parseDOI(input).isPresent());
+    }
+
+    @Test
+    void parseDOIReturnsEmptyForNull() {
+        assertFalse(UrlIdentifierParser.parseDOI(null).isPresent());
+    }
+
+    @Test
+    void parseDOIReturnsEmptyForEmptyString() {
+        assertFalse(UrlIdentifierParser.parseDOI("").isPresent());
+    }
+
+    @Test
+    void parseDOIReturnsEmptyForInvalidURL() {
+        assertFalse(UrlIdentifierParser.parseDOI("https://example.com").isPresent());
+    }
+
+    @Test
+    void parseArXivFromPlainID() {
+        String input = "2203.02155";
+        assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
+    }
+
+    @Test
+    void parseArXivFromAbsURL() {
+        String input = "https://arxiv.org/abs/2203.02155";
+        assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
+    }
+
+    @Test
+    void parseArXivFromPDFURL() {
+        String input = "https://arxiv.org/pdf/2203.02155.pdf";
+        assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
+    }
+
+    @Test
+    void parseArXivFromHTTPURL() {
+        String input = "http://arxiv.org/abs/2203.02155";
+        assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
+    }
+
+    @Test
+    void parseArXivReturnsEmptyForNull() {
+        assertFalse(UrlIdentifierParser.parseArXiv(null).isPresent());
+    }
+
+    @Test
+    void parseArXivReturnsEmptyForInvalidURL() {
+        assertFalse(UrlIdentifierParser.parseArXiv("https://example.com").isPresent());
+    }
+
+    @Test
+    void parseArXivHandlesOldIDFormat() {
+        String input = "https://arxiv.org/abs/math.GT/0309136";
+        assertTrue(UrlIdentifierParser.parseArXiv(input).isPresent());
+    }

Evidence

The compliance checklist requires unit tests to assert exact expected values rather than weak
predicate checks like isPresent(). The added tests repeatedly check only presence/absence (e.g.,
assertTrue(UrlIdentifierParser.parseDOI(input).isPresent())) and never assert the extracted
identifier content.

jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[13-102]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The new tests only assert `Optional.isPresent()` / `isPresent()`-negation, which is a weak predicate and does not verify that the parser extracted the correct DOI/arXiv identifier.
## Issue Context
Per test compliance, assertions should compare against the full expected value/structure (e.g., `assertEquals(expectedOptional, actualOptional)`).
## Fix Focus Areas
- jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[13-102]
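
To illustrate why a presence check is weaker than a value comparison, here is a small self-contained sketch. The extractor `buggyExtract` is hypothetical and deliberately wrong; it is not taken from the PR. It only serves to show that `isPresent()` can succeed on an incorrect result.

```java
import java.util.Optional;

class PresenceVersusValueSketch {
    // Hypothetical buggy extractor: keeps everything after "/doi/", including a "pdf/" segment
    static Optional<String> buggyExtract(String url) {
        int index = url.indexOf("/doi/");
        return index < 0 ? Optional.empty() : Optional.of(url.substring(index + "/doi/".length()));
    }

    public static void main(String[] args) {
        Optional<String> actual = buggyExtract("https://dl.acm.org/doi/pdf/10.1145/3544548.3580995");
        Optional<String> expected = Optional.of("10.1145/3544548.3580995");

        // A presence check passes even though the extracted value is wrong
        System.out.println("isPresent: " + actual.isPresent());
        // Comparing against the full expected Optional exposes the bug
        System.out.println("equals expected: " + actual.equals(expected));
    }
}
```

In a JUnit test, the second comparison corresponds to the `assertEquals(expectedOptional, actualOptional)` form named in the issue context.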

2. arXiv PDF URL fails 🐞 Bug ≡ Correctness

Description

ArXivFetcher.performSearchById still calls arXiv.asyncPerformSearchById(identifier) with the raw
input, so https://arxiv.org/pdf/<id>.pdf fails because ArXivIdentifier.parse rejects the trailing
.pdf. The new UrlIdentifierParser.parseArXiv result is only used for the DOI-infusion
optimization, not for the actual arXiv lookup, so the feature doesn’t work end-to-end for PDF URLs.

Code

jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java[R340-344]

  public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
      CompletableFuture<Optional<BibEntry>> arXivBibEntryPromise = arXiv.asyncPerformSearchById(identifier);
      if (this.doiFetcher != null) {
-            inplaceAsyncInfuseArXivWithDoi(arXivBibEntryPromise, ArXivIdentifier.parse(identifier));
+            inplaceAsyncInfuseArXivWithDoi(arXivBibEntryPromise, UrlIdentifierParser.parseArXiv(identifier));
      }

Evidence
The main fetch path uses the raw identifier string (not the parsed/normalized arXiv ID). The
underlying arXiv lookup rejects inputs that don’t match ArXivIdentifier.parse (notably PDF URLs with
a .pdf suffix), returning Optional.empty().
jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java[339-346]
jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java[412-416]
jablib/src/main/java/org/jabref/model/entry/identifier/ArXivIdentifier.java[21-55]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ArXivFetcher.performSearchById` still passes the raw user input into `arXiv.asyncPerformSearchById(...)`. For inputs like `https://arxiv.org/pdf/2203.02155.pdf`, `ArXivIdentifier.parse(...)` rejects the `.pdf` suffix, so the actual fetch returns empty even though `UrlIdentifierParser.parseArXiv` can normalize this URL.
### Issue Context
`UrlIdentifierParser.parseArXiv(identifier)` is currently only used to accelerate DOI infusion, not to normalize the identifier used in the actual arXiv fetch.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fetcher/ArXivFetcher.java[339-346]
### Suggested fix
1. Parse once at the start of `performSearchById`:
- `Optional<ArXivIdentifier> parsed = UrlIdentifierParser.parseArXiv(identifier);`
2. If `parsed.isEmpty()`, return `Optional.empty()` (or keep existing behavior, but avoid calling the API with a non-parseable URL).
3. Call `arXiv.asyncPerformSearchById(parsed.get().asString())` instead of using `identifier`.
4. Pass `parsed` into `inplaceAsyncInfuseArXivWithDoi(...)` so both the fetch and the optimization use the same normalized value.
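
The "parse once, then use the normalized value everywhere" idea from the suggested fix can be sketched in isolation as follows. This is an assumption-laden illustration, not JabRef code: `NormalizeBeforeFetchSketch`, `normalize`, `search`, and the `PLAIN_ID` pattern are hypothetical stand-ins, and the lookup is a fake function rather than the real arXiv API call. Only the URL pattern is copied from the PR.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizeBeforeFetchSketch {
    // Same regex as ARXIV_URL_PATTERN quoted in this review
    private static final Pattern ARXIV_URL =
            Pattern.compile("https?://arxiv\\.org/(?:abs|pdf)/([\\w.\\-]+?)(?:\\.pdf)?$");
    // Hypothetical stand-in for ArXivIdentifier.parse: accepts new-style IDs only
    private static final Pattern PLAIN_ID = Pattern.compile("\\d{4}\\.\\d{4,5}(?:v\\d+)?");

    static Optional<String> normalize(String input) {
        String trimmed = input.trim();
        Matcher matcher = ARXIV_URL.matcher(trimmed);
        String candidate = matcher.find() ? matcher.group(1) : trimmed;
        return PLAIN_ID.matcher(candidate).matches() ? Optional.of(candidate) : Optional.empty();
    }

    // The lookup only ever receives the normalized identifier, never the raw input
    static Optional<String> search(String rawInput, Function<String, Optional<String>> lookup) {
        return normalize(rawInput).flatMap(lookup);
    }

    public static void main(String[] args) {
        Function<String, Optional<String>> fakeLookup = id -> Optional.of("entry for " + id);
        System.out.println(search("https://arxiv.org/pdf/2203.02155.pdf", fakeLookup));
        System.out.println(search("https://example.com", fakeLookup));
    }
}
```

Because `search` short-circuits on an empty `Optional`, an unparseable input never reaches the lookup, which matches step 2 of the suggested fix.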

3. ~~mEDRA called with URL~~ ☑ 🐞 Bug ≡ Correctness

Description

DoiFetcher.performSearchById now accepts DOI URLs via UrlIdentifierParser.parseDOI, but the mEDRA
special-case still calls Medra.performSearchById(identifier) with the original (possibly URL) input.
This builds an invalid mEDRA API URL and breaks lookups for mEDRA-registered DOIs when the user
pastes a DOI URL.

Code

jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java[R124-126]

  public Optional<BibEntry> performSearchById(String identifier) throws FetcherException {
-        DOI doi = DOI.parse(identifier)
+        DOI doi = UrlIdentifierParser.parseDOI(identifier)
                   .orElseThrow(() -> new FetcherException(Localization.lang("Invalid DOI: '%0'.", identifier)));

Evidence
DoiFetcher parses the identifier into a DOI object but still forwards the unparsed identifier to
the mEDRA fetcher. Medra builds its request URL by concatenating API_URL + "/" + identifier, which
fails if identifier is itself a URL.
jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java[124-146]
jablib/src/main/java/org/jabref/logic/importer/fetcher/Medra.java[103-106]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When the DOI agency is mEDRA, `DoiFetcher.performSearchById` currently calls `new Medra().performSearchById(identifier)` using the original input string. With the new URL parsing support, `identifier` may be a full URL (e.g., `https://doi.org/...`), which causes `Medra.getUrlForIdentifier` to produce an invalid request URL.
### Issue Context
`DoiFetcher` already computes a parsed `DOI doi = UrlIdentifierParser.parseDOI(identifier)...` in this method.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/fetcher/DoiFetcher.java[124-146]
- jablib/src/main/java/org/jabref/logic/importer/fetcher/Medra.java[103-106]
### Suggested fix
In the mEDRA branch, call Medra with the normalized DOI string (e.g., `doi.asString()`) rather than the original `identifier`.
Example:
- Replace `return new Medra().performSearchById(identifier);`
- With `return new Medra().performSearchById(doi.asString());`
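
A short sketch of why the raw input breaks the request URL, assuming the concatenation described in the evidence above (`API_URL + "/" + identifier`). The base URL below is a placeholder, since the real value of `Medra`'s `API_URL` is not shown in this review, and `MedraUrlSketch` and `urlFor` are hypothetical names.

```java
class MedraUrlSketch {
    // Placeholder base URL; the real API_URL in Medra is not shown in this review
    private static final String API_URL = "https://api.example.org/medra";

    // Mirrors the concatenation described in the evidence: API_URL + "/" + identifier
    static String urlFor(String identifier) {
        return API_URL + "/" + identifier;
    }

    public static void main(String[] args) {
        String rawInput = "https://doi.org/10.1145/3544548.3580995";
        String normalizedDoi = "10.1145/3544548.3580995";

        // Raw URL input: a second scheme and host end up embedded in the path
        System.out.println(urlFor(rawInput));
        // Normalized DOI: the path contains only the DOI
        System.out.println(urlFor(normalizedDoi));
    }
}
```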

4. ACM DOI regex rejects pdf 🐞 Bug ≡ Correctness

Description

UrlIdentifierParser.parseDOI short-circuits on DOI_ACM_PATTERN and captures everything after
/doi/, so URLs like https://dl.acm.org/doi/pdf/10.... are turned into pdf/10.... and then
rejected by DOI.parse. This is a regression because DOI.parse is already able to extract a DOI
embedded later in an arbitrary https URL.

Code

jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[R19-40]

+    private static final Pattern DOI_ACM_PATTERN =
+            Pattern.compile("https?://dl\\.acm\\.org/doi/(?:abs/)?(.+)");
+
+    private static final Pattern ARXIV_URL_PATTERN =
+            Pattern.compile("https?://arxiv\\.org/(?:abs|pdf)/([\\w.\\-]+?)(?:\\.pdf)?$");
+
+    public static Optional<DOI> parseDOI(String input) {
+        if (input == null || input.isBlank()) {
+            return Optional.empty();
+        }
+
+        String trimmedInput = input.trim();
+
+        Matcher doiUrlMatcher = DOI_URL_PATTERN.matcher(trimmedInput);
+        if (doiUrlMatcher.find()) {
+            return DOI.parse(doiUrlMatcher.group(1));
+        }
+
+        Matcher acmMatcher = DOI_ACM_PATTERN.matcher(trimmedInput);
+        if (acmMatcher.find()) {
+            return DOI.parse(acmMatcher.group(1));
+        }

Evidence
The ACM regex captures arbitrary suffixes (not necessarily starting with 10.) and the method
returns immediately on match, preventing fallback to parsing the full URL. DOI.parse’s exact DOI
pattern explicitly allows an arbitrary https?://... prefix before the 10.x/... DOI group, so
parsing the full ACM URL is expected to work where pdf/10... fails.
jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[19-42]
jablib/src/main/java/org/jabref/model/entry/identifier/DOI.java[81-90]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`UrlIdentifierParser.parseDOI` uses `DOI_ACM_PATTERN = https?://dl.acm.org/doi/(?:abs/)?(.+)` and returns `DOI.parse(acmMatcher.group(1))`. For common ACM URLs such as `/doi/pdf/10.1145/...`, this extracts `pdf/10.1145/...` and causes parsing to fail.
### Issue Context
`DOI.parse(...)` is already designed to handle many URL forms by allowing an arbitrary `https?://...` prefix before the DOI group.
### Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[16-43]
### Suggested fix options
Option A (simplest/robust):
- Remove the special-case ACM/doi.org regexes and just `return DOI.parse(trimmedInput);` (or use `DOI.findInText(trimmedInput)` first if you want to safely ignore query/fragment junk).
Option B (keep special-cases):
- Tighten the ACM pattern to capture only a DOI starting with `10.` and stop at query/fragment:
- e.g., `https?://dl\\.acm\\.org/doi/(?:abs/|pdf/|full/)?(10\\.[^\\s?#]+)`
- Use `matches()` (or anchor with `^...$`) instead of `find()` so you don’t accidentally capture trailing unrelated text.
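
The difference between the original ACM pattern and the tightened pattern from Option B can be checked with a small self-contained sketch. Both regexes are copied from this review; the class `AcmPatternSketch` and the helper `capture` are hypothetical, and the sketch uses `matches()` as Option B recommends.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AcmPatternSketch {
    // Pattern as introduced in the PR
    static final Pattern ORIGINAL =
            Pattern.compile("https?://dl\\.acm\\.org/doi/(?:abs/)?(.+)");
    // Tightened pattern suggested in Option B
    static final Pattern TIGHTENED =
            Pattern.compile("https?://dl\\.acm\\.org/doi/(?:abs/|pdf/|full/)?(10\\.[^\\s?#]+)");

    // Returns capture group 1 if the whole input matches the pattern
    static Optional<String> capture(Pattern pattern, String url) {
        Matcher matcher = pattern.matcher(url);
        return matcher.matches() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String pdfUrl = "https://dl.acm.org/doi/pdf/10.1145/3544548.3580995";
        // Original pattern keeps the "pdf/" segment in the captured group
        System.out.println(capture(ORIGINAL, pdfUrl));
        // Tightened pattern captures only the part starting with "10."
        System.out.println(capture(TIGHTENED, pdfUrl));
    }
}
```

Note that with `matches()` the tightened pattern rejects a URL that carries a query string or fragment instead of capturing a truncated DOI; whether that is the desired behavior is a design decision the review leaves open.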

5. Null parameters lack @Nullable 📘 Rule violation ⚙ Maintainability

Description

parseDOI/parseArXiv accept null inputs via ad-hoc null checks, but the nullness contract is
not expressed with JSpecify annotations. This makes the API contract unclear and encourages passing
null rather than using explicit nullness annotations.

Code

jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[R25-48]

+    public static Optional<DOI> parseDOI(String input) {
+        if (input == null || input.isBlank()) {
+            return Optional.empty();
+        }
+
+        String trimmedInput = input.trim();
+
+        Matcher doiUrlMatcher = DOI_URL_PATTERN.matcher(trimmedInput);
+        if (doiUrlMatcher.find()) {
+            return DOI.parse(doiUrlMatcher.group(1));
+        }
+
+        Matcher acmMatcher = DOI_ACM_PATTERN.matcher(trimmedInput);
+        if (acmMatcher.find()) {
+            return DOI.parse(acmMatcher.group(1));
+        }
+
+        return DOI.parse(trimmedInput);
+    }
+
+    public static Optional<ArXivIdentifier> parseArXiv(String input) {
+        if (input == null || input.isBlank()) {
+            return Optional.empty();
+        }

Evidence

The checklist requires using Optional and JSpecify nullness annotations to clarify null-handling
contracts. The new public methods accept null (if (input == null || input.isBlank())) but do not
annotate the parameter as @Nullable (or otherwise define a non-null contract).

AGENTS.md
jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[25-48]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`UrlIdentifierParser.parseDOI` and `parseArXiv` explicitly handle `null` inputs but do not declare the parameter nullness contract using JSpecify annotations.
## Issue Context
The codebase uses `org.jspecify.annotations.Nullable` in many places; these methods should either (a) declare `@Nullable` for inputs they accept as null or (b) reject null by contract and remove null-passing tests.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[25-58]
- jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[49-52]
- jablib/src/test/java/org/jabref/logic/importer/util/UrlIdentifierParserTest.java[88-91]

6. Trivial UrlIdentifierParser Javadoc 📘 Rule violation ⚙ Maintainability

Description

The added class Javadoc restates what the class does (parsing identifiers) without explaining design
intent or rationale. This adds noise rather than conveying the 'why'.

Code

jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[R10-13]
+/**
+ * Parses identifiers from URLs and plain text.
+ * Extracts DOI, arXiv ID, etc. from various URL formats.
+ */

Evidence

The checklist forbids trivial/paraphrasing comments and requires comments to explain
intent/rationale. The new Javadoc only paraphrases the class name/purpose (parsing identifiers from
URLs/plain text) without adding 'why' context.

AGENTS.md
jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[10-13]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The class-level comment is a paraphrase and does not explain the rationale/intent behind introducing this utility.
## Issue Context
Comments should explain *why* the code exists (e.g., centralizing URL normalization so existing identifier parsers can remain unchanged), or be removed if they add no value.
## Fix Focus Areas
- jablib/src/main/java/org/jabref/logic/importer/util/UrlIdentifierParser.java[10-13]

qodo-free-for-open-source-projects · 2026-04-16T16:31:52Z

1. Tests only assert presence 📘 Rule violation ≡ Correctness

UrlIdentifierParserTest uses assertTrue(optional.isPresent())/assertFalse(optional.isPresent()) instead of asserting the exact parsed DOI/arXiv value. This weakens test precision and can allow incorrect-but-present parsing results to pass.

qodo-free-for-open-source-projects · 2026-04-16T16:31:52Z

2. Arxiv pdf url fails 🐞 Bug ≡ Correctness

ArXivFetcher.performSearchById still calls arXiv.asyncPerformSearchById(identifier) with the raw input, so https://arxiv.org/pdf/<id>.pdf fails because ArXivIdentifier.parse rejects the trailing .pdf. The new UrlIdentifierParser.parseArXiv result is only used for the DOI-infusion optimization, not for the actual arXiv lookup, so the feature doesn’t work end-to-end for PDF URLs.

qodo-free-for-open-source-projects · 2026-04-16T16:31:52Z

4. Acm doi regex rejects pdf 🐞 Bug ≡ Correctness

UrlIdentifierParser.parseDOI short-circuits on DOI_ACM_PATTERN and captures everything after /doi/, so URLs like https://dl.acm.org/doi/pdf/10.... are turned into pdf/10.... and then rejected by DOI.parse. This is a regression because DOI.parse is already able to extract a DOI embedded later in an arbitrary https URL.

- Add value assertions to tests (verify actual extracted DOI values)
- Use extracted DOI in mEDRA call (fixes mEDRA lookups with URLs)
- Properly extract arXiv ID before passing to fetcher

Addresses review comments on PR

jabref-machine · 2026-04-16T16:59:05Z

You introduced non-Markdown JavaDoc. Please use Markdown JavaDoc (///).

jabref-machine · 2026-04-16T17:14:51Z

Your code currently does not meet JabRef's code guidelines. IntelliJ auto format covers some cases. There seem to be issues with your code style and autoformat configuration. Please reformat your code (Ctrl+Alt+L) and commit, then push.

faneeshh · 2026-04-16T20:56:13Z

You need to disclose the use of AI in the PR. See the Contributing Guide.

jabref-machine · 2026-04-17T05:49:50Z

You have removed the section "Checklist" from your pull request description. Please adhere to our pull request template.

calixtus · 2026-04-17T06:11:44Z

Hello @Guru6446, welcome to the JabRef community, and thank you for your interest.
Please use a proper title for your Pull Request.

I noticed that you opened a PR for an issue for which another PR already exists. If we decide to finish and merge the other PR (which is not unlikely, since we have already put some review work into it), all your work would be in vain. This would be very sad, since we have many other issues that still need someone to take a look at them.

In the future, please first make sure that no other PR is already open for the issue. Our assignment system for GitHub has its limits, and it does not guarantee that nothing is overlooked. You are still responsible.

Please understand that until the other PR is merged or closed, we won't put any work into reviewing this PR, to save our time.

calixtus · 2026-04-17T06:12:13Z

Please also fix your PR description

pluto-han · 2026-04-17T11:17:18Z

Please do not use AI to generate the PR description; JabRef has its own PR description format.

You have only done the backend part. Please also implement the UI, and until then please mark this PR as a draft.

Edit: After a quick look, your code is wrong.

Why change the ArXiv and DOI fetchers?
In UrlIdentifierParser.java, you should fetch the URL, not the DOI or arXiv ID.

subhramit · 2026-04-18T19:48:18Z

Contributor not responsive, and PR description format is completely changed.
A guess might be this contribution was done by a bot, and not checked on later.
Closing.

Guru6446 added 3 commits April 16, 2026 21:32

Update DoiFetcher to use UrlIdentifierParser

5d3ee6f

- Use UrlIdentifierParser.parseDOI() instead of DOI.parse()
- Now supports DOI URLs (doi.org, dx.doi.org, dl.acm.org)
- Maintains backward compatibility with plain DOIs

Part of JabRef#15411

Update ArXivFetcher to use UrlIdentifierParser

8ccbde9

- Use UrlIdentifierParser.parseArXiv() instead of ArXivIdentifier.parse()
- Now supports arXiv URLs (arxiv.org/abs/, arxiv.org/pdf/)
- Maintains backward compatibility with plain arXiv IDs

Part of JabRef#15411

github-actions Bot added the good first issue and component: fetcher labels Apr 16, 2026

qodo-free-for-open-source-projects Bot reviewed Apr 16, 2026

github-actions Bot mentioned this pull request Apr 16, 2026

Add entry using URL #15411

Open

github-actions Bot added the status: changes-required label Apr 16, 2026

Guru6446 added 2 commits April 16, 2026 22:25

Remove accidentally committed files

224acab

subhramit closed this Apr 18, 2026

6 participants
