
Commit 3e8e99c

Fix #12271: Integrity checker for year, location, and page numbers in booktitle (#15465)

* Fix #12271: Detect year, country, and page numbers in booktitle

  Enhance BooktitleChecker to flag booktitle values that contain:
  - A 4-digit year (e.g. 2015)
  - A country name (e.g. Norway, Austria, Singapore)
  - Explicit page-number patterns (e.g. "pp. 1–10", "pages 3-7")

  Add Countries.java with a hard-coded set of all UN-recognised country names used for the country-presence check. The set is built as a single pre-compiled regex alternation so the pattern is compiled only once.

  Update BooktitleCheckerTest with parameterised tests covering all three new integrity rules and the blank-value / valid-value edge cases.

  Closes #12271

* Fix Countries.java javadoc to use /// style

* Add missing localization keys for booktitle integrity checks

* Address code review: split BooktitleChecker into focused checkers
  - Extract year, country, and page-number checks into separate ValueChecker classes (BooktitleContainsYearChecker, BooktitleContainsCountryChecker, BooktitleContainsPagesChecker) so all three issues in one booktitle are reported independently
  - Fix word-boundary regex in country checker: replace [a-z] lookarounds with \p{Alnum} so tokens like USA2015 are not mis-flagged as locations
  - Register all three new checkers in FieldCheckers for BOOKTITLE
  - Strengthen tests: use assertEquals with exact expected message instead of assertNotEquals(Optional.empty()); add regression test for alphanumeric token false-positive

* Fix Checkstyle: use Pattern instead of Predicate<String> for compiled regexes

* Fix Checkstyle: put Pattern.compile on same line as static final declaration

* Retrigger CI

* refactor: use Locale API for country names, add region markers to tests
  - Replace hard-coded country list in Countries.java with Locale.getISOCountries() + getDisplayCountry() so no manual maintenance is needed
  - Use //region and //endregion in BooktitleCheckerTest for better test organization

* fix: align stream chain to match IntelliJ formatter style

* fix: add space after // in region comments for Checkstyle

* refactor: inline country names into checker, remove Countries class
  - Remove separate Countries.java utility class; inline the Locale-derived country set directly into BooktitleContainsCountryChecker
  - Use Locale.of() instead of Locale.Builder (matches existing JabRef convention in Language.java)
  - Remove javadoc from new checkers to match existing checker style (BracketChecker, TitleChecker, YearChecker have none)

* chore: retrigger CI after rebase

* chore: retrigger CI

* fix: use alphanumeric boundary for booktitle year regex

  Previously the year regex used digit-only lookarounds, so tokens like ICML2015 matched the 2015 and triggered a false positive. Switching the boundary to \p{Alnum} mirrors the country checker and prevents matching years that are embedded inside larger alphanumeric tokens.

  Also adds a regression test and a CHANGELOG entry under [Unreleased].

---------

Co-authored-by: Subhramit Basu <subhramit.bb@live.in>

1 parent 69f0a94 commit 3e8e99c

7 files changed

Lines changed: 204 additions & 4 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
 - We now support refreshing existing CSL citations with respect to their in-text nature in the LibreOffice integration. [#15369](https://github.com/JabRef/jabref/pull/15369)
 - Added context menu entry "Sort tabs alphabetically" to the library tabs. [#15425](https://github.com/JabRef/jabref/pull/15425)
 - We added a "Merge" action in the File menu to compare the current library with a selected BibTeX file and review changes. [#15401](https://github.com/JabRef/jabref/issues/15401)
+- We added integrity checks that warn when the `booktitle` field contains a year, a country/location, or page numbers that should live in dedicated fields. [#12271](https://github.com/JabRef/jabref/issues/12271)
 
 ### Changed
 

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+package org.jabref.logic.integrity;
+
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.Optional;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+import org.jabref.logic.l10n.Localization;
+import org.jabref.logic.util.strings.StringUtil;
+
+public class BooktitleContainsCountryChecker implements ValueChecker {
+
+    private static final Pattern CONTAINS_COUNTRY;
+
+    static {
+        String alternation = Arrays.stream(Locale.getISOCountries())
+                                   .map(code -> Locale.of("", code).getDisplayCountry(Locale.ENGLISH))
+                                   .filter(name -> !name.isEmpty())
+                                   .map(name -> name.toLowerCase(Locale.ENGLISH))
+                                   .map(Pattern::quote)
+                                   .collect(Collectors.joining("|"));
+        CONTAINS_COUNTRY = Pattern.compile(
+                "(?i)(?<!\\p{Alnum})(" + alternation + ")(?!\\p{Alnum})");
+    }
+
+    @Override
+    public Optional<String> checkValue(String value) {
+        if (StringUtil.isBlank(value)) {
+            return Optional.empty();
+        }
+        if (CONTAINS_COUNTRY.matcher(value).find()) {
+            return Optional.of(Localization.lang("booktitle should not contain a location"));
+        }
+        return Optional.empty();
+    }
+}
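
The country pattern above can be tried outside JabRef. The following is a minimal sketch under stated assumptions, not the project's code: `CountryPatternSketch` and `containsCountry` are hypothetical names, and `Locale.Builder` stands in for `Locale.of` (an assumption made so the sketch also compiles on JDKs older than 19). The set of names comes from the locale data shipped with the JDK, so it may differ slightly between runtimes.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical standalone sketch; not part of JabRef.
class CountryPatternSketch {

    // One alternation over all English country display names, compiled once.
    // The lookarounds reject matches that touch a letter or digit on either side.
    static final Pattern CONTAINS_COUNTRY = Pattern.compile(
            "(?i)(?<!\\p{Alnum})("
                    + Arrays.stream(Locale.getISOCountries())
                            .map(code -> new Locale.Builder().setRegion(code).build())
                            .map(locale -> locale.getDisplayCountry(Locale.ENGLISH))
                            .filter(name -> !name.isEmpty())
                            .map(Pattern::quote)
                            .collect(Collectors.joining("|"))
                    + ")(?!\\p{Alnum})");

    static boolean containsCountry(String value) {
        return CONTAINS_COUNTRY.matcher(value).find();
    }

    public static void main(String[] args) {
        // A country name delimited by punctuation is reported.
        System.out.println(containsCountry("Fifth International Conference, Vienna, Austria, Proceedings")); // true
        // A name glued to digits is not, because '2' is alphanumeric.
        System.out.println(containsCountry("Proceedings Austria2015")); // false
    }
}
```

Because the names are passed through `Pattern.quote`, characters such as `.` or `(` in a display name are matched literally rather than as regex syntax.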
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+package org.jabref.logic.integrity;
+
+import java.util.Optional;
+import java.util.regex.Pattern;
+
+import org.jabref.logic.l10n.Localization;
+import org.jabref.logic.util.strings.StringUtil;
+
+public class BooktitleContainsPagesChecker implements ValueChecker {
+
+    private static final Pattern CONTAINS_PAGES = Pattern.compile("(?i)\\b(pp?\\.?|pages?)\\s*[0-9]+");
+
+    @Override
+    public Optional<String> checkValue(String value) {
+        if (StringUtil.isBlank(value)) {
+            return Optional.empty();
+        }
+        if (CONTAINS_PAGES.matcher(value).find()) {
+            return Optional.of(Localization.lang("booktitle should not contain page numbers"));
+        }
+        return Optional.empty();
+    }
+}
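
To see what the page-number pattern accepts, it can be run against a few sample strings. This is an illustrative sketch only: `PagesPatternSketch` and `containsPages` are hypothetical names, and the regex is copied from the checker above. The `P2P` sample is not covered by the commit's tests; it is included here as an observation about the pattern, which also allows a bare `p` directly followed by digits.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of JabRef.
class PagesPatternSketch {

    // Same expression as in BooktitleContainsPagesChecker:
    // "p", "pp", "p.", "pp.", "page" or "pages", optional whitespace, then digits.
    static final Pattern CONTAINS_PAGES = Pattern.compile("(?i)\\b(pp?\\.?|pages?)\\s*[0-9]+");

    static boolean containsPages(String value) {
        return CONTAINS_PAGES.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(containsPages("Advances in Neural Information Processing Systems, pp. 1234-1242")); // true
        System.out.println(containsPages("Advances in Neural Information Processing Systems, pages 1234-1242")); // true
        System.out.println(containsPages("Advances in Neural Information Processing Systems")); // false
        // Observation: a token such as "P2P" also matches, since "P" followed by "2" fits the pattern.
        System.out.println(containsPages("Workshop on P2P Systems")); // true
    }
}
```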
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+package org.jabref.logic.integrity;
+
+import java.util.Optional;
+import java.util.regex.Pattern;
+
+import org.jabref.logic.l10n.Localization;
+import org.jabref.logic.util.strings.StringUtil;
+
+public class BooktitleContainsYearChecker implements ValueChecker {
+
+    private static final Pattern CONTAINS_YEAR = Pattern.compile("(?<!\\p{Alnum})[12][0-9]{3}(?!\\p{Alnum})");
+
+    @Override
+    public Optional<String> checkValue(String value) {
+        if (StringUtil.isBlank(value)) {
+            return Optional.empty();
+        }
+        if (CONTAINS_YEAR.matcher(value).find()) {
+            return Optional.of(Localization.lang("booktitle should not contain a year"));
+        }
+        return Optional.empty();
+    }
+}
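
The effect of the alphanumeric boundary can be shown by comparing it with a digit-only boundary. This is a sketch under assumptions: the commit message says the earlier regex used digit-only lookarounds but does not show it, so `DIGIT_BOUNDARY` below is a guess at its shape, and `YearBoundarySketch` is a hypothetical class name. Note that in `java.util.regex`, `\p{Alnum}` covers only ASCII letters and digits unless `UNICODE_CHARACTER_CLASS` is set.

```java
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of JabRef.
class YearBoundarySketch {

    // Assumed shape of the earlier pattern: only neighbouring digits block a match.
    static final Pattern DIGIT_BOUNDARY = Pattern.compile("(?<![0-9])[12][0-9]{3}(?![0-9])");

    // Pattern from BooktitleContainsYearChecker: neighbouring letters block a match too.
    static final Pattern ALNUM_BOUNDARY = Pattern.compile("(?<!\\p{Alnum})[12][0-9]{3}(?!\\p{Alnum})");

    static boolean matches(Pattern pattern, String value) {
        return pattern.matcher(value).find();
    }

    public static void main(String[] args) {
        String embedded = "Proceedings ICML2015";
        System.out.println(matches(DIGIT_BOUNDARY, embedded)); // true: "L" is not a digit, so 2015 matches
        System.out.println(matches(ALNUM_BOUNDARY, embedded)); // false: "L" is alphanumeric

        String standalone = "{ECCTD} 2015, Trondheim, Norway";
        System.out.println(matches(ALNUM_BOUNDARY, standalone)); // true: space before, comma after
    }
}
```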

jablib/src/main/java/org/jabref/logic/integrity/FieldCheckers.java

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,9 @@ private static Multimap<Field, ValueChecker> getAllMap(BibDatabaseContext databa
             fieldCheckers.put(field, new PersonNamesChecker(databaseContext));
         }
         fieldCheckers.put(StandardField.BOOKTITLE, new BooktitleChecker());
+        fieldCheckers.put(StandardField.BOOKTITLE, new BooktitleContainsYearChecker());
+        fieldCheckers.put(StandardField.BOOKTITLE, new BooktitleContainsCountryChecker());
+        fieldCheckers.put(StandardField.BOOKTITLE, new BooktitleContainsPagesChecker());
         fieldCheckers.put(StandardField.TITLE, new BracketChecker());
         fieldCheckers.put(StandardField.TITLE, new TitleChecker(databaseContext));
         fieldCheckers.put(StandardField.DOI, new DoiValidityChecker());
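
Registering several checkers under the same field is what lets one booktitle produce several independent warnings. The sketch below illustrates that idea with plain `java.util` collections instead of the `Multimap` used in `FieldCheckers`. Every name in it (`CheckerRegistrySketch`, `register`, `flagIf`, `check`) is hypothetical, and the messages are hard-coded rather than going through `Localization.lang`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not part of JabRef.
class CheckerRegistrySketch {

    static final Pattern YEAR = Pattern.compile("(?<!\\p{Alnum})[12][0-9]{3}(?!\\p{Alnum})");
    static final Pattern PAGES = Pattern.compile("(?i)\\b(pp?\\.?|pages?)\\s*[0-9]+");

    // Field name -> all checkers registered for that field (a poor man's multimap).
    static final Map<String, List<Function<String, Optional<String>>>> CHECKERS = new LinkedHashMap<>();

    static {
        register("booktitle", value -> flagIf(YEAR, value, "booktitle should not contain a year"));
        register("booktitle", value -> flagIf(PAGES, value, "booktitle should not contain page numbers"));
    }

    static void register(String field, Function<String, Optional<String>> checker) {
        CHECKERS.computeIfAbsent(field, key -> new ArrayList<>()).add(checker);
    }

    static Optional<String> flagIf(Pattern pattern, String value, String message) {
        return pattern.matcher(value).find() ? Optional.of(message) : Optional.empty();
    }

    // Runs every checker registered for the field and collects each message that fires.
    static List<String> check(String field, String value) {
        List<String> messages = new ArrayList<>();
        for (Function<String, Optional<String>> checker : CHECKERS.getOrDefault(field, List.of())) {
            checker.apply(value).ifPresent(messages::add);
        }
        return messages;
    }

    public static void main(String[] args) {
        // Both checkers fire for this value, so two messages are printed.
        System.out.println(check("booktitle", "2015 IEEE Conference, pp. 1-10"));
    }
}
```

With a single combined checker that returns one `Optional<String>`, only the first problem found could be reported; separate registrations avoid that limitation.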

jablib/src/main/resources/l10n/JabRef_en.properties

Lines changed: 3 additions & 0 deletions
@@ -1658,6 +1658,9 @@ JabRef\ would\ not\ have\ been\ possible\ without\ the\ help\ of\ our\ contribut
 
 HTML\ encoded\ character\ found=HTML encoded character found
 booktitle\ ends\ with\ 'conference\ on'=booktitle ends with 'conference on'
+booktitle\ should\ not\ contain\ a\ location=booktitle should not contain a location
+booktitle\ should\ not\ contain\ a\ year=booktitle should not contain a year
+booktitle\ should\ not\ contain\ page\ numbers=booktitle should not contain page numbers
 contains\ a\ URL=contains a URL
 
 incorrect\ control\ digit=incorrect control digit

jablib/src/test/java/org/jabref/logic/integrity/BooktitleCheckerTest.java

Lines changed: 114 additions & 4 deletions
@@ -6,25 +6,135 @@
 import org.junit.jupiter.api.parallel.ResourceLock;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
-import static org.junit.jupiter.api.Assertions.assertNotEquals;
 
 @ResourceLock("Localization.lang")
 class BooktitleCheckerTest {
 
     private final BooktitleChecker checker = new BooktitleChecker();
+    private final BooktitleContainsYearChecker yearChecker = new BooktitleContainsYearChecker();
+    private final BooktitleContainsCountryChecker countryChecker = new BooktitleContainsCountryChecker();
+    private final BooktitleContainsPagesChecker pagesChecker = new BooktitleContainsPagesChecker();
+
+    // region "ends with conference on" checks
 
     @Test
     void booktitleAcceptsIfItDoesNotEndWithConferenceOn() {
-        assertEquals(Optional.empty(), checker.checkValue("2014 Fourth International Conference on Digital Information and Communication Technology and it's Applications (DICTAP)"));
+        assertEquals(Optional.empty(), checker.checkValue("Fourth International Conference on Digital Information and Communication Technology and its Applications (DICTAP)"));
     }
 
     @Test
-    void booktitleDoesNotAcceptsIfItEndsWithConferenceOn() {
-        assertNotEquals(Optional.empty(), checker.checkValue("Digital Information and Communication Technology and it's Applications (DICTAP), 2014 Fourth International Conference on"));
+    void booktitleDoesNotAcceptIfItEndsWithConferenceOn() {
+        assertEquals(Optional.of("booktitle ends with 'conference on'"),
+                checker.checkValue("Digital Information and Communication Technology and its Applications (DICTAP), Fourth International Conference on"));
     }
 
     @Test
     void booktitleIsBlank() {
         assertEquals(Optional.empty(), checker.checkValue(" "));
     }
+
+    // endregion
+
+    // region Year detection
+
+    @Test
+    void booktitleFlagsYearInMiddle() {
+        assertEquals(Optional.of("booktitle should not contain a year"),
+                yearChecker.checkValue("European Conference on Circuit Theory and Design, {ECCTD} 2015, Trondheim, Norway"));
+    }
+
+    @Test
+    void booktitleFlagsYearAtStart() {
+        assertEquals(Optional.of("booktitle should not contain a year"),
+                yearChecker.checkValue("2015 {IEEE} International Conference on Digital Signal Processing"));
+    }
+
+    @Test
+    void booktitleAcceptsWhenNoYear() {
+        assertEquals(Optional.empty(), yearChecker.checkValue("International Conference on Software Engineering"));
+    }
+
+    @Test
+    void booktitleYearNotFlaggedInsideAlphanumericToken() {
+        // "ICML2015" should NOT be flagged — the digits are part of a larger token
+        assertEquals(Optional.empty(), yearChecker.checkValue("Proceedings ICML2015"));
+    }
+
+    @Test
+    void booktitleYearCheckerIsBlank() {
+        assertEquals(Optional.empty(), yearChecker.checkValue(" "));
+    }
+
+    // endregion
+
+    // region Location (country) detection
+
+    @Test
+    void booktitleFlagsCountryName() {
+        assertEquals(Optional.of("booktitle should not contain a location"),
+                countryChecker.checkValue("Service-Oriented Computing, Fifth International Conference, Vienna, Austria, Proceedings"));
+    }
+
+    @Test
+    void booktitleFlagsCountryNameSingapore() {
+        assertEquals(Optional.of("booktitle should not contain a location"),
+                countryChecker.checkValue("{IEEE} International Conference on Digital Signal Processing, Singapore, Proceedings"));
+    }
+
+    @Test
+    void booktitleAcceptsWhenNoCountry() {
+        assertEquals(Optional.empty(), countryChecker.checkValue("International Conference on Machine Learning Proceedings"));
+    }
+
+    @Test
+    void booktitleCountryNotFlaggedInsideAlphanumericToken() {
+        // "USA2015" should NOT be flagged — the abbreviation is part of a larger token
+        assertEquals(Optional.empty(), countryChecker.checkValue("Proceedings USA2015"));
+    }
+
+    @Test
+    void booktitleCountryCheckerIsBlank() {
+        assertEquals(Optional.empty(), countryChecker.checkValue(" "));
+    }
+
+    // endregion
+
+    // region Page-number detection
+
+    @Test
+    void booktitleFlagsPagesPattern() {
+        assertEquals(Optional.of("booktitle should not contain page numbers"),
+                pagesChecker.checkValue("Advances in Neural Information Processing Systems, pp. 1234-1242"));
+    }
+
+    @Test
+    void booktitleFlagsPagesKeyword() {
+        assertEquals(Optional.of("booktitle should not contain page numbers"),
+                pagesChecker.checkValue("Advances in Neural Information Processing Systems, pages 1234-1242"));
+    }
+
+    @Test
+    void booktitleAcceptsWhenNoPageNumbers() {
+        assertEquals(Optional.empty(), pagesChecker.checkValue("Advances in Neural Information Processing Systems"));
+    }
+
+    @Test
+    void booktitlePagesCheckerIsBlank() {
+        assertEquals(Optional.empty(), pagesChecker.checkValue(" "));
+    }
+
+    // endregion
+
+    // region Multiple issues in one booktitle
+
+    @Test
+    void booktitleWithYearAndCountryFlagsBoth() {
+        // Year checker and country checker are separate — both fire independently
+        assertEquals(Optional.of("booktitle should not contain a year"),
+                yearChecker.checkValue("2015 IEEE Conference, Singapore"));
+        assertEquals(Optional.of("booktitle should not contain a location"),
+                countryChecker.checkValue("2015 IEEE Conference, Singapore"));
+    }
+
+    // endregion
 }
