Add codespell support with configuration and fixes #589
Open
yarikoptic wants to merge 6 commits into
- Skip external vocabulary data files (fuji_server/data/*.{yaml,json},
linked_vocabs/) — these are third-party identifier and ontology data.
- Skip test VCR cassettes and the PHP simpleclient (external content).
- Skip Jupyter notebooks (base64 images cause false positives).
- Add ignore-regex for URLs (DOIs/links can contain "typos" that must
not be fixed).
- Add ignore-words-list entries for domain terms:
- connexion — Python library name
- lod — Linked Open Data
- ore — Object Reuse and Exchange (OAI-ORE)
- Fix stale comment in pre-commit config (codespell config lives in
pyproject.toml, not .pre-commit-config.yaml) and add tomli fallback
dependency for Python <3.11.
Co-Authored-By: Claude Code 2.1.114 / Claude Opus 4.7 <noreply@anthropic.com>
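As a rough illustration of how the ignore-regex and ignore-words-list settings in this commit interact, here is a minimal Python sketch. It is not codespell's implementation. The `MISSPELLINGS` table, the `find_typos` helper, and the example URL are hypothetical; only the URL pattern and the three ignored words come from this PR. It assumes that text matched by the ignore-regex is treated as whitespace before words are checked.

```python
import re

# URL pattern quoted in this PR's configuration.
IGNORE_REGEX = re.compile(r"https?://\S+")

# Domain terms whitelisted in this PR.
IGNORE_WORDS = {"connexion", "lod", "ore"}

# Hypothetical stand-in for a misspelling dictionary (not codespell's real data).
MISSPELLINGS = {
    "sucessfully": "successfully",
    "patter": "pattern",
    "lod": "load",
    "ore": "or",
}


def find_typos(line: str) -> list[str]:
    """Return suspicious words in `line`, skipping URLs and whitelisted terms."""
    # Blank out every URL so nothing inside it is ever checked.
    masked = IGNORE_REGEX.sub(lambda m: " " * len(m.group()), line)
    words = re.findall(r"[A-Za-z']+", masked)
    return [
        w for w in words
        if w.lower() in MISSPELLINGS and w.lower() not in IGNORE_WORDS
    ]


line = "Sucessfully parsed the ore map from https://example.org/j.patter.2021"
print(find_typos(line))
```

In this sketch only the genuine typo is reported: "ore" is skipped by the whitelist and "patter" is skipped because it sits inside a URL.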
Fixed typos where codespell offers multiple suggestions, selecting the
correct fix based on surrounding context:
- Selectin -> Selecting (metrics_configuration.md:38 heading)
- shat -> that (docs/source/conf.py:324 comment)
- stati -> statuses (metadata_harvester.py:111 comment)
- doesnt -> doesn't (metadata_harvester.py:1268 comment)
- "fo rthe" -> "for the" (body.py, openapi.yaml; misplaced space)
- taked -> tagged (test_preprocessor.py:17 docstring; not in codespell's
dictionary; actual word is "tagged", not "took/taken")
- identifer -> identifiers (fuji_server/data/README.md:9; refers to the
identifiers.org service)
Co-Authored-By: Claude Code 2.1.114 / Claude Opus 4.7 <noreply@anthropic.com>
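The split between fixes that need context and fixes that can be applied mechanically can be pictured with a small Python sketch. This only illustrates the triage idea and is not how codespell is implemented. The `CANDIDATES` table and the `triage` helper are hypothetical, and the candidate lists are made up for illustration rather than copied from codespell's dictionary.

```python
# Hypothetical dictionary: each misspelling maps to its candidate corrections.
# Entries are illustrative only and are not taken from codespell's data files.
CANDIDATES = {
    "seperate": ["separate"],
    "occured": ["occurred"],
    "stati": ["statuses", "state"],
}


def triage(words):
    """Split misspellings into automatic fixes and ones needing human review."""
    auto_fixes = {}
    needs_review = {}
    for word in words:
        options = CANDIDATES.get(word)
        if options is None:
            continue  # not a known misspelling
        if len(options) == 1:
            auto_fixes[word] = options[0]  # unambiguous: safe to rewrite
        else:
            needs_review[word] = options  # ambiguous: pick based on context
    return auto_fixes, needs_review


auto, manual = triage(["seperate", "stati", "metadata", "occured"])
print(auto)
print(manual)
```

A word missing from the dictionary altogether, such as "taked" above, falls through both buckets and can only be caught by reading the text.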
=== Do not change lines below ===
{
"chain": [],
"cmd": "uvx codespell -w",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
yarikoptic commented Apr 20, 2026
  :type core_metadata_status: str
  """
- allowed_values = ["insufficent metadata", "partial metadata", "all metadata"]
+ allowed_values = ["insufficient metadata", "partial metadata", "all metadata"]
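To show why this one-word change may be more than cosmetic, here is a minimal Python sketch of the mismatch. It assumes the model's setter rejects any value missing from `allowed_values` by raising `ValueError`. The helpers `validate_status` and `is_accepted` are hypothetical stand-ins and not the project's actual code. Only the two lists are taken from the lines above.

```python
# The two allowed_values lists shown in the lines above.
OLD_ALLOWED = ["insufficent metadata", "partial metadata", "all metadata"]
NEW_ALLOWED = ["insufficient metadata", "partial metadata", "all metadata"]


def validate_status(value: str, allowed_values: list[str]) -> str:
    """Hypothetical stand-in for the model's setter check."""
    if value not in allowed_values:
        raise ValueError(
            f"Invalid value for core_metadata_status ({value!r}), "
            f"must be one of {allowed_values}"
        )
    return value


def is_accepted(value: str, allowed_values: list[str]) -> bool:
    """Report whether the validator lets `value` through."""
    try:
        validate_status(value, allowed_values)
    except ValueError:
        return False
    return True


# A producer that emits the correctly spelled status:
print(is_accepted("insufficient metadata", OLD_ALLOWED))
print(is_accepted("insufficient metadata", NEW_ALLOWED))
```

Under that assumption, the correctly spelled status is rejected by the old list and accepted by the new one.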
Add codespell configuration and fix existing typos.
More about codespell: https://github.com/codespell-project/codespell
I have personally introduced it to more than a hundred projects already, mostly
with positive feedback (see the improveit-dashboard note).
The CI workflow has permissions set only to read, so it should also be safe.
Changes
Configuration & Infrastructure
- Add [tool.codespell] configuration to pyproject.toml.
- Add a CI workflow (.github/workflows/codespell.yml) to check spelling on push and PRs to master.
- Add a codespell hook to .pre-commit-config.yaml (with tomli fallback for Python <3.11).
- Skip third-party / auto-generated content:
  - fuji_server/data/*.yaml, fuji_server/data/*.json, fuji_server/data/linked_vocabs/: external identifier and ontology data (identifiers.org, bioregistry, bioportal, etc.).
  - tests/*/cassettes/: VCR recordings.
  - simpleclient/: auto-generated PHP client.
  - *.ipynb: base64-encoded images cause false positives.
- Ignore URLs via regex (https?://\S+) so DOIs and links are never "fixed" (e.g. doi.org/.../j.patter.2021.100370).
Domain-Specific Whitelist (ignore-words-list)
- connexion: Python library name (https://github.com/spec-first/connexion).
- lod: Linked Open Data.
- ore: Object Reuse and Exchange (OAI-ORE), used throughout fuji_server.
Typo Fixes
Ambiguous typos fixed manually (7 fixes with context review, fb5135f):
- Selectin → Selecting (metrics_configuration.md heading)
- shat → that (docs/source/conf.py comment)
- stati → statuses (metadata_harvester.py comment)
- doesnt → doesn't (metadata_harvester.py comment)
- "fo rthe" → "for the" (misplaced space in body.py, openapi.yaml)
- taked → tagged (tests/helper/test_preprocessor.py; not in codespell's dictionary, actual intended word is "tagged")
- identifer → identifiers (fuji_server/data/README.md; refers to the identifiers.org service)
Non-ambiguous typos fixed automatically via datalad run codespell -w (single-suggestion fixes, 5f43837). Common fixes include:
explicitely → explicitly, ressources → resources, Sucessfully/sucessfully/Succesfully → Successfully/successfully, inaccesible → inaccessible, peristent → persistent, accesibility → accessibility, folows → follows, seperate → separate, souce → source, paramter → parameter, Returm → Return, metadat/matadata → metadata, variuos → various, namepace → namespace, exent → extent, opional → optional, reposiroty → repository, occured → occurred, insufficent → insufficient, publically → publicly, stanard → standard, identied → identified.
Potential functional fix
fuji_server/models/core_metadata_output.py:79 contained an allowed_values validator list that used "insufficent metadata" (typo),
while the producing evaluator (fair_evaluator_minimal_metadata.py:219) already uses "insufficient metadata" (correct).
This means the validator would have raised ValueError for the normal code path. Fix propagates the correct spelling to the enum here and in openapi.yaml.
Historical Context
Master has ~21 prior commits mentioning typo / spelling fixes,
confirming the value of automated spell-checking going forward.
Testing
🤖 Generated with Claude Code and a love for typo-free code