Fix language codes not recognized correctly #453
Open
Aunirbhan wants to merge 1 commit into openzim:main from
benoit74
requested changes
Mar 31, 2026
Collaborator
Thank you. I feel like we are on the right path.
What needs to be enhanced:
- I prefer the former logic where we build `language_counts` in a single line - we can probably write a similar single line to detect any unresolved code (something like `unresolved = [lang for lang in languages if not ZIM_LANGUAGES_MAP.get(lang, [ISO_MATRIX.get(lang, None)])]`, to be updated with `ISO_MATRIX_REV` as well)
- we should fail the scraper immediately when one language is unresolved, not issue a warning; this needs to be fixed rather than silently creating ZIMs with incorrect metadata
- we can probably extract the "complex" logic to get zim_langs from a single language in a dedicated function
- we need to add tests to this
- we need a CHANGELOG entry
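A minimal sketch of the two requests above (a dedicated per-language helper, plus a single-line unresolved check that fails fast), assuming `ZIM_LANGUAGES_MAP` maps a source code to a list of ISO 639-3 codes and `ISO_MATRIX` maps 2-letter to 3-letter codes. The table contents, `resolve_zim_langs`, and `check_languages` are hypothetical names for illustration, not the scraper's actual code. One detail worth keeping in mind when writing the one-liner: in Python a list such as `[None]` is truthy, so this sketch has the helper return an empty list for unknown codes instead of relying on a `[None]` default.

```python
# Toy stand-ins for the scraper's real tables; contents are illustrative only.
ZIM_LANGUAGES_MAP: dict[str, list[str]] = {"zh-cn": ["zho"], "nap": ["nap"]}
ISO_MATRIX: dict[str, str] = {"en": "eng", "gd": "gla", "oj": "oji"}
ISO_MATRIX_REV: dict[str, str] = {iso3: iso1 for iso1, iso3 in ISO_MATRIX.items()}


def resolve_zim_langs(lang: str) -> list[str]:
    """Return ISO 639-3 codes for one language, or [] if unknown (hypothetical helper)."""
    if lang in ZIM_LANGUAGES_MAP:
        return ZIM_LANGUAGES_MAP[lang]
    if lang in ISO_MATRIX:
        # 2-letter key: translate to its 3-letter value.
        return [ISO_MATRIX[lang]]
    if lang in ISO_MATRIX_REV:
        # Already a 3-letter code that the table knows as a value.
        return [lang]
    return []


def check_languages(languages: list[str]) -> None:
    """Raise immediately if any language cannot be resolved (hypothetical helper)."""
    unresolved = [lang for lang in languages if not resolve_zim_langs(lang)]
    if unresolved:
        raise ValueError(f"Unresolved language codes: {', '.join(unresolved)}")
```

Raising here, rather than logging a warning, means a run with an unknown code stops before any ZIM is written with incomplete language metadata. Which exception type the scraper should use is left open.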
Problem
`get_zim_language_metadata()` misses ISO 639-3 codes like `gla`, `nap`, and `oji` because it only looks up 2-letter keys in `ISO_MATRIX`. They resolve to `None`, producing an empty language list, and the ZIM crashes.
Solution
- `--zim-languages` when nothing resolves.
- `ISO_MATRIX_REV` fallback so codes already known as values (like `gd` → `gla`, `oj` → `oji`) resolve correctly.
- `nap` to `ZIM_LANGUAGES_MAP` since you mentioned it on the issue thread.
Before / After
Before (`get_zim_language_metadata()` on upstream):
After:
Passes `hatch run test:run`
Closes #337
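The failure mode described under Problem can be illustrated with a toy table. This is a sketch under assumptions, not the PR's actual before/after output: the table contents and both function names are made up for the example.

```python
# Toy table: 2-letter keys mapped to 3-letter ISO 639-3 values (illustrative only).
ISO_MATRIX = {"en": "eng", "gd": "gla", "oj": "oji"}
# Reverse view, so 3-letter codes that already appear as values can be recognised.
ISO_MATRIX_REV = {iso3: iso1 for iso1, iso3 in ISO_MATRIX.items()}


def forward_only(codes):
    """Look codes up as 2-letter keys only, dropping anything that resolves to None."""
    return [ISO_MATRIX[c] for c in codes if ISO_MATRIX.get(c) is not None]


def with_reverse_fallback(codes):
    """Also accept codes that are already 3-letter values of the table."""
    resolved = []
    for code in codes:
        if code in ISO_MATRIX:
            resolved.append(ISO_MATRIX[code])
        elif code in ISO_MATRIX_REV:
            resolved.append(code)
    return resolved


print(forward_only(["gla", "oji"]))           # []
print(with_reverse_fallback(["gla", "oji"]))  # ['gla', 'oji']
```

In this toy table `nap` resolves in neither direction, which is consistent with the description handling it through `ZIM_LANGUAGES_MAP` instead.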