All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning (as of version 1.5.0).
- zimwriterfs script that mimics zim-tools's zimwriterfs
- Simplify type annotations by replacing Union and Optional with pipe character ("|") for improved readability and clarity
- Add support for disable_metadata_checks and ignore_duplicates arguments in make_zim_file function ("zimwriterfs-mode")
- Relaxed constraints on Python dependencies
- Upgraded optional dependencies used for test and QA
- Set a user-agent for handle_user_provided_file #103
- Migrate to generic syntax in all std collections #140
- Do not modify the ffmpeg_args in reencode function #144
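
The two annotation changes listed above (pipe unions in place of Union and Optional, built-in generics in place of the typing collections) can be pictured with a small sketch. The function below is hypothetical and is not part of scraperlib; the `__future__` import is an assumption that keeps the new syntax usable in annotations on interpreters older than Python 3.10.

```python
from __future__ import annotations  # annotations are not evaluated at runtime

import pathlib


# Old style would read:
#   def first_existing(paths: List[Union[str, pathlib.Path]]) -> Optional[pathlib.Path]
def first_existing(paths: list[str | pathlib.Path]) -> pathlib.Path | None:
    """Return the first path that exists on disk, or None (hypothetical helper)."""
    for path in paths:
        candidate = pathlib.Path(path)
        if candidate.exists():
            return candidate
    return None
```

Behavior is unchanged by the rewrite; only the spelling of the annotations differs.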
- New disable_metadata_checks parameter in zimscraperlib.zim.creator.Creator initializer, making it possible to disable metadata checks at startup (assuming the user validates the metadata on their own) #119
- Rework the VideoWebmLow preset for faster encoding and smaller file size #122
  - preset has been bumped to version 2
  - when using an S3 cache, all videos using this preset will be reencoded and uploaded to cache again (it will replace the same file encoded with preset version 1)
- When reencoding a video, ffmpeg now uses only 1 CPU thread by default (a new reencode argument allows overriding this default value)
- Using openZIM Python bootstrap conventions (including hatch-openzim plugin) #120
- Add support for Python 3.12, drop Python 3.7 support #118
- Replace "iso-639" with "iso639-lang" library
- Replace "file-magic" with "python-magic" library for Alpine Linux support and better maintenance
- Fixed type hints of zimscraperlib.zim.Item and subclasses, and zimscraperlib.image.optimization:convert_image
- Add utility function to compute/check ZIM descriptions #110
- Using pylibzim 3.4.0
- Removed support for Python 3.7 (EOL)
- Fixed declared (hint) return type of download.stream_file #104
- Fixed declared (hint) type of content param for Creator.add_item_for #107
- Using pylibzim 3.1.0
- ZIM metadata check now allows multiple values (comma-separated) for Language
- Using yt_dlp instead of youtube_dl
- Dropped support for Python 3.6
This release changes the API of zim.creator.Creator and zim.filesystem.make_zim_file.
- zim.creator.Creator.config_metadata method (returning Self) exposing all mandatory metadata, all standard ones and allowing extra text metadata
- zim.creator.Creator.config_dev_metadata method setting stub metadata for all mandatory ones (allowing overrides)
- zim.metadata module with a list of per-metadata validation functions
- zim.creator.Creator.validate_metadata (called on start) to verify metadata respects the spec (and its recommendations)
- zim.filesystem.make_zim_file accepts a new optional long_description param
- i18n.is_valid_iso_639_3 to check ISO-639-3 codes
- image.probing.is_valid_image to check image format and size
- zim.creator.Creator main_path argument now mandatory
- zim.creator.Creator.start now fails on missing required or invalid metadata
- zim.creator.Creator.add_metadata now enforces validation checks
- zim.filesystem.make_zim_file renamed its favicon_path param to illustration_path
- zim.creator.Creator.config_indexing language argument now optional when indexing=False
- zim.creator.Creator.config_indexing now validates that language is ISO-639-3 when indexing=True
- Removed zim.creator.Creator.update_metadata. See .config_metadata() instead
- Removed zim.creator.Creator language argument. See .config_metadata() instead
- Removed zim.creator.Creator keyword arguments. See .config_metadata() instead
- Removed zim.creator.Creator.add_default_illustration. See .config_metadata() instead
- Removed zim.archive.Archive.media_counter (deprecated in 2.0.0)
- zim.creator.Creator(language=) can be specified as List[str]. ["eng", "fra"], ["eng"], "eng,fra" and "eng" are all valid values.
- Fixed zim.providers.URLProvider returning incomplete streams under certain circumstances (from openzim/kolibri#40)
- Fixed zim.creator.Creator not supporting multiple values for Language metadata, as required by the spec
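
The accepted shapes for the language value (a list of codes or a comma-separated string) can be reduced to a single comma-separated form. The helper below is a hypothetical sketch of that idea, written for illustration only; it is not scraperlib's implementation and performs no ISO-639-3 validation.

```python
from __future__ import annotations


def normalize_languages(language: str | list[str]) -> str:
    """Return a comma-separated string of language codes.

    Hypothetical helper: accepts "eng", "eng,fra", ["eng"] or ["eng", "fra"].
    """
    # A string is split on commas; a list is copied as-is.
    codes = language.split(",") if isinstance(language, str) else list(language)
    # Drop surrounding whitespace and empty entries before joining.
    return ",".join(code.strip() for code in codes if code.strip())
```

With this sketch, all four example values listed above collapse to either "eng" or "eng,fra".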
- Using pylibzim v2.1.0 (using libzim 8.1.0)
- [libzim] Entry.get_redirect_entry()
- [libzim] Item.get_indexdata() to implement custom IndexData per entry (writer)
- [libzim] Archive.media_count
- [libzim] Archive.article_count updated to match scraperlib's version
- Archive.article_counter now deprecated. Now returns Archive.article_count
- Archive.media_counter now deprecated. Now returns Archive.media_count
- [libzim] Removed lzma compression algorithm
- download.get_session() to build a new requests Session
- download.stream_file() accepts a session param to use instead of creating one
- zim.Creator now supports ignore_duplicates: bool parameter to prevent duplicates from raising exceptions
- zim.Creator.add_item, zim.Creator.add_redirect and zim.Creator.add_item_for now support a duplicate_ok: bool parameter to prevent an exception should this item/redirect be a duplicate
- download.stream_file() supports passing headers (scrapers were already using it)
- Fixed filesystem.get_content_mimetype() crashing on non-guessable byte stream
- Wider range of accepted lxml dependency version as 4.9.1 fixes a security issue
- Archive.get_metadata_item() to retrieve full item instead of just value
- Using pylibzim v1.1.0 (using libzim 7.2.1)
- Adding duplicate entries now raises RuntimeError
- filesize is fixed for larger ZIMs
- zim.Archive.tags and zim.Archive.get_tags() to retrieve parsed Tags with optional libkiwix param to include libkiwix's hints
- [tests] Counter tests now also use a libzim6 file.
- zim.Archive.article_counter follows libkiwix's new behavior of returning libzim's article_count for libzim 7+ ZIMs and returning previously returned (parsed) value for older ZIMs.
- Unreachable code removed in imaging module.
- [tests] “Sanskrit” removed from tests as output is not predictable depending on platform.
- zim.Archive.counters won't fail on missing Counter metadata
- Fixed leak in zim.Archive's .counters
- New .get_text_metadata() method on zim.Archive to save UTF-8 decoding
- New Counter metadata based properties for Archive:
  - .counters: parsed dict of the Counter metadata
  - .article_counter: libkiwix's calculation for number of articles
  - .media_counter: libkiwix's calculation for number of media
- Fixed i18n.find_language_names() failing on some languages
- Added uri module with rebuild_uri()
- Using new python-libzim based on libzim v7
- New Creator API
- Removed all namespace references
- Renamed url mentions to path
- Removed all links rewriting
- Removed Article/CSS/Binary segregation
- Kept zimwriterfs mode (except it doesn't rewrite for namespaces)
- New html module for HTML document manipulations
- New callback system on add_item_for() and add_item()
- New Archive API with easier search/suggestions and content access
- Changed download log level to DEBUG (was INFO)
- filesystem.get_file_mimetype now passes bytes to libmagic instead of filename due to release issue in libmagic
- safer inputs.handle_user_provided_file regarding input as str instead of Path
- image.presets and video.presets now all include ext and mimetype properties
- Video convert log now DEBUG instead of INFO
- Fixed image.save_image() saving to disk even when using a bytes stream
- Fixed image.transformation.resize_image() when resizing a byte stream without a dst
Intermediate release using unreleased libzim to support development of libzim7. Don't use it.
- requesting newer libzim version (not released ATM)
- New ZIM API for non-namespace libzim (v7)
- updated all requirements
- Fixed download test inconsistency
- fix_ogvjs mostly useless: only allows webm types
- exposing retry_adapter for refactoring
- Changed download log level to DEBUG (was INFO)
- guess more-defined mime from filename if magic says it's text
- get_file_mimetype now passes bytes to libmagic
- safer regarding input as str instead of Path
- fixed static item for empty content
- ext and mimetype properties for all presets
- Video convert log now DEBUG instead of INFO
- Added delete_fpath to add_item_for() and fixed StaticItem's auto remove
- Updated badges for new repo name
- add stream_file() to stream content from a URL into a file or a BytesIO object
- deprecated save_file()
- fixed add_binary when used without an fpath (#69)
- deprecated make_grayscale option in image optimization
- Added support for in-memory optimization for PNG, JPEG, and WebP images
- allows enabling debug logs via ZIMSCRAPERLIB_DEBUG environ
- added wait option in YoutubeDownloader to allow parallelism while using context manager
- do not use extension for finding format in ensure_matches() in image.optimization module
- added VideoWebmHigh and VideoMp4High presets for high quality WebM and Mp4 conversion respectively
- updated presets WebpHigh, JpegMedium, JpegLow and PngMedium in image.presets
- save_image moved from image to image.utils
- added convert_image, optimize_image and resize_image functions to image module
- added YoutubeDownloader to the download module to download YT videos using a capped number of threads
- fixed rewriting of links with empty target
- added support for image optimization using zimscraperlib.image.optimization for webp, gif, jpeg and png formats
- added format_for() in zimscraperlib.image.probing to get PIL image format from the suffix
- replaced BeautifulSoup parser in rewriting (html.parser -> lxml)
- detect mimetypes from filenames for all text files
- fixed non-filename based StaticArticle
- enable rewriting of links in poster attribute of audio element
- added find_language_in() and find_language_in_file() to get language from HTML content and HTML file respectively
- add a mime mapping to deal with inconsistencies in mimetypes detected by magic on different platforms
- convert_image signature changed:
  - target_format positional argument removed. Replaced with optional fmt key of keyword arguments.
  - colorspace optional positional argument removed. Replaced with optional colorspace key of keyword arguments.
- prevent rewriting of links with special schemes (mailto, tel, etc.) in HTML links rewriting
- replaced imaging module with exploded image module (convertion, probing, transformation)
- changed create_favicon() param names (source_image -> src, dest_ico -> dst)
- changed save_image() param names (image -> src)
- changed get_colors() param names (image_path -> src)
- changed resize_image() param names (fpath -> src)
- fixed URL rewriting when running from /
- added support for link rewriting in <object> element
- prevent raising an error if element doesn't have the attribute with url
- use non-greedy match for CSS URL links (shortest string matching url() format)
- fix namespace of target only if link doesn't have a netloc
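
The non-greedy match for CSS url() links mentioned above can be illustrated with two regular expressions. These patterns are simplified assumptions chosen for the example, not the ones used in scraperlib's rewriting code.

```python
import re

# Greedy: ".+" extends to the last ")" in the input, merging both declarations.
GREEDY = re.compile(r"url\((.+)\)")
# Non-greedy: ".+?" stops at the first ")" so each url() is matched on its own.
NON_GREEDY = re.compile(r"url\((.+?)\)")

css = "a { background: url(img/a.png); } b { background: url(img/b.png); }"

greedy_matches = GREEDY.findall(css)
non_greedy_matches = NON_GREEDY.findall(css)
```

On this input the greedy pattern yields a single over-long match, while the non-greedy pattern yields the two individual targets, which is the "shortest string matching" behavior the entry describes.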
- added UTF8 to constants
- added mime_type discovery via magic (filesystem)
- Added types: mime types guessing from file names
- Revamped zim API
- Removed ZimInfo, whose role was to hold metadata for zimwriterfs call
- Removed calling zimwriterfs binary but kept function name
- Added zim.filesystem: zimwriterfs-like creation from a build folder
- Added zim.creator: create files by manually adding each article
- Added zim.rewriting: tools to rewrite links/urls in HTML/CSS
- add timeout and retries to save_file() and make it return headers
- fixed convert_image() which tried to use a closed file
- exposed reencode, Config and get_media_info in zimscraperlib.video
- added save_image() and convert_image() in zimscraperlib.imaging
- added support for upscaling in resize_image() via allow_upscaling
- resize_image() now supports params given by user and preserves image colorspace
- fixed tests for zimscraperlib.imaging
- added video module with reencode, presets, config builder and video file probing
- make_zim_file() accepts extra kwargs for zimwriterfs
- added translation support to i18n
- added s3transfer to verbose dependencies list
- changed default log format to include module name
- verbose dependencies (urllib3, boto3) now logged at WARNING level by default
- ability to set verbose dependencies log level and add modules to the list
- zimscraperlib's logging level now aligned with scraper's requested one
- fix_ogvjs_dist script more generic (#1)
- updated zim to support other zimwriterfs params (#10)
- more flexible requirements for requests dependency
- fixed return value of get_language_details on non-existent language
- fixed crash on resize_image with method height
- fixed root logger level (now DEBUG)
- removed useless console=True getLogger param
- completed tests (100% coverage)
- added ./test script for quick local testing
- improved tox.ini
- added create_favicon to generate a squared favicon
- added handle_user_provided_file to handle user file/URL from param
- fixed fix_ogvjs_dist
- initial version providing
- download: save_file, save_large_file
- fix_ogvjs_dist
- i18n: setlocale, get_language_details
- imaging: get_colors, resize_image, is_hex_color
- zim: ZimInfo, make_zim_file