
Add OMEZarrWSI slide reader#195

Closed
HarveySouth wants to merge 17 commits into
mahmoodlab:mainfrom
HarveySouth:omezarr-reader

Conversation

@HarveySouth

HarveySouth commented Mar 17, 2026

Contributor

I added support for locally stored WSIs that follow the OME Zarr multiscales specification. I made changes based on #163, then based on the references to openslide across the repo, then based on my own testing of the OME Zarr reader. I chose ngff-zarr to read OME Zarr multiscales files, and files are recognized via OMEZARR_EXTENSIONS = {'.zarr'} because the WSI extension-recognition code expects a single file suffix. I didn't change index.rst to describe the added support because I think its scope is limited.
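To illustrate why a single suffix such as '.zarr' is enough: pathlib reports only the final suffix of a multi-part name like normal_108.ome.zarr. This is a minimal sketch; `is_omezarr_path` is a hypothetical helper made up for illustration, not the function used in this PR.

```python
from pathlib import Path

# Mirrors the constant named in the description above.
OMEZARR_EXTENSIONS = {'.zarr'}


def is_omezarr_path(path):
    """Hypothetical helper: match on the final suffix only, case-insensitively."""
    return Path(path).suffix.lower() in OMEZARR_EXTENSIONS


# Only the last suffix is considered, so '.ome.zarr' names match '.zarr'.
print(Path("normal_108.ome.zarr").suffix)
print(is_omezarr_path("normal_108.ome.zarr"))
print(is_omezarr_path("normal_108.tif"))
```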

Testing I've done:

  • I ran the same test functions as in test_openslidewsi.py against the new reader; the same assertions held and the outputs looked reasonable.
  • I ran segmentation, patching, and feature extraction on ~1000 of my own LocalStore zarr files.

I haven't tried: files from a different storage backend, or files that have been created from a different OME Zarr specification implementation (e.g. do files from bioformats2raw work?).

Changes I thought would be nice for the whole repo that aren't in the scope of this PR:

  • Using pathlib rather than os for filesystem operations. I think some of the string surgery on paths with os can be improved with the pathlib API.
  • Using cfunits for unit conversions, e.g. to replace if x_resolution and unit: in OpenSlideWSI or the conversion in _save_pyvips_tiff in Converter.py. The OME Zarr axis specification takes any UDUNITS-2 unit, so I think cfunits is a reasonable solution to unit conversion.
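As a rough picture of the unit-conversion problem in the second bullet, here is a sketch that normalizes a pixel size to micrometers. The lookup table is a hand-written stand-in and an assumption of this example; it does not use cfunits, and a UDUNITS-2 aware library would cover far more units than these few.

```python
# Conversion factors to micrometers for a few length units.
# Hand-written stand-in for illustration only, not taken from the repo.
TO_MICROMETER = {
    "nanometer": 1e-3,
    "micrometer": 1.0,
    "millimeter": 1e3,
    "centimeter": 1e4,
    "meter": 1e6,
}


def to_micrometers(value, unit):
    """Convert a length to micrometers; raise on units the table does not know."""
    try:
        return value * TO_MICROMETER[unit]
    except KeyError:
        raise ValueError(f"Unsupported length unit: {unit!r}") from None


# A pixel size stored in millimeters, expressed as microns per pixel.
print(to_micrometers(2, "millimeter"))
```

Raising on an unknown unit, rather than silently assuming micrometers, keeps a wrong mpp from propagating into patching.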

@guillaumejaume

Contributor

Hi @HarveySouth, thanks for the PR. OME Zarr is pretty rare for H&E/IHC.

To approve it, I'd need a couple of things:

  • Could you share an actual OME Zarr image that I could use for testing?
  • There seems to be a compatibility issue with Python 3.10, which we still need to support.
  • Since it's a niche format, we shouldn't force installing the zarr packages by default; they should be optional (similar to the converter or patch_encoder optional installs).

Let me know, thanks
Guillaume

@HarveySouth

Contributor Author

Thanks for your review, and it will be great to have this approved. I fixed the 3.10 compatibility issues and moved the dependencies to be optional.

While trying to find an OME Zarr image to share, I realised that the OME Zarr multiscales specification only says that dimensions SHOULD be ordered tczyx, whereas I assumed a strict ordering. I have to fix that, and I'll have a look sometime soon.
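One way to avoid depending on a fixed dimension order is to look axes up by name in the multiscales metadata. The sketch below works on a plain dict shaped like the OME Zarr 'axes' list; `axis_indices` is a hypothetical helper and the scale values are made up, so this is not the code in the PR.

```python
def axis_indices(axes):
    """Map axis name -> position for an OME Zarr 'axes' metadata list."""
    return {axis["name"]: i for i, axis in enumerate(axes)}


# Example metadata for an image stored as (y, x, c) rather than tczyx.
axes = [
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
    {"name": "c", "type": "channel"},
]
scale = [0.5, 0.25, 1.0]  # made-up values, one entry per axis

idx = axis_indices(axes)
mpp_x = scale[idx["x"]]  # looked up by name, not by position
mpp_y = scale[idx["y"]]
print(idx, mpp_x, mpp_y)
```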

Thanks!

@guillaumejaume

Contributor

Sounds good @HarveySouth, let me know once you're done with your changes, and I'll test the pipeline on my end. Thanks

@HarveySouth

HarveySouth commented Mar 18, 2026

Contributor Author

Seeing the tests, I wonder how best to make a reader optional. Should I move the imports into _lazy_initialize() with

try:
    import ...
except ImportError:
    ...

I see there's some documentation I should update for installing the optional group of packages too.
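A common shape for this kind of optional import is to catch the ImportError and re-raise it with a message that names the optional install group. The sketch below is generic: `optional_import`, the package name `your-package`, and the extra name `omezarr` are all placeholders assumed for illustration, not names from this repository.

```python
import importlib


def optional_import(module_name, extra):
    """Import a module, or raise an ImportError that names the optional extra.

    Both arguments are placeholders chosen by the caller.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this reader. Install the optional "
            f"dependencies, e.g. pip install 'your-package[{extra}]'."
        ) from exc


# json ships with Python, so this import succeeds.
json = optional_import("json", extra="omezarr")

# A module that is not installed triggers the more helpful error.
try:
    optional_import("module_that_is_not_installed", extra="omezarr")
except ImportError as err:
    print(err)
```

Calling such a helper inside the reader's initialization defers the failure until the reader is actually used, so users without the optional packages can still import the rest of the library.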

Fix assumption that ngff-multiscales have a strict dimension ordering
Format code with black and autopep8
Move imports that are now optional to a try except block
Add docstring to _fetch_downsamples
Optimise read_region by adding some processing to lazy_initialize
Make imports more robust
Expand possible dimnames
Fix incorrect OME spec assumption in fetch_mpp
Update slice varname for better readability
Cache slice involved variables in init rather than read
@HarveySouth

HarveySouth commented Mar 20, 2026

Contributor Author

I still have to make a few checks, and I'll let you know when I'm done reviewing. You can get a public TIFF WSI and convert it to an OME Zarr image with the following:

import zarr
from pathlib import Path
import hashlib
from ngff_zarr import tiff_file_to_ngff_images, to_multiscales, to_ngff_zarr
import openslide
import requests

# from the files section of https://gigadb.org/dataset/100439
url = "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/100001_101000/100439/CAMELYON16/training/normal/normal_108.tif"

tiff_save_location = Path("./normal_108.tif").resolve()
zarr_save_location = Path("./normal_108.ome.zarr").resolve()

print("Making web request for gigadb file")
with requests.get(url, stream=True) as r:
    r.raise_for_status()  # fail early on an HTTP error instead of saving an error page
    with open(tiff_save_location, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

md5check = "3714c9bb0d83aa0702aa48526dc340c1"

print("checking hash")
# hashlib.file_digest requires Python 3.11+
with open(tiff_save_location, 'rb') as f:
    md5result = hashlib.file_digest(f, "md5").hexdigest()
assert md5result == md5check, "Err, downloaded file failed verification"

print("saving tiff as omezarr")
_, ngff_image = tiff_file_to_ngff_images(tiff_save_location)[0] # index 0: this TIFF WSI has a single series
multiscales = to_multiscales(ngff_image) # generate the multiscales
to_ngff_zarr(zarr_save_location, multiscales) # save the converted data

print("updating omezarr metadata to have units and correct scale")
openimg_xmpp = openslide.OpenSlide(tiff_save_location).properties['openslide.mpp-x']

multiscales_zgroup = zarr.open_group(zarr_save_location, mode='r+')
multi_attrs = dict(multiscales_zgroup.attrs)

# set units (using knowledge that this file is ordered yxc :( )
axes = multi_attrs['multiscales'][0]['axes']
axes[0]['unit'] = "micrometers"
axes[1]['unit'] = "micrometers"
# update scales accordingly: rescale every level by one factor per axis so the
# ratios between pyramid levels are preserved (multiplying each level by the
# already-updated previous level would compound the factors)
datasets = multi_attrs['multiscales'][0]['datasets']
for axis in (0, 1):
    old_top = float(datasets[0]['coordinateTransformations'][0]['scale'][axis])
    factor = float(openimg_xmpp) / old_top
    for imagemeta in datasets:
        imagemeta['coordinateTransformations'][0]['scale'][axis] *= factor

# write the update
multiscales_zgroup.attrs.put(multi_attrs)
print("Done")
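The scale update in the script can be checked without any files by working on a plain dict shaped like the 'datasets' metadata. This is a sketch under assumptions: `rescale_levels` is a hypothetical helper, and the three-level metadata below is made up rather than read from a real OME Zarr store.

```python
import copy


def rescale_levels(datasets, new_top_scale, axis):
    """Return a copy of 'datasets' with every level rescaled along one axis.

    All levels are multiplied by the same factor, so the ratio between
    consecutive pyramid levels is unchanged.
    """
    out = copy.deepcopy(datasets)
    old_top = out[0]["coordinateTransformations"][0]["scale"][axis]
    factor = new_top_scale / old_top
    for level in out:
        level["coordinateTransformations"][0]["scale"][axis] *= factor
    return out


# Made-up metadata: three levels, each downsampled by 2, base scale 1.0.
datasets = [
    {
        "path": f"scale{i}",
        "coordinateTransformations": [
            {"type": "scale", "scale": [2.0**i, 2.0**i, 1.0]}
        ],
    }
    for i in range(3)
]

rescaled = rescale_levels(datasets, new_top_scale=0.25, axis=0)
print([d["coordinateTransformations"][0]["scale"][0] for d in rescaled])
```

Working on a deep copy leaves the input metadata untouched, which makes it easy to compare the scales before and after.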

Some of the imports need to be in scope for functions outside of lazy_initialize, so I try them in a try/except block at the top of the file instead.

@HarveySouth

Contributor Author

I'm done with my changes @guillaumejaume

@HarveySouth

Contributor Author

I synced my fork with the recent changes without conflict. Is there anything else you need from me for approval?

@guillaumejaume

Contributor

Thanks @HarveySouth, I'll look into it next week

@guillaumejaume

Contributor

See #200
