Add OMEZarrWSI slide reader #195
Change the ome-zarr dependencies to hopefully work better with the strict package requirements of aicsimageio.
…n test_openslidewsi.py
didn't change index.rst as the support is probably more narrow than any conception of a WSI in the zarr format.
|
Hi @HarveySouth, thanks for the PR. OME Zarr is pretty rare for H&E/IHC. To approve it, I'd need a couple of things:
Let me know, thanks |
|
Thanks for your review, and it will be great to have this approved. I fixed the 3.10 compatibility issues and moved the dependencies to be optional. In trying to find an OME Zarr image to share, I realise that the OME Zarr multiscales specification states that dimensions SHOULD order tczyx where I've assumed a strict ordering on the dimensions, so I have to fix that and I'll have a look sometime soon. Thanks! |
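One way to avoid depending on a fixed dimension order is to look each axis up by the name recorded in the multiscales `axes` metadata. The following is only a minimal sketch of that idea, not the code in this PR: the function names and the example metadata are hypothetical, and it assumes each axis entry is a dict with a `name` key.

```python
from typing import Dict, List, Tuple


def axis_indices(axes: List[dict]) -> Dict[str, int]:
    """Map each axis name (e.g. 'y', 'x', 'c') to its position in the array."""
    return {axis["name"]: position for position, axis in enumerate(axes)}


def spatial_shape(shape: Tuple[int, ...], axes: List[dict]) -> Tuple[int, int]:
    """Return (width, height) regardless of how the axes are ordered."""
    index = axis_indices(axes)
    return shape[index["x"]], shape[index["y"]]


# Hypothetical metadata: one image stored as yxc, another as czyx.
yxc_axes = [
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
    {"name": "c", "type": "channel"},
]
czyx_axes = [
    {"name": "c", "type": "channel"},
    {"name": "z", "type": "space"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]

# Both layouts describe a 200-wide, 100-high image.
yxc_size = spatial_shape((100, 200, 3), yxc_axes)
czyx_size = spatial_shape((3, 1, 100, 200), czyx_axes)
```

Looking indices up by name keeps slicing code the same whether the file follows the recommended tczyx order or not.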
|
Sounds good @HarveySouth, let me know once you're done with your changes, and I'll test the pipeline on my end. Thanks |
|
Seeing the tests, I wonder how best to make a reader optional. Should I move the imports into _lazy_initialize() with
try:
    import ...
except ImportError
I see there's some documentation I should update for installing the optional group of packages too. |
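For illustration, a common pattern for an optional reader is to attempt the import without failing at module load, and raise a clear error only when the reader is first used. This is a sketch under assumptions: `optional_import` and `OptionalReader` are hypothetical names, not part of this repository, and the error message wording is invented.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class OptionalReader:
    """Hypothetical reader that only needs its backend when first used."""

    def __init__(self, backend_name: str):
        self.backend_name = backend_name
        self._backend = None

    def _lazy_initialize(self):
        # Import on first use so that merely constructing the reader
        # never fails when the optional package is missing.
        if self._backend is None:
            backend = optional_import(self.backend_name)
            if backend is None:
                raise ImportError(
                    f"{self.backend_name} is required for this reader; "
                    "install the optional dependency group to use it."
                )
            self._backend = backend
        return self._backend
```

With this shape, users who never open an OME Zarr file are unaffected by the missing packages, and users who do get an actionable message instead of a bare traceback.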
Fix assumption that ngff-multiscales have a strict dimension ordering
Format code with black and autopep8
Move imports that are now optional to a try except block
Add docstring to _fetch_downsamples
Optimise read_region by adding some processing to lazy_initialize
Make imports more robust
Expand possible dimnames
Fix incorrect OME spec assumption in fetch_mpp
Update slice varname for better readability
Cache slice involved variables in init rather than read
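The fetch_mpp and _fetch_downsamples changes listed above both rely on per-level scale values in the multiscales metadata. How the PR implements them is not shown here; the following is only a rough sketch, with hypothetical function names and example metadata, assuming each dataset carries a `scale` coordinate transformation whose values are absolute physical pixel sizes in axis order.

```python
def level_scales(multiscale: dict) -> list:
    """Return one {axis_name: scale} dict per dataset (resolution level)."""
    axis_names = [axis["name"] for axis in multiscale["axes"]]
    scales = []
    for dataset in multiscale["datasets"]:
        # Use the first transformation whose type is "scale".
        transform = next(
            t for t in dataset["coordinateTransformations"] if t["type"] == "scale"
        )
        scales.append(dict(zip(axis_names, transform["scale"])))
    return scales


def downsamples(multiscale: dict) -> list:
    """Downsample factor of each level relative to the full-resolution level."""
    scales = level_scales(multiscale)
    base = scales[0]["x"]
    return [level["x"] / base for level in scales]


# Hypothetical yxc metadata with three levels at 0.25, 0.5 and 1.0 microns per pixel.
example = {
    "axes": [
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
        {"name": "c", "type": "channel"},
    ],
    "datasets": [
        {"path": "0", "coordinateTransformations": [{"type": "scale", "scale": [0.25, 0.25, 1.0]}]},
        {"path": "1", "coordinateTransformations": [{"type": "scale", "scale": [0.5, 0.5, 1.0]}]},
        {"path": "2", "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}]},
    ],
}

mpp = level_scales(example)[0]["x"]  # microns per pixel at full resolution
factors = downsamples(example)
```

Pairing scale values with axis names, rather than fixed positions, keeps this independent of dimension order.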
|
I still have to make a few checks and I'll let you know when I'm done reviewing. You can get a public tiff WSI and convert to an OME Zarr image with the following:
import zarr
from pathlib import Path
import hashlib
from ngff_zarr import tiff_file_to_ngff_images, to_multiscales, to_ngff_zarr
import openslide
import requests
# from the files section of https://gigadb.org/dataset/100439
url = "https://s3.ap-northeast-1.wasabisys.com/gigadb-datasets/live/pub/10.5524/100001_101000/100439/CAMELYON16/training/normal/normal_108.tif"
tiff_save_location = Path("./normal_108.tif").resolve()
zarr_save_location = Path("./normal_108.ome.zarr").resolve()
print("Making web request for gigadb file")
with requests.get(url, stream=True) as r:
    with open(tiff_save_location, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
md5check = "3714c9bb0d83aa0702aa48526dc340c1"
print("checking hash")
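# hashlib.file_digest requires Python 3.11+; the with block closes the file afterwards
with open(tiff_save_location, 'rb') as f:
    md5result = hashlib.file_digest(f, "md5").hexdigest()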
assert md5result == md5check, "Err, downloaded file failed verification"
print("saving tiff as omezarr")
_, ngff_image = tiff_file_to_ngff_images(tiff_save_location)[0] # index 0 for the actual TIFF WSI has series length 1
multiscales = to_multiscales(ngff_image) # generate the multiscales
to_ngff_zarr(zarr_save_location, multiscales) # save the converted data
print("updating omezarr metadata to have units and correct scale")
openimg_xmpp = openslide.OpenSlide(tiff_save_location).properties['openslide.mpp-x']
multiscales_zgroup = zarr.open_group(zarr_save_location, mode='r+')
multi_attrs = dict(multiscales_zgroup.attrs)
# set units (using knowledge that this file is ordered yxc :( )
axes = multi_attrs['multiscales'][0]['axes']
axes[0]['unit'] = "micrometers"
axes[1]['unit'] = "micrometers"
# update scale accordingly
topimage = multi_attrs['multiscales'][0]['datasets'][0]
topimage['coordinateTransformations'][0]['scale'][0] = float(openimg_xmpp)
topimage['coordinateTransformations'][0]['scale'][1] = float(openimg_xmpp)
for i, scaled_imagemeta in enumerate(multi_attrs['multiscales'][0]['datasets'][1:]):
    scaled_imagemeta['coordinateTransformations'][0]['scale'][0] *= float(multi_attrs['multiscales'][0]['datasets'][(i)]['coordinateTransformations'][0]['scale'][0])
    scaled_imagemeta['coordinateTransformations'][0]['scale'][1] *= float(multi_attrs['multiscales'][0]['datasets'][(i)]['coordinateTransformations'][0]['scale'][1])
# write the update
multiscales_zgroup.attrs.put(multi_attrs)
print("Done")
I need some of the imports to scope into functions outside of lazy_initialize, so my imports are tried in a try except block at the top of the file instead. |
|
I'm done with my changes @guillaumejaume |
|
I synced my fork with the recent changes without conflict, is there anything else you need from me for approval? |
|
Thanks @HarveySouth, I'll look into it next week |
|
See #200 |
I added support for locally stored OMEZarr Multiscale specification WSIs. I made changes based on #163, then changes based on a look at the references to openslide in the repo, then changes based on my testing of the OMEZarr reader. I chose to use ngff-zarr to read OMEZarr Multiscales files, and to recognize files based on
OMEZARR_EXTENSIONS = {'.zarr'}
as the code for WSI extension recognition expects a single file suffix. I didn't change index.rst to describe the added support because I think the scope of the added support is limited.
Testing I've done:
I haven't tried: files from a different storage backend, or files that have been created from a different OME Zarr specification implementation (e.g. do files from bioformats2raw work?).
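To illustrate the single-suffix point from the description: pathlib reports only the final suffix of a name such as `normal_108.ome.zarr`, so matching on `.zarr` alone covers both `.zarr` and `.ome.zarr` stores. This is a sketch rather than the repository's recognition code; `is_omezarr` is a hypothetical helper and the lower-casing is an added assumption.

```python
from pathlib import Path

OMEZARR_EXTENSIONS = {".zarr"}


def is_omezarr(path: str) -> bool:
    """Match on the final suffix only, so 'slide.ome.zarr' and 'slide.zarr' both count."""
    return Path(path).suffix.lower() in OMEZARR_EXTENSIONS


# Path.suffix is the last suffix; Path.suffixes lists all of them.
last_suffix = Path("normal_108.ome.zarr").suffix
all_suffixes = Path("normal_108.ome.zarr").suffixes
```

A side effect of matching on `.zarr` alone is that any Zarr store is accepted by name, whether or not it contains OME multiscales metadata, so the reader still has to validate the contents.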
Changes I thought would be nice for the whole repo that aren't in the scope of this PR:
if x_resolution and unit: in OpenSlideWSI or the conversion in _save_pyvips_tiff in Converter.py. The OME Zarr axis specification takes any UDUNITS-2 unit, so I think cfunits is a reasonable solution to unit conversion.
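As a rough picture of what centralised unit conversion could look like, the sketch below converts a pixel size to micrometers with a small lookup table. It deliberately does not use cfunits, so it runs without extra packages and covers only a handful of length units; the table, the function name, and the error message are all hypothetical.

```python
# Micrometers per unit, for a few length unit names used in OME-NGFF axes metadata.
MICROMETERS_PER_UNIT = {
    "nanometer": 1e-3,
    "micrometer": 1.0,
    "millimeter": 1e3,
    "centimeter": 1e4,
    "meter": 1e6,
}


def to_micrometers(value: float, unit: str) -> float:
    """Convert a length in the given unit to micrometers."""
    try:
        return value * MICROMETERS_PER_UNIT[unit]
    except KeyError:
        # Fail loudly rather than silently assuming micrometers.
        raise ValueError(f"Unsupported length unit: {unit!r}") from None


mpp_from_nm = to_micrometers(250, "nanometer")
mpp_from_mm = to_micrometers(0.001, "millimeter")
```

A full UDUNITS-2 library would replace the table, but the calling code could keep the same shape: one function that takes a value and a unit and returns microns per pixel.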