
feat: rectilinear chunks in Zarr backend#11279

Open
maxrjones wants to merge 17 commits into pydata:main from maxrjones:poc/unified-zarr-chunk-grid

@maxrjones
Contributor

Description

This PR accompanies zarr-developers/zarr-python#3802, adding support for rectilinear zarr chunks in Xarray.

The user-facing difference between this PR and zarr-developers/zarr-python#3369 / #10880 is that rectilinear chunks are gated behind zarr.config.set({'array.rectilinear_chunks': True}) (or ZARR_ARRAY__RECTILINEAR_CHUNKS=True), disabled by default. This gives zarr-python developers an opportunity to gracefully finalize the API, which is especially valuable given that rectilinear chunks are the largest feature addition in zarr-python since Zarr V3/sharding.
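
As a rough illustration of how the environment variable relates to the config key, the sketch below mimics the naming convention used by donfig-style configuration: strip the prefix, lowercase, and turn double underscores into dots. It assumes zarr-python's config follows that convention; `env_to_config_key` and `parse_env_value` are hypothetical helpers written for this example, not zarr functions.

```python
import ast


def env_to_config_key(name: str, prefix: str = "ZARR_") -> str:
    """Map an environment variable name to a dotted config key (assumed convention)."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Drop the prefix, lowercase, and treat "__" as a nesting separator.
    return name[len(prefix):].lower().replace("__", ".")


def parse_env_value(raw: str):
    """Interpret the value as a Python literal when possible, else keep the string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


key = env_to_config_key("ZARR_ARRAY__RECTILINEAR_CHUNKS")
value = parse_env_value("True")
print(key, value)
```

Under that assumption, the environment variable and the `zarr.config.set` call above address the same setting.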

What changed

  • _determine_zarr_chunks now passes through variable (non-uniform) chunk sizes when writing to Zarr V3 with the unified ChunkGrid API, instead of raising an error.
  • Reading correctly reconstructs chunk information from both RegularChunkGrid and RectilinearChunkGrid metadata.
  • safe_chunks and align_chunks validation is skipped for rectilinear (tuple-of-tuples) chunks, since those checks assume uniform chunk sizes.
  • Error messages for chunk validation failures now distinguish between Zarr V2 and V3 and point users toward the rectilinear chunks extension.
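
To make the regular versus rectilinear distinction concrete, here is a minimal sketch of how dask-style tuple-of-tuples chunks could be classified. It is not the logic in `_determine_zarr_chunks`; `is_uniform` and `classify_chunks` are hypothetical names, and the sketch assumes every dimension has at least one chunk.

```python
from typing import Sequence, Tuple


def is_uniform(dim_chunks: Sequence[int]) -> bool:
    """True if all chunks but the last are equal and the last is no larger."""
    first = dim_chunks[0]
    return all(c == first for c in dim_chunks[:-1]) and dim_chunks[-1] <= first


def classify_chunks(chunks: Sequence[Sequence[int]]) -> Tuple[str, tuple]:
    """Return ("regular", chunk_shape) or ("rectilinear", per-dimension sizes)."""
    if all(is_uniform(dim) for dim in chunks):
        # A single chunk shape describes the whole grid.
        return "regular", tuple(dim[0] for dim in chunks)
    # At least one dimension has variable sizes, so keep every size explicitly.
    return "rectilinear", tuple(tuple(dim) for dim in chunks)


print(classify_chunks(((10, 10, 5), (4, 4))))
print(classify_chunks(((10, 20, 5), (4, 4))))
```

The first call has uniform sizes with a smaller trailing chunk, which a regular grid can represent; the second has a 20-element chunk in the middle, so it needs the tuple-of-tuples form.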

To-do

  • expand test coverage for the error messages raised when writing Zarr V2 or when the config flag is off, and add a multi-dimensional test case
  • decide whether to continue silently bypassing safe_chunks/align_chunks or add validations
  • remove upstream version pin

Checklist

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR. Tools: Claude Code

@github-actions bot added the topic-backends, topic-zarr, and io labels on Apr 2, 2026
@headtr1ck
Collaborator

Is this a duplicate of #10880?

@maxrjones
Contributor Author

> Is this a duplicate of #10880?

This would supersede #10880. It implements the same feature but uses a different upstream implementation (zarr-developers/zarr-python#3802), which will likely be merged into Zarr-Python in the coming days. zarr-developers/zarr-python#3802 supersedes zarr-developers/zarr-python#3369, on top of which #10880 was built.

Collaborator

@keewis left a comment

I'll need to look into this a bit more, but for now:

> is skipped for rectilinear (tuple-of-tuples) chunks since those checks assume uniform chunk sizes.

That's what the current checks do, but their purpose is to support safely appending data without write conflicts between execution workers (dask / cubed / etc). Do we maybe need different checks that verify that zarr chunks do not overlap with multiple execution chunks?
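
One way such a check could look for variable chunk sizes is sketched below: along a single dimension, a zarr chunk is unsafe if an interior edge of the execution chunks falls strictly inside it. This is only an illustration under simplifying assumptions (a full write starting at offset 0, no region or append offsets); `straddling_zarr_chunks` is a hypothetical helper, not an existing xarray function.

```python
from itertools import accumulate
from typing import List, Sequence


def straddling_zarr_chunks(
    zarr_sizes: Sequence[int], exec_sizes: Sequence[int]
) -> List[int]:
    """Return indices of zarr chunks that more than one execution chunk would write."""
    if sum(zarr_sizes) != sum(exec_sizes):
        raise ValueError("chunkings must cover the same dimension length")
    # Interior edges only: the final edge is the end of the array.
    exec_edges = set(list(accumulate(exec_sizes))[:-1])
    bad = []
    start = 0
    for index, size in enumerate(zarr_sizes):
        stop = start + size
        # An execution edge strictly inside (start, stop) splits this zarr chunk.
        if any(start < edge < stop for edge in exec_edges):
            bad.append(index)
        start = stop
    return bad


print(straddling_zarr_chunks((10, 20, 5), (10, 25)))  # execution chunks aligned to zarr edges
print(straddling_zarr_chunks((10, 20, 5), (15, 20)))  # execution edge at 15 splits zarr chunk 1
```

An empty result means each zarr chunk is written by exactly one execution chunk along that dimension; applying the same test per dimension would extend it to N-dimensional arrays.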

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
pixi.toml Outdated
dask = { git = "https://github.com/dask/dask" }
distributed = { git = "https://github.com/dask/distributed" }
zarr = { git = "https://github.com/zarr-developers/zarr-python" }
zarr = { git = "https://github.com/maxrjones/zarr-python", branch = "poc/unified-chunk-grid" }
Contributor

@dcherian commented on Apr 8, 2026

Suggested change
zarr = { git = "https://github.com/maxrjones/zarr-python", branch = "poc/unified-chunk-grid" }

Now that it's on main, we can apply the run-upstream label (which I will do now).

@dcherian added the run-upstream label on Apr 8, 2026
eendebakpt and others added 7 commits April 8, 2026 17:31
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
* add `zizmor` to the hooks

* set the default permissions to minimum

* don't persist credentials

* pin `actions/checkout`

* pin `xarray-contrib/ci-trigger`

* pin `actions/upload-artifact`

* pin `actions/download-artifact`

* pin `pypa/gh-action-pypi-publish`

* pin `actions/setup-python`

* pin `prefix-dev/setup-pixi`

* pin `codecov/codecov-action`

* pin `scientific-python/issue-from-pytest-log-action`

* pin `mamba-org/setup-micromamba`

* pin `WyriHaximus/github-action-get-previous-tag`

* pin `EnricoMi/publish-unit-test-result-action`

* pin `actions/labeler`

* pin `actions/cache`

* actions cooldown for dependabot

* avoid potential template injections

* broken condition

* ignore the `pull_request_target` warning

(because `actions/labeler` actually needs it)

* ignore zizmor's dangerous-triggers warning for publish-test-results

* fetch the `codecov` token from a github environment

* correct the pin for `setup-pixi`

* split the nightly wheels ci into build and publish jobs

* remove the codecov env and ignore the zizmor warning instead

* back to the codecov env, but disable deployments

* correct the pin for `actions/setup-python`

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>

---------

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>
This PR modifies the few places we relied on a generic `np.timedelta64` dtype to explicitly specify the time resolution:
- It removes `NAT_TYPES` and relies instead on checking the `dtype.kind` in `computation.nanops._maybe_null_out`.
- It infers the time `unit` using `np.datetime_data` from the input `dtype` to determine the `unit` on the returned `fill_value` in `core.dtypes.maybe_promote`.
- It explicitly constructs a zero-valued `np.timedelta64` or `np.datetime64` object for use downstream in `plot.utils._determine_cmap_params`.
Bumps the actions group with 2 updates: [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) and [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish).


Updates `prefix-dev/setup-pixi` from 0.9.4 to 0.9.5
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@a0af7a2...1b2de7f)

Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@ed0c539...cef2210)

---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
  dependency-version: 0.9.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…a#11282)

* remove the accidentally copied pypi publish step

* separate test execution from issue creation

* upload the artifact only if the tests failed

* correct source of the log-file path

* debug: print github output [skip-rtd]

* typo [skip-rtd]

* correct the path to the log file

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>

---------

Co-authored-by: Nick Hodgskin <36369090+VeckoTheGecko@users.noreply.github.com>
* use var._root._h5py to get h5py module in h5netcdf backend instead of importing it
* fix ros3 tests to use DANDI endpoint and include hdf5 version switch
* fix phony_dims for ros3 test
* fix import check for ros3 availability
* try using property to get around import issue
* add whats-new.rst entry
@github-actions bot added the topic-indexing, topic-plotting, and Automation labels on Apr 8, 2026

Labels

Automation, io, run-upstream, topic-backends, topic-indexing, topic-plotting, topic-zarr

Projects

None yet

7 participants