Add support for dask distributed scheduler in quantum detector reader#267
Conversation
|
Thank you @CSSFrancis, as you suggested in #266 (comment), #11 was useful to figure out the details with the structured dtype - pretty much copy & paste the relevant bits! 😄 |
CSSFrancis
left a comment
There was a problem hiding this comment.
This is actually a much smaller change than I thought it would be :). It looks good to me other than it might be nice to return the header as well although a warning that this could increase the loading time would be useful.
| data = data.rechunk(chunks) | ||
|
|
||
| if return_headers: | ||
| if distributed: |
There was a problem hiding this comment.
You can still return the header by just setting the key="header" for a second memmap_distributed call. It will add some time onto the saving of the dataset as the entire dataset might get loaded into ram with most of it thrown away.
Really what we should do is add things to a to_store context manager and then call:
rosettasciio/rsciio/hspy/_api.py
Line 111 in 31bd677
Only once. That will merge taskgraphs as necessary and might reduce the time for saving certain signals. I've thought about it for things like saving lazy markers of possibly creating a hs.save() function for handling mulitple signals if you wanted to save multiple parts of some anaylsis efficently. This is a fairly abstract/higher level concept so maybe it would be seledomly used.
There was a problem hiding this comment.
Yes, this will most likely needed to be done at some point! I opened #269 to track it / add more usecases.
Progress of the PR
.mrcusing Distributed and with Direct Electron Metadata #162 and Add .seq format for DE 16 and Celeritas Camera #11,upcoming_changesfolder (seeupcoming_changes/README.rst),docs/readthedocs.org:rosettasciiobuild of this PR (link in github checks)