Skip to content

Add support for dask distributed scheduler in quantum detector reader#267

Merged
ericpre merged 2 commits into
hyperspy:mainfrom
ericpre:mib_distributed
Jun 5, 2024
Merged

Add support for dask distributed scheduler in quantum detector reader#267
ericpre merged 2 commits into
hyperspy:mainfrom
ericpre:mib_distributed

Conversation

@ericpre

@ericpre ericpre commented May 29, 2024

Copy link
Copy Markdown
Member

Progress of the PR

Comment thread rsciio/utils/distributed.py Fixed
@ericpre

ericpre commented May 29, 2024

Copy link
Copy Markdown
Member Author

Thank you @CSSFrancis, as you suggested in #266 (comment), #11 was useful to figure out the details with the structured dtype - pretty much copy & paste the relevant bits! 😄

@CSSFrancis CSSFrancis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a much smaller change than I thought it would be :). It looks good to me other than it might be nice to return the header as well although a warning that this could increase the loading time would be useful.

data = data.rechunk(chunks)

if return_headers:
if distributed:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still return the header by just setting the key="header" for a second memmap_distributed call. It will add some time onto the saving of the dataset as the entire dataset might get loaded into ram with most of it thrown away.

Really what we should do is add things to a to_store context manager and then call:

da.store(data, dset)

Only once. That will merge taskgraphs as necessary and might reduce the time for saving certain signals. I've thought about it for things like saving lazy markers of possibly creating a hs.save() function for handling mulitple signals if you wanted to save multiple parts of some anaylsis efficently. This is a fairly abstract/higher level concept so maybe it would be seledomly used.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will most likely needed to be done at some point! I opened #269 to track it / add more usecases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants