Skip to content

expose some functions under bruker.api#121

Draft
sem-geologist wants to merge 7 commits into
hyperspy:mainfrom
sem-geologist:bruker_api_exp
Draft

expose some functions under bruker.api#121
sem-geologist wants to merge 7 commits into
hyperspy:mainfrom
sem-geologist:bruker_api_exp

Conversation

@sem-geologist

@sem-geologist sem-geologist commented May 19, 2023

Copy link
Copy Markdown
Contributor

Description of the change

This PR aims to modernise and extend the bruker._api.py:

  • expose middle layer functions and classes under bruker.api
  • review, update and extend the docstrings, in particular of these functions and class'es which or which child are
    going to be exposed through .api
  • review the code for possible cleanup, scope-separation and streamlining for public exposition in .api

Progress of the PR

  • expose SfsReader
  • expose xml_to_spectrum
  • expose xml_to_image
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
  • add tests,
  • ready for review.

???
are xml_to_spectrum and xml_to_image good naming for these functions. functions require particular etree nodes - maybe et_node_to_spectrum and et_node_to_image would be better naming?

Minimal example of the bug fix or the new feature

from sciio.bruker import api as b_api

b_api.SFSReader('somefile.pan')  # *.pan are particle analysis files using same container as bcf 

@sem-geologist sem-geologist marked this pull request as draft May 19, 2023 14:51
@codecov

codecov Bot commented May 19, 2023

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.62921% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.29%. Comparing base (5f9e746) to head (a57fd72).
⚠️ Report is 1229 commits behind head on main.

Files with missing lines Patch % Lines
rsciio/bruker/_api.py 96.59% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   85.16%   85.29%   +0.13%     
==========================================
  Files          73       74       +1     
  Lines        9030     9042      +12     
  Branches     1932     2045     +113     
==========================================
+ Hits         7690     7712      +22     
+ Misses        873      870       -3     
+ Partials      467      460       -7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread rsciio/bruker/_api.py Outdated
Comment thread rsciio/bruker/_api.py Outdated
@sem-geologist

sem-geologist commented May 29, 2023

Copy link
Copy Markdown
Contributor Author

@ericpre , @jlaehne I have some question. As I am doing pretty extensive review of bruker _api.py, so that it could be more streamlined, I came again over idea how to handle images. Bruker xml image node can have plains (that is called "plains" within the XML), it is basically different video channels, which register signal at same time/simultaniously when beam is rastered over sample ROI (In some sense it is very similar to spectroscopy to be honest). So basically most of metadata is the same (actually in XML it is physically shared) except description string of the every channel. Hitherto in this case all metadata was being duplicated, and every plain was returned as separate independent set with its data and metadata. But... is it the right and most efficient way? would it not be better way to stack those plains, alongside axis called as "channels"? can axes have labeled scale (that is instead of numbers it would be channel description)? Does this makes any sense at all? I mean for me it does a lot, as those plains are different representations of exactly same material interaction or result of beam: i.e. say as example 4 channels: BF, DF, DF4, HAADF. They all represent different interacted electrons with matter.
Even if taking 2 channel (most common setup with SEM having Bruker EDS) of BSE and SE images, those represents different energy electrons, and in some sense combining it forms kind of spectral image.

Such change probably is going break some established workflows. But I think it would be very right thing to do.

@jlaehne

jlaehne commented May 29, 2023

Copy link
Copy Markdown
Member

We had a similar topic brought up for LumiSpy (CL spectral image + corresponding SE image) by @jordiferrero, which was never implemented so far: LumiSpy/lumispy#73
Indeed, it would be nice to have an upstream solution for that in HyperSpy!

Concerning the 'labeled' axes for stacked data, @CSSFrancis was working on a PR.

@CSSFrancis

Copy link
Copy Markdown
Member

So adding labeled axes wouldn't be terribly hard in hyperspy. I have done most of the work in hyperspy/hyperspy#3031 Right now there is a little bit of hesitation with this change but I think that there isn't really anything stopping this from happening. This might be something worth voting on or having a larger discussion about how these signals are handled.

Handling multiple signals is a little bit tricky and becomes more tricky if they have different numbers of pixels, but even that can be handled by hyperspy.

There are a couple of things:

  1. If all of the signals have the same dimensionality and same size --> pass as labeled axes and create a stacked array
  2. If all of the signals have the same dimensionality and different --> pass as labeled axes and create a stacked ragged array
  3. If one of the signals has a higher dimensionality (i.e. HAADF and 4-D STEM) pass one as navigator and one as array

For 3 I also think that in the case where you have multiple different signals you should be able to pass all of them. In that case you could have multiple different navigators. For example with a 4-D STEM you could have brightfield and darkfield images.

@sem-geologist

sem-geologist commented May 29, 2023

Copy link
Copy Markdown
Contributor Author

thanks @CSSFrancis , Your 3 point division makes it pretty clear. So "planes" used in bruker XML image actually not only fulfills point 1. of same dimentiality and resolution/size, but are even more closely related, as it have exactly same column conditions, and is pixel-to-pixel generated with exactly same beam simultaneously - thus even more - it should be stacked. Hopefully Your PR hyperspy/hyperspy#3031 will get accepted. I probably will stack images with unitless axes as for now.

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

@CSSFrancis

Copy link
Copy Markdown
Member

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

In these types of situations "what is good for the goose is good for the gander" or both software packages will probably benefit from a consistent approach to these low level problems :) It also helps with interoperability which can only be a good thing.

To be honest the case #2 above is quite difficult from the perspective of hyperspy. There is nothing that says hyperspy cannot have a signal of signals similar to how numpy can have a ragged array of arrays. I considered something like that here but I am not sure if there is a consistent way to approach that. For now there hasn't really been a real need for it so I haven't really paid it much attention.

@ericpre

ericpre commented May 30, 2023

Copy link
Copy Markdown
Member

Just for me to make sure that I don't misunderstand, @sem-geologist what you are asking is to have a signal, which contains BSE and SE images stacked together and another one which contains the EBSD or EDS data?
Could it be easily stack after loading the data by the library itself, at a stage, where it is known how the data needs to be structure/handle. I am not sure that this is the responsibility of RosettaSciIO to define this?

Currently, in hyperspy, these are considered as treated as different dataset and they expected to be handle at the workflow level (script, notebook, library etc.). Maybe what RosettaSciIO could do is to assign some dataset to a specify category: "main", "auxiliary" (data acquired simultaneously but not "main", or already processed data, i.e. map from a spectrum image), "survey", etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants