Pseudodata in diagonal basis and saving rotation #2455
7cae6d8 to
e05e7ed
Compare
achiefa
left a comment
I'll have a closer look tomorrow morning. For the moment, this is the only thing that confuses me. But I'm surely missing something.
|
So, just to make the point here and as a reminder to my future self: @jacoterh has successfully implemented the possibility of storing the eigenvectors and corresponding eigenvalues in vp-setupfit for reproducibility. This required us to default There is one open problem with the current branch, namely that vp-setupfit sets Am I missing something @jacoterh ? |
|
I guess by validphys you mean vp-setupfit. For
Indeed, I think that, given that now vp-setupfit stores the diagonalization, it can be treated as the theory covmat and n3fit should read it instead of recalculating it. |
So do we want to implement this now, or shall we keep it for a new PR? |
|
Yes, let's do this here - I was already working on it on Friday but was encountering some issues. I continue with it this afternoon so we can hopefully merge this PR by tomorrow at the CM. |
@@ -484,6 +484,7 @@ def dataset_inputs_t0_total_covmat(dataset_inputs_t0_exp_covmat, loaded_theory_c
    """
    covmat = dataset_inputs_t0_exp_covmat
    covmat += loaded_theory_covmat
@scarlehoff Is there a way to have vp-setupfit write the theory covmat csv files to the table directory first before attempting to load them? We need the theory covmat already at the stage of the diagonalisation in vp-setupfit.
I think it would be best to use whatever is in memory. But I remember we explicitly decided to forbid that a long time ago, and I don't remember right now how easy / possible it is to revert that.
If you didn't already, try simply swapping the order of the theory covmat and the diagonalization in the vp_setupfit script. I don't remember when they are actually written down, but if it is upon creation that should be enough.
    'datacuts::theory::theorycovmatconfig nnfit_theory_covmat'
)

SETUPFIT_FIXED_CONFIG['actions_'] += [rotation_action]
@scarlehoff I changed the order of the actions in the hope the theory covmat csv would get written first, but again the same FileNotFoundError occurs. Maybe reportengine first resolves all dependencies regardless of their order? Not sure.
Yes, it is very likely.
One option is to make it so the diagonalization can only be run from vp-setupfit and so it doesn't depend on the covmat that goes to the csv but on nnfit_theory_covmat, but then you need to also take into consideration the transformations that happen on the csv.
Another possibility is to change nnfit_theory_covmat so that, when it is coming from vp-setupfit it works normally, but when it is running from n3fit it just returns None (or perhaps lambda : None or whatever, not sure if it needs to be a function).
Then the .csv function can depend on nnfit_theory_covmat as well. When it is None it means that it is coming from n3fit (or from whatever that makes it return None) and needs to read the data from the .csv, otherwise use that nnfit_theory_covmat that just arrived.
As I said, we decided back in the day not to have the option to run without vp_setupfit first so I'm not sure this second option will work ootb.
Edit: can you stack two produce rules one of which is an explicit_node? I'm not sure :(
You can always mix and match these options. For instance, making sure that the order is fixed already when the covmat is written to the .csv file so that you don't need to think about that later.
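The second option above (use the in-memory covmat when it is available, fall back to the .csv when it is None) can be sketched as follows. This is only a minimal illustration under stated assumptions: `resolve_theory_covmat`, `write_covmat_csv` and `read_covmat_csv` are hypothetical helpers, not validphys functions, and the real tables carry index information that is ignored here.

```python
import csv
import tempfile
from pathlib import Path


def write_covmat_csv(path, matrix):
    """Write a square matrix (list of lists of floats) as plain CSV rows."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(matrix)


def read_covmat_csv(path):
    """Read the matrix back, converting each cell to float."""
    with open(path, newline="") as handle:
        return [[float(cell) for cell in row] for row in csv.reader(handle)]


def resolve_theory_covmat(in_memory_covmat, csv_path):
    """Prefer the in-memory covmat; fall back to the CSV when it is None.

    `in_memory_covmat` stands in for what nnfit_theory_covmat would return
    under this option: a matrix during vp-setupfit, None when run from n3fit.
    """
    if in_memory_covmat is not None:
        return in_memory_covmat
    return read_covmat_csv(csv_path)


# Hypothetical 2x2 theory covmat, for illustration only.
theory_covmat = [[1.0, 0.2], [0.2, 2.0]]

with tempfile.TemporaryDirectory() as tmpdir:
    table_path = Path(tmpdir) / "theory_covmat.csv"

    # "vp-setupfit" stage: the matrix is in memory, so no file is needed yet.
    at_setupfit = resolve_theory_covmat(theory_covmat, table_path)
    write_covmat_csv(table_path, at_setupfit)

    # "n3fit" stage: nothing in memory, so the CSV written above is read.
    at_n3fit = resolve_theory_covmat(None, table_path)
```

With this shape the diagonalization never needs the file to exist at setup time, which is the situation that produced the FileNotFoundError.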
Thanks Juan! I have gone for the first option you suggested, i.e. calling nnfit_theory_covmat during vp-setupfit. Further checks are necessary, but we have something working at the moment. For instance, I haven't paid close attention to the transformation you mentioned. Is this a reindexing of the pandas data frames when converting them to numpy arrays?
Pull request overview
This PR is aiming to support running/saving pseudodata in the diagonal (eigenmode) basis by persisting the diagonal-basis rotation/eigensystem (or fitting covmat) at vp-setupfit time, and then reusing it during n3fit (including saving pseudodata indexed by eigenmodes).
Changes:
- Save fitting covmat / diagonal-basis eigensystem via a new fitting_covmat_table table action triggered from vp-setupfit.
- Load the saved eigensystem/covmat inside fitting_data_dict, and save pseudodata in eigenmode indexing when diagonal_basis: true.
- Minor formatting/whitespace tidy-ups in pseudodata generation and docs.
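For readers unfamiliar with the idea, here is a rough numpy sketch of what "persist the eigensystem, then reuse it for pseudodata in the diagonal basis" could look like. It is not the code in this PR: the matrix, the file name `rotation.npz` and the storage format are assumptions made purely for illustration. The point of storing the rotation is that eigenvectors are only defined up to a sign (and up to a basis choice within degenerate subspaces), so recomputing them elsewhere is not guaranteed to give the same rotation.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 symmetric positive-definite covariance matrix.
a = rng.normal(size=(4, 4))
covmat = a @ a.T + 4.0 * np.eye(4)
central_values = rng.normal(size=4)

# "Setup" stage: diagonalise once and persist the eigensystem.
eigenvalues, eigenvectors = np.linalg.eigh(covmat)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "rotation.npz")
    np.savez(path, eigenvalues=eigenvalues, eigenvectors=eigenvectors)

    # "Fit" stage: reload the stored rotation instead of recomputing it.
    with np.load(path) as stored:
        loaded_values = stored["eigenvalues"]
        loaded_vectors = stored["eigenvectors"]

# Rotate into the eigenmode basis, where the covariance is diagonal.
rotated_data = loaded_vectors.T @ central_values
rotated_covmat = loaded_vectors.T @ covmat @ loaded_vectors

# Pseudodata in the diagonal basis: independent Gaussian shift per eigenmode.
shifts = rng.normal(size=4) * np.sqrt(loaded_values)
pseudodata_diagonal = rotated_data + shifts

# Rotating back gives the same replica expressed in the original data basis.
pseudodata_original = loaded_vectors @ pseudodata_diagonal
```

In this basis each pseudodata entry is indexed by an eigenmode rather than by a data point, which is why the saved tables need a different index.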
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| validphys2/src/validphys/pseudodata.py | Reformat call site for replica generation (no functional change). |
| validphys2/src/validphys/n3fit_data.py | Adds covmat/eigensystem persistence + diagonal-basis pseudodata saving; refactors inverse-covmat preparation. |
| validphys2/src/validphys/covmats.py | Minor whitespace change. |
| validphys2/src/validphys/config.py | Removes fitting-covmat selector arg; adds loader for saved fitting covmat; adjusts defaults for theory-covmat loading. |
| n3fit/src/n3fit/scripts/vp_setupfit.py | Adds validphys.n3fit_data providers and schedules the new fitting_covmat_table action. |
| doc/sphinx/source/n3fit/runcard_detailed.rst | Documents diagonal-basis pseudodata saving and the persisted rotation table. |
|
Before trying to solve the failing test, rebase on top of master to avoid doing the work twice (there are things added there like a more strict check on the documentation) |
c06505d to
c95f151
Compare
|
@scarlehoff some worrying news perhaps. I found that the line nnpdf/validphys2/src/validphys/covmats.py Line 486 in 44a5ddf is called twice on master. As a result the theory covmat gets added twice, which then explains why I found different eigenvalues in the presence of a theory covmat. To reproduce this behaviour, use the runcard below and turn on the |
|
What do you mean by twice? It doesn't (shouldn't) matter how many times you call it, it is producing the sum of the experimental and theory covmat. It would be inefficient (perhaps a leftover of having two covmats in the fit, the t0 and the normal) but it should be fine. |
|
Together with @achiefa we have checked that if you print dataset_inputs_t0_exp_covmat you will see that it changes between the two calls, and we suspect it already includes the theory covmat after the first time |
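A minimal reproduction of the suspected mechanism, assuming the covmats are plain numpy arrays (what the real objects are at that point is an assumption here). The function names are hypothetical; only the alias-then-`+=` pattern is taken from the diff above. The out-of-place variant is shown for contrast and is not necessarily the fix that was adopted.

```python
import numpy as np


def total_covmat_inplace(exp_covmat, theory_covmat):
    """Mirrors the pattern in the diff: alias the input, then use `+=`."""
    covmat = exp_covmat        # no copy: both names refer to the same array
    covmat += theory_covmat    # in-place add, so exp_covmat is modified too
    return covmat


def total_covmat_copy(exp_covmat, theory_covmat):
    """Out-of-place alternative: builds a new array, inputs are untouched."""
    return exp_covmat + theory_covmat


# Hypothetical 2x2 matrices, for illustration only.
exp = np.array([[1.0, 0.0], [0.0, 1.0]])
th = np.array([[0.5, 0.1], [0.1, 0.5]])

# Copies are taken only to snapshot the result of each call.
first = total_covmat_inplace(exp, th).copy()
second = total_covmat_inplace(exp, th).copy()
# `second` holds exp + 2 * th, because the first call already changed `exp`.

exp_fresh = np.array([[1.0, 0.0], [0.0, 1.0]])
safe_first = total_covmat_copy(exp_fresh, th)
safe_second = total_covmat_copy(exp_fresh, th)
```

If the experimental covmat is a cached object shared between callers, every extra call to the in-place version adds another copy of the theory covmat to it.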
|
That said, regardless how many times it gets called, this function should be just Change it to that and see whether results change. The |
|
Yes this is the issue indeed |
|
Then, if you think this PR is close to being finished, change it here so that it gets merged. Otherwise, please open a new PR with this change so that we merge it to master asap (and perhaps even release 4.1.4) |
|
So does this mean that all the fits performed since January are basically wrong? |
|
I'm starting to fear yes, the point is that Let's address this in a separate PR to fix this asap and check meanwhile which fits got affected by this |
I hope it is only since January. Otherwise, since we started using t0 for both sampling and fit.* That function has been like that since forever and, as @jacoterh said, it does say explicitly "modify covmat in place"; my hope is that older versions of pandas/numpy were just creating a new object (so the original covmat is safe). *unless going through it twice only happened recently, or only for the diagonal case. This is also a possibility. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
9762f0d to
85f806c
Compare
|
Let's wait for the tests to finish. Then, I kindly ask you to check the changes thoroughly: the covmat-related code can be tricky to deal with. Worth flagging a latent ordering fragility
As of now, these happen to agree, but only because diagonal-basis/thcovmat fits force I suspect that if someone runs with a different Maybe this can be easily fixed by reindexing |
But that covers all cases then, doesn't it? If one does not run with the theory covmat there is nothing to combine in the first place. |
This is the part I'm not entirely sure of. I did local tests using dis-only runcards. Might be different with a complete set of data. |
Let's do that. As @jacoterh said
Which means reindexing won't change anything (because it is already in that order although for a different reason) but it will ensure that if someone adds some other grouping in the future we are protected. |
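To make the ordering concern concrete, here is a small sketch of aligning one covmat on the ordering of another before summing. It uses numpy with explicit label lists; `reorder_like` and the dataset labels are hypothetical. With pandas dataframes the analogous operation would be `DataFrame.reindex` on both index and columns.

```python
import numpy as np


def reorder_like(matrix, labels, target_labels):
    """Return `matrix` with rows and columns permuted to follow `target_labels`.

    `labels` gives the current row/column order of `matrix`.
    """
    if sorted(labels) != sorted(target_labels):
        raise ValueError("Both orderings must contain the same labels")
    position = {label: i for i, label in enumerate(labels)}
    perm = [position[label] for label in target_labels]
    return matrix[np.ix_(perm, perm)]


# Hypothetical dataset labels; the two covmats list them in different orders.
exp_labels = ["DIS_A", "DY_B", "JETS_C"]
th_labels = ["JETS_C", "DIS_A", "DY_B"]

exp_covmat = np.diag([1.0, 2.0, 3.0])
th_covmat = np.diag([30.0, 10.0, 20.0])  # diagonal entries follow th_labels

# Naive sum pairs the JETS_C theory entry with the DIS_A experimental one.
naive_total = exp_covmat + th_covmat

# Align the theory covmat on the experimental ordering before summing.
aligned_total = exp_covmat + reorder_like(th_covmat, th_labels, exp_labels)
```

When the two orderings already agree the permutation is the identity, so the reindexing costs nothing but guards against a future change in grouping.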
Perfect, I can take care of that @achiefa |
|
Problem: Fix (by Claude) in construction.py:
|
|
I did a quick check using a reduced version of the 4.0 dataset, including different experiments other than DIS. I used the same runcard for master and this PR, and here is the report: I think this is good news? I still want to see Jaco's report, just to double check. |
WIP
nnfit_theory_covmat is ordered consistently with the experimental covmat before converting both to a dataframe