Add attribution metadata policy for BIG_DATA#610
Yaswant Pradhan (yaswant) wants to merge 15 commits into
This is a good start, but there are no rules for developers. Without clear guidance on what sort of data can be added and what supporting evidence is required, this will result in more problems during review. The PR also seems to be doing two things: adding information about the big data policy, and fixing typos and changing build instructions in other sections. These are separate, and the second part needs to be split out into another PR.
Good points, Sam Clarke-Green (@t00sa) - I initially thought this would be best placed in the Developer section, but since the reference to BIG_DATA was only in the Reviewer section, that's where it ended up. You're correct that some adjustments are needed. Regarding what's permitted, ANCILDIR-Deploy should be considered the single source of truth. I'll revert the typo and jules-doc compilation updates and address them in another PR.
Thanks both. Moved the guidance to the development checklist section. Included the process (an extract from ANCILDIR-Deploy) here for wider visibility.
Sam Clarke-Green (t00sa)
left a comment
See comments inline.
If the change requires a new or updated file in ``LFRIC_DATA_DIR`` then you
will need to work with the Information Asset Owner (IAO) to ensure that data
in ``LFRIC_DATA_DIR`` must include clear attribution and licence metadata.
Where possible, this should follow existing UM ``ANCILDIR`` conventions (`see
Better not to imply that this is optional:
- Where possible, this should follow existing UM ``ANCILDIR`` conventions (`see
+ This should follow existing UM ``ANCILDIR`` conventions (`see
For LFRic, we prefer to include the metadata and licence as NetCDF global attributes rather than storing them separately. This approach does not currently apply to UM ANCILDIR.
For non-NetCDF LFRic files, we follow the UM ANCILDIR convention.
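As a rough illustration of the NetCDF approach described above, the check below takes a file's global attributes as a plain mapping and reports any required entries that are missing or blank. The attribute names (`licence`, `attribution`, `source`) and the helper `missing_metadata` are assumptions made for this sketch only; the actual requirements are whatever the IAO and ANCILDIR-Deploy specify.

```python
from __future__ import annotations

from typing import Mapping

# Hypothetical attribute names, chosen for illustration only. The real list
# of required metadata is defined by the IAO / ANCILDIR-Deploy process.
REQUIRED_GLOBAL_ATTRS = ("licence", "attribution", "source")


def missing_metadata(global_attrs: Mapping[str, object]) -> list[str]:
    """Return the required attribute names that are absent or blank."""
    missing = []
    for name in REQUIRED_GLOBAL_ATTRS:
        value = global_attrs.get(name)
        # Treat a missing key and a whitespace-only value the same way.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

With the netCDF4 Python package, the mapping could be built from an open `Dataset` using `ncattrs()` and `getncattr()`, for example `{k: ds.getncattr(k) for k in ds.ncattrs()}`, and an empty result from `missing_metadata` would indicate that all of the assumed attributes are present.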
On balance, it would be better to remove this entire pull-out box and put the information in the Licences section. The different standards for netCDF and non-netCDF files could then be specified separately to make each point clearer.
I think it would also be useful to remove the comment about the LFRic data directory, because this policy applies regardless of where the files are being installed.
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Apply CR suggestion. Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
Ben Fitzpatrick (benfitzpatrick)
left a comment
Looks good to me, thank you!
Sam Clarke-Green (t00sa)
left a comment
See comments inline. Given that the licensing of data is mandatory, the language should be unequivocal and the consequences of failing to engage with an IAO should be made clear in the introduction.
If the change requires a new or updated file in ``LFRIC_DATA_DIR`` then you
will need to work with the Information Asset Owner (IAO) to ensure that data
in ``LFRIC_DATA_DIR`` must include clear attribution and licence metadata.
Where possible, this should follow existing UM ``ANCILDIR`` conventions (`see
On balance, it would be better to remove this entire pull-out box and put the information in the Licences section. The different standards for netCDF and non-netCDF files could then be specified separately to make each point clearer.
I think it would also be useful to remove the comment about the LFRic data directory, because this policy applies regardless of where the files are being installed.
It would be better to describe these instructions as applying to all Met Office managed systems. That would also cover any test data that has to be copied to other platforms, e.g. JULES data on JASMIN, as well as UM AUX, Socrates spectral files etc.
Done.
If you have questions about the process or concerns about the provenance of the
data you want to include, please engage with the IAO as early as possible to
prevent delays to your change later on.
This should be at the start of the document, not the end.
It should also include an explicit statement that changes which depend on unlicensed data, or data of unknown provenance, will not be approved until the problems are resolved. This will give greater clarity to both developers and reviewers.
Done.
Apply reviewer suggestion. Co-authored-by: Sam Clarke-Green <74185251+t00sa@users.noreply.github.com>
PR Summary
IAO Approver: Ben Fitzpatrick (@benfitzpatrick)
Code Reviewer: Sam Clarke-Green (@t00sa)
Cc: Andrew Clark (@arjclark)
Adding attribution metadata policy to working practices.
Code Quality Checklist
IAO Comments
Code Review