Add attribution metadata policy for BIG_DATA #610
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,6 +54,7 @@ carefully: | |
| kgo | ||
| diagnostics | ||
| rose_stem | ||
| testdata | ||
| testing | ||
|
|
||
| .. important:: | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,109 @@ | ||||||
| .. ----------------------------------------------------------------------------- | ||||||
| (c) Crown copyright Met Office. All rights reserved. | ||||||
| The file LICENCE, distributed with this code, contains details of the terms | ||||||
| under which the code may be used. | ||||||
| ----------------------------------------------------------------------------- | ||||||
|
|
||||||
| .. _testdata: | ||||||
|
|
||||||
| Adding Test Data | ||||||
| ================ | ||||||
|
|
||||||
| .. note:: | ||||||
|
|
||||||
| This page is a placeholder for information about test data. It is not yet | ||||||
| complete and will be updated in due course. | ||||||
|
|
||||||
| *The instructions here are Met Office specific, other sites may manage their | ||||||
| test data differently.* | ||||||
|
|
||||||
| .. important:: **Attribution Metadata Policy** | ||||||
|
|
||||||
| If the change requires a new or updated file in ``LFRIC_DATA_DIR`` then you | ||||||
| will need to work with the Information Asset Owner (IAO) to ensure that data | ||||||
| in ``LFRIC_DATA_DIR`` includes clear attribution and licence metadata. | ||||||
|
|
||||||
| Where possible, this should follow existing UM ``ANCILDIR`` conventions (`see | ||||||
|
Collaborator
Better not to imply that this is optional:
Suggested change
Contributor
Author
For LFRic, we prefer to include the metadata and license as NetCDF global attributes rather than storing them separately. This approach does not currently apply to UM ANCILDIR. For non-NetCDF LFRic files, we follow the UM ANCILDIR convention.
Collaborator
On balance, it would be better to remove this entire pull-out box and put the information in the Licences section. The different standards for netCDF and non-netCDF files could then be specified separately to make each point clearer. I think it would also be useful to remove the comment about the LFRic data directory, because this policy applies regardless of where the files are being installed. |
||||||
| below <prerequisites-section_>`_), with ``.attribution`` and ``.license`` | ||||||
| files or equivalent NetCDF **global attributes** (at a minimum ``references``, | ||||||
| ``license``, ``source``, and ``history``). Attribution must reflect the | ||||||
| original data source and be provided by the data creators before deployment, | ||||||
| sharing, or distribution. | ||||||
|
|
||||||
| This policy is treated as an **Information Asset / licensing requirement**, | ||||||
| not just a best practice. | ||||||
|
|
||||||
|
|
||||||
| For UM-related datasets, please email the `MIAO team <mailto:miao@metoffice.gov.uk>`_ | ||||||
| to discuss the best way to share the data. | ||||||
|
|
||||||
| .. _prerequisites-section: | ||||||
|
|
||||||
| Prerequisites | ||||||
| ------------- | ||||||
|
|
||||||
| Before adding test data, you should have a good understanding of the change you | ||||||
| are making and the tests you will be adding. You should also have a good | ||||||
| understanding of the codebase and the testing framework you will be using. | ||||||
|
|
||||||
| Licences | ||||||
|
|
||||||
| ~~~~~~~~ | ||||||
|
|
||||||
| All files will require a licence and a record of where they have come from, both | ||||||
|
|
||||||
| for legal and auditing purposes. In your request, please state how and where | ||||||
| the files were generated or produced, what licence they have, and what the | ||||||
| conditions of that licence are. | ||||||
|
|
||||||
|
|
||||||
| Before files can be deployed we must obtain IAO approval, which we cannot do | ||||||
| without knowing the licence of the files to be deployed. | ||||||
|
|
||||||
|
|
||||||
| Metadata | ||||||
| ~~~~~~~~ | ||||||
|
|
||||||
| Any file requirements should be recorded in or alongside the files being | ||||||
|
|
||||||
| deployed. | ||||||
|
|
||||||
| Note that if a source file has a licence that imposes requirements on derived | ||||||
| works, then an ancillary file (or an intermediate file used to generate an | ||||||
| ancillary) does count as a derived work for the purpose of recording metadata. | ||||||
|
|
||||||
|
|
||||||
| In cases where a file has been generated from multiple sources, it should be | ||||||
| made clear where each licence/attribution/acknowledgement has come from. | ||||||
|
|
||||||
|
|
||||||
| NetCDF Files | ||||||
| ^^^^^^^^^^^^ | ||||||
|
|
||||||
| NetCDF files should have the relevant metadata included in the file itself. | ||||||
| The metadata should include the following information: | ||||||
|
|
||||||
| * If there is a licence, it should be in a ``license`` global attribute as per | ||||||
|
|
||||||
| `ESIP Attribute Convention for Data Discovery <https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3#Recommended>`_. | ||||||
|
|
||||||
| * If there is a paper attribution requirement, the relevant paper(s) should be | ||||||
| cited in the ``references`` global attribute as per | ||||||
| `CF conventions <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#description-of-file-contents/>`_. | ||||||
|
|
||||||
| * If there is an organisation attribution requirement, it should be in the | ||||||
| ``institution`` global attribute (again, as per CF). | ||||||
|
|
||||||
| * If there is any other attribution requirement (e.g. for an individual), it | ||||||
| should be in the ``acknowledgement`` global attribute (as per ACDD, the ESIP | ||||||
| Attribute Convention for Data Discovery). | ||||||
|
|
||||||
| * If there are restrictions on usage (e.g. "research only"), these should be in | ||||||
| a ``restrictions`` global attribute. | ||||||
|
|
||||||
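The NetCDF requirements above lend themselves to an automated check. The following is a minimal sketch, not part of the proposed documentation: it assumes the global attributes have already been read into a plain mapping, and takes the four names from the policy text (``references``, ``license``, ``source``, ``history``) as the required set. The function name ``missing_global_attributes`` and the example values are hypothetical.

```python
from typing import Iterable, List, Mapping

# Minimum set named in the attribution policy text; adjust to the agreed policy.
REQUIRED_GLOBAL_ATTRIBUTES = ("references", "license", "source", "history")


def missing_global_attributes(
    attrs: Mapping[str, object],
    required: Iterable[str] = REQUIRED_GLOBAL_ATTRIBUTES,
) -> List[str]:
    """Return the required attribute names that are absent or blank in attrs."""
    missing = []
    for name in required:
        value = attrs.get(name)
        # Treat a missing key and an empty/whitespace-only value the same way.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical attributes for illustration. With the netCDF4 package they
# could be read as {name: ds.getncattr(name) for name in ds.ncattrs()}.
example_attrs = {
    "license": "Example licence text",
    "references": "",
    "source": "Example source dataset",
    "institution": "Example institution",
}
print(missing_global_attributes(example_attrs))
```

Keeping the check independent of any NetCDF library means the same function could be reused in a review script or a test, whichever way the attributes are obtained.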
| Other Files | ||||||
| ^^^^^^^^^^^ | ||||||
|
|
||||||
| * Licence should be in an accompanying plain text file with the same name as the | ||||||
| data file, but with a ``.license`` suffix. | ||||||
|
|
||||||
| * Attribution should be in an accompanying plain text file with the same name as | ||||||
| the data file, but with a ``.attribution`` suffix. | ||||||
|
|
||||||
| * Restrictions on usage (e.g. "research only") should be in an accompanying | ||||||
| plain text file with the same name as the data file, but with a | ||||||
| ``.restrictions`` suffix. | ||||||
|
|
||||||
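For non-NetCDF files, the sidecar-file convention above could be checked along the following lines. This is a sketch under two assumptions that the text does not confirm: the suffix is appended to the full data file name (so ``example.dat`` pairs with ``example.dat.license``), and only ``.license`` and ``.attribution`` are always required, with ``.restrictions`` needed only when restrictions apply. The function name ``missing_sidecars`` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed always-required sidecars; pass ".restrictions" as well when it applies.
REQUIRED_SIDECAR_SUFFIXES = (".license", ".attribution")


def missing_sidecars(
    data_file: Path,
    suffixes: Iterable[str] = REQUIRED_SIDECAR_SUFFIXES,
) -> List[Path]:
    """Return the expected sidecar paths that do not exist next to data_file."""
    # Assumption: the suffix is appended to the complete file name,
    # e.g. example.dat -> example.dat.license
    expected = [data_file.with_name(data_file.name + suffix) for suffix in suffixes]
    return [path for path in expected if not path.is_file()]
```

Calling ``missing_sidecars(Path("example.dat"))`` returns the sidecar paths still to be created, which could be listed in a review comment or used to fail a pre-deployment check.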
| If you have questions about the process or concerns about the provenance of the | ||||||
| data you want to include, please engage with the IAO as early as possible to | ||||||
| prevent delays to your change later on. | ||||||
|
Collaborator
This should be at the start of the document not the end. It should also include an explicit statement that changes which depend on unlicensed data or data with an unknown provenance will not be approved until problems can be resolved. This will give greater clarity to both developers and reviewers.
Contributor
Author
Done. |
||||||
It would be better to describe these instructions as applying to all Met Office managed systems. That would also cover any test data which has to be copied to other platforms, e.g. jules data on jasmin, as well as UM AUX, Socrates spectral etc.
Done.