Add prototype paramdb api#185

Draft
awarde96 wants to merge 18 commits into develop from feature/paramdb-api

@awarde96


Adds a prototype ParamDB API that converts ECMWF shortnames to param IDs and vice versa, using either the parameter database directly or a local copy.

Adds a ParamDB class to pymetkit.py for resolving ECMWF parameter metadata (shortname ↔ longname ↔ param ID, units) with both online (live API) and offline (bundled YAML) modes.

New files

  • parameter_metadata.yaml — bundled offline parameter data generated from the parameter-database
  • generate_parameter_metadata.py — standalone script to regenerate the YAML from the ECMWF API (python -m pymetkit.generate_parameter_metadata)

Changes

  • pymetkit.py — adds ParamDB class; imports yaml, requests, Path
  • __init__.py — explicitly exports ParamDB
  • pyproject.toml — adds pyyaml + requests to dependencies; includes parameter_metadata.yaml in package data

Usage

from pymetkit import ParamDB

# Offline (default, uses bundled YAML)
db = ParamDB()
db.shortname_to_longname("2t")        # "2 metre temperature"
db.param_id_to_shortname(129)         # "z"
db.longname_to_param_id("Geopotential")  # 129
db.get_units("tp")                    # "m"
db.get_metadata(167)                  # full dict

# Online (fetches from codes.ecmwf.int)
db = ParamDB(mode="online")

Lookup via get_metadata / get_units accepts param ID (int), shortname, or longname interchangeably.
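One way to support this interchangeable lookup is to build three indexes (by ID, by shortname, and by lowercased longname) once, then dispatch on the key type. The sketch below assumes each entry is a dict with hypothetical `id`, `shortname`, `longname` and `units` fields; `ParamIndex` is an illustrative name, and the real ParamDB internals may differ.

```python
from typing import Any, Dict, List, Union

Key = Union[int, str]


class ParamIndex:
    """Hypothetical index resolving a param ID, shortname or longname to one entry."""

    def __init__(self, entries: List[Dict[str, Any]]) -> None:
        self._by_id = {e["id"]: e for e in entries}
        self._by_short = {e["shortname"]: e for e in entries}
        # Longnames are matched case-insensitively.
        self._by_long = {e["longname"].lower(): e for e in entries}

    def get_metadata(self, key: Key) -> Dict[str, Any]:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(key, int) and not isinstance(key, bool):
            entry = self._by_id.get(key)
        elif isinstance(key, str):
            # Try the shortname first, then fall back to the longname.
            entry = self._by_short.get(key) or self._by_long.get(key.lower())
        else:
            raise TypeError(f"Unsupported key type: {type(key).__name__}")
        if entry is None:
            raise KeyError(key)
        return entry

    def get_units(self, key: Key) -> str:
        return self.get_metadata(key)["units"]
```

With sample entries for `z` (129) and `2t` (167), `get_metadata(129)`, `get_metadata("z")` and `get_metadata("geopotential")` all return the same dict.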

Description

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@codecov-commenter

codecov-commenter commented Mar 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.08%. Comparing base (0844213) to head (917d33b).
⚠️ Report is 34 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #185   +/-   ##
========================================
  Coverage    62.08%   62.08%           
========================================
  Files          303      303           
  Lines        11689    11689           
  Branches      1050     1050           
========================================
  Hits          7257     7257           
  Misses        4432     4432           


Copilot AI left a comment


Pull request overview

Adds a prototype ParamDB API to the Python bindings, enabling ECMWF parameter metadata lookups (shortname/longname/ID/units) in offline mode (bundled YAML) and online mode (ECMWF API with local caching). It also introduces a regeneration script and new YAML metadata files.

Changes:

  • Added ParamDB class to support offline YAML-backed lookups and online API fetching with a JSON cache.
  • Added a standalone script to regenerate parameter/unit YAML metadata from the ECMWF API.
  • Added unit metadata YAML and a comprehensive Python test suite for ParamDB.
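The online mode's fetch-then-cache behaviour can be sketched as follows. This assumes the cache is a single JSON file and that the fetch is passed in as a callable; `load_or_fetch` and its `refresh` flag are illustrative, not the PR's code. In practice the cache directory could come from `platformdirs.user_cache_dir("pymetkit")`.

```python
import json
from pathlib import Path
from typing import Any, Callable


def load_or_fetch(cache_file: Path, fetch: Callable[[], Any], refresh: bool = False) -> Any:
    """Return cached JSON if present; otherwise call fetch() and cache its result."""
    if cache_file.is_file() and not refresh:
        with cache_file.open(encoding="utf-8") as fh:
            return json.load(fh)
    data = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file, then rename, so a crash never leaves a half-written cache.
    tmp = cache_file.with_name(cache_file.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as fh:
        json.dump(data, fh)
    tmp.replace(cache_file)
    return data
```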

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.

Summary per file:

  • share/metkit/unit_metadata.yaml: Adds bundled unit metadata YAML (offline dataset).
  • python/pymetkit/tests/test_paramdb.py: Adds unit tests for ParamDB (offline lookups + online caching behaviors).
  • python/pymetkit/src/pymetkit/pymetkit.py: Implements ParamDB in the main Python module and adds optional imports for online mode/caching.
  • python/pymetkit/src/pymetkit/generate_parameter_metadata.py: Adds a script to fetch/emit parameter_metadata.yaml and unit_metadata.yaml from the ECMWF API.
  • python/pymetkit/src/pymetkit/__init__.py: Exports ParamDB from the package.
  • pyproject.toml: Adds runtime deps (pyyaml, requests, platformdirs) and declares YAML package data.

Comment on lines +26 to +41
@pytest.mark.parametrize(
    "mode, expectation",
    [
        ["offline", does_not_raise()],
        ["online", does_not_raise()],  # network call; skipped below if no requests
        ["invalid", pytest.raises(ValueError)],
        ["OFFLINE", pytest.raises(ValueError)],
    ],
)
def test_constructor_mode_validation(mode, expectation):
    """Only 'online' and 'offline' are accepted mode values."""
    if mode == "online":
        pytest.importorskip("requests")
    with expectation:
        ParamDB(mode=mode)

Copilot AI Apr 10, 2026


test_constructor_mode_validation instantiates ParamDB(mode="online"), which triggers a real HTTP request to the ECMWF API. Because requests is now a required dependency, this test will run in CI and become flaky or fail in offline environments. Consider monkeypatching pymetkit.pymetkit._requests (as done in the other online tests) or patching ParamDB._load_online to a no-op so this test only validates mode parsing without network access.
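A minimal sketch of the suggested fix, using `unittest.mock.patch.object` so the constructor's online branch never reaches the network. `ParamDB` here is a hypothetical stand-in with the same mode rules as the test above; against the real class one would patch `ParamDB._load_online` (the approach a later commit in this PR takes) or use pytest's `monkeypatch.setattr`. The `check_mode` helper is illustrative.

```python
from contextlib import nullcontext
from unittest import mock


class ParamDB:
    """Hypothetical stand-in: only 'offline' and 'online' are valid modes."""

    _MODES = ("offline", "online")

    def __init__(self, mode: str = "offline") -> None:
        if mode not in self._MODES:
            raise ValueError(f"mode must be one of {self._MODES}, got {mode!r}")
        self.mode = mode
        if mode == "online":
            self._load_online()

    def _load_online(self) -> None:
        # Stands in for the real HTTP request; tests must never reach this.
        raise RuntimeError("network access attempted")


def check_mode(mode: str, expectation) -> None:
    """Validate mode parsing with the online loader replaced by a no-op."""
    with mock.patch.object(ParamDB, "_load_online", return_value=None):
        with expectation:
            ParamDB(mode=mode)


# Mirrors the parametrized cases; under pytest, pytest.raises(ValueError)
# would be passed as the expectation for the invalid modes.
check_mode("offline", nullcontext())
check_mode("online", nullcontext())  # no HTTP request is made
```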

Copilot uses AI. Check for mistakes.
Comment thread pyproject.toml
Comment on lines +45 to 48
"metkit_c.h",
"parameter_metadata.yaml"
]

Copilot AI Apr 10, 2026


parameter_metadata.yaml is declared as package data for the pymetkit package, but the repository’s YAML lives under share/metkit/parameter_metadata.yaml (there is no python/pymetkit/src/pymetkit/parameter_metadata.yaml). As a result, installed wheels/sdists will likely not contain the bundled YAML and offline ParamDB() will fail. Either move/copy the YAML into the package directory at build time, or adjust packaging and the loader to use importlib.resources/pkgutil to read the bundled file reliably.

Suggested change
-    "metkit_c.h",
-    "parameter_metadata.yaml"
-]
+    "metkit_c.h"
+]
+[tool.setuptools.data-files]
+"share/metkit" = [
+    "share/metkit/parameter_metadata.yaml"
+]

Comment on lines +489 to +507
@staticmethod
def _find_offline_yaml() -> Path:
    """Locate ``parameter_metadata.yaml``, searching in order:

    1. Next to this module file (installed package layout).
    2. ``<repo_root>/share/metkit/`` (development tree layout after the
       YAML files were moved out of the Python package directory).
    """
    candidates = [
        Path(__file__).parent / "parameter_metadata.yaml",
        Path(__file__).parents[4] / "share" / "metkit" / "parameter_metadata.yaml",
    ]
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError(
        "parameter_metadata.yaml not found. Searched:\n"
        + "\n".join(f" {p}" for p in candidates)
    )

Copilot AI Apr 10, 2026


The offline YAML lookup relies on Path(__file__).parents[4] / "share/metkit/...", which only works in the repo checkout layout. In an installed package this path will typically point somewhere like <venv>/lib/.../share/metkit/ and the bundled YAML won’t be found. Prefer loading the YAML via importlib.resources from packaged data, or ensure the YAML is placed next to this module in the built distribution and remove the brittle repo-root heuristic.
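A sketch of the `importlib.resources` approach, assuming Python 3.9+ (`importlib.resources.files`). It returns the file's text rather than a `Path`, because packaged data is not guaranteed to exist as a standalone file (for example inside a zipped install). `read_packaged_text` and its explicit fallback list are illustrative names, not the PR's API.

```python
from importlib import resources
from pathlib import Path
from typing import Iterable, List


def read_packaged_text(package: str, filename: str, fallbacks: Iterable[Path] = ()) -> str:
    """Read `filename` from installed package data, else from the first existing fallback."""
    searched: List[str] = []
    try:
        resource = resources.files(package) / filename
    except ModuleNotFoundError:
        resource = None  # package not importable, e.g. running from a bare checkout
    if resource is not None:
        searched.append(str(resource))
        if resource.is_file():
            return resource.read_text(encoding="utf-8")
    # Explicit development-tree locations such as <repo>/share/metkit/.
    for path in fallbacks:
        searched.append(str(path))
        if path.is_file():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(
        f"{filename} not found. Searched:\n" + "\n".join(f"  {p}" for p in searched)
    )
```

The loader would then pass the result to `yaml.safe_load(...)`, and the repo-root heuristic becomes an explicit, optional fallback rather than the primary lookup.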

Comment on lines +337 to +340
request. The cache is stored under the OS user-cache directory
(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
on macOS) and is keyed to the API URL so that a different endpoint
produces a separate file.

Copilot AI Apr 10, 2026


The class docstring says the cache file is “keyed to the API URL so that a different endpoint produces a separate file”, but the implementation always uses the fixed _CACHE_FILENAME and does not incorporate _API_URL. Either update the implementation to include a URL-derived component (e.g., hash) in the filename or adjust the docstring to match the behavior.
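If the first option (keying the cache to the endpoint) were chosen instead of the docstring fix, a short hash of the URL could be folded into the filename. A minimal sketch; `cache_path` and the `params` stem are hypothetical names.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir: Path, api_url: str, stem: str = "params") -> Path:
    """Return a cache file path unique to `api_url` (first 12 hex chars of its SHA-256)."""
    digest = hashlib.sha256(api_url.encode("utf-8")).hexdigest()[:12]
    return cache_dir / f"{stem}-{digest}.json"
```

Twelve hex characters keep filenames short while making accidental collisions between a handful of endpoints vanishingly unlikely.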

Suggested change
-request. The cache is stored under the OS user-cache directory
-(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
-on macOS) and is keyed to the API URL so that a different endpoint
-produces a separate file.
+request. The cache is stored under the OS user-cache directory
+(e.g. ``~/.cache/pymetkit/`` on Linux, ``~/Library/Caches/pymetkit/``
+on macOS) using the fixed filename defined by
+``_CACHE_FILENAME``.

return

# Fetch from the API
response = _requests.get(self._API_URL)

Copilot AI Apr 10, 2026


_requests.get(self._API_URL) is called without a timeout. In library code this can hang indefinitely on network issues, which is an operational reliability problem. Consider adding a reasonable default timeout (and/or making it configurable) for online mode requests.
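A sketch of a class-level default timeout with a per-instance override. A later commit in this PR adds a `_REQUEST_TIMEOUT = 30` class constant; the `OnlineLoader` class, the `request_timeout` argument and the injected `http_get` callable are assumptions made so the example runs without `requests` or network access (in real code `http_get` would be `requests.get`).

```python
from typing import Any, Callable, Optional


class OnlineLoader:
    """Hypothetical loader: class default timeout, overridable per instance."""

    _API_URL = "https://codes.ecmwf.int/parameter-database/api/v1/param/"
    _REQUEST_TIMEOUT = 30  # seconds

    def __init__(self, http_get: Callable[..., Any], request_timeout: Optional[float] = None) -> None:
        self._get = http_get
        self._timeout = self._REQUEST_TIMEOUT if request_timeout is None else request_timeout

    def fetch(self) -> Any:
        # With requests, the timeout bounds connecting and each wait for data,
        # not the total download time.
        response = self._get(self._API_URL, timeout=self._timeout)
        response.raise_for_status()
        return response.json()
```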

Suggested change
-response = _requests.get(self._API_URL)
+request_timeout = 10
+response = _requests.get(self._API_URL, timeout=request_timeout)

Comment on lines +17 to +21
PARAM_URL = "https://codes.ecmwf.int/parameter-database/api/v1/param/"
UNIT_URL = "https://codes.ecmwf.int/parameter-database/api/v1/unit/"
PARAM_OUTPUT = Path(__file__).parent / "parameter_metadata.yaml"
UNIT_OUTPUT = Path(__file__).parent / "unit_metadata.yaml"

Copilot AI Apr 10, 2026


The generator writes parameter_metadata.yaml and unit_metadata.yaml next to the Python module (Path(__file__).parent), but the PR adds the YAML under share/metkit/ and ParamDB’s fallback search also expects share/metkit/parameter_metadata.yaml. Regenerating with this script will therefore write to a different location than the committed data. Align the output paths with the repository’s canonical YAML location (or update the rest of the codebase to consume the module-adjacent files).
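One way to align the script with `share/metkit/` while still allowing a custom destination is to derive the default from the script's own location and expose an override flag. A sketch, assuming the same `parents[4]` repository layout that `_find_offline_yaml` relies on; `default_output_dir`, `parse_args` and the `--output-dir` flag are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def default_output_dir(module_file: Path) -> Path:
    # python/pymetkit/src/pymetkit/<script>.py -> parents[4] is the repository root.
    return module_file.resolve().parents[4] / "share" / "metkit"


def parse_args(module_file: Path, argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Regenerate parameter/unit metadata YAML.")
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=default_output_dir(module_file),
        help="directory for parameter_metadata.yaml and unit_metadata.yaml",
    )
    return parser.parse_args(argv)
```

In the script itself this would be called as `parse_args(Path(__file__))`, letting argparse read `sys.argv`.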

Comment on lines +39 to +42
print(f"Fetching units from {url} ...")
response = requests.get(url)
response.raise_for_status()
raw_units = response.json()

Copilot AI Apr 10, 2026


Both API calls use requests.get(...) without a timeout. If the endpoint stalls, this script can block indefinitely. Consider providing a default timeout (and possibly a retry strategy) to make regeneration more robust.
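A sketch of a timeout plus a simple exponential-backoff retry for the generator script. The helper name and parameters are hypothetical; `get` would be `requests.get` in the script. Because `requests.RequestException` derives from `OSError`, the default `retry_on` also retries HTTP errors raised by `raise_for_status()`; passing `(requests.ConnectionError, requests.Timeout)` would limit retries to transport failures.

```python
import time
from typing import Any, Callable, Tuple, Type


def get_json_with_retry(
    url: str,
    get: Callable[..., Any],
    timeout: float = 30.0,
    retries: int = 3,
    backoff: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Return get(url, timeout=timeout).json(), making up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be >= 1")
    for attempt in range(retries):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()
            return response.json()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait backoff, 2*backoff, 4*backoff, ... between attempts.
            sleep(backoff * 2 ** attempt)
```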

if key == "id":
    continue
entry[key] = value

Copilot AI Apr 10, 2026


fetch_units() computes a normalised unit string (name = raw.get("name") or raw.get("symbol") ...) for unit_map, but the YAML output preserves the raw keys and does not ensure there is a canonical name field. If the API returns symbol/label instead of name, unit_metadata.yaml will lack the expected key. Consider explicitly setting entry["name"] = name (and/or dropping the alternate keys) to keep the output schema stable.

Suggested change
# Always emit a canonical name field so unit_metadata.yaml has a stable schema
entry["name"] = name
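The normalisation this suggestion aims for can be sketched as a small pure function. It assumes the fallback order name, then symbol, then label (the three keys named in the follow-up commit) and an empty string when none is present; `normalise_unit` is a hypothetical name.

```python
from typing import Any, Dict


def normalise_unit(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a raw API unit record, dropping 'id' and guaranteeing a 'name' key."""
    name = raw.get("name") or raw.get("symbol") or raw.get("label") or ""
    entry = {key: value for key, value in raw.items() if key != "id"}
    entry["name"] = name  # canonical field, whichever key the API used
    return entry
```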

awarde96 and others added 7 commits April 10, 2026 12:02
- Remove parameter_metadata.yaml from pyproject.toml package-data (file lives
  in share/metkit/, not inside the Python package)
- Add importlib.resources-based lookup as first candidate in _find_offline_yaml
  so installed wheels can locate the bundled YAML correctly
- Fix ParamDB class docstring: cache file is keyed to fixed _CACHE_FILENAME,
  not the API URL
- Add _REQUEST_TIMEOUT = 30 class constant and pass it to _requests.get() to
  prevent indefinite hangs
- Fix generate_parameter_metadata.py output paths to write into share/metkit/
  instead of next to the module; add REQUEST_TIMEOUT and pass to both
  requests.get() calls
- Emit canonical 'name' field in fetch_units() regardless of which key the API
  returns (symbol / label / name)
- Fix test_constructor_mode_validation to patch _load_online instead of making
  a real network request; hoist shared module import to top of test file
…ault changed to choose id if clash present; if dissemination present use -> origin ECMWF, WMO, other -> finally lowest number if still clashes
… origin ECMWF -> origin WMO -> access_id dissemination -> lowest id
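Read together, these two commit messages describe a tie-break for entries that share a shortname: prefer origin ECMWF, then WMO, then entries whose access IDs include dissemination, then the lowest param ID. A sketch of that ordering as a lexicographic sort key, assuming hypothetical `origin` and `access_ids` fields (the actual entry schema is not shown on this page):

```python
from typing import Any, Dict, List, Tuple

_ORIGIN_RANK = {"ECMWF": 0, "WMO": 1}  # any other origin ranks after these


def _rank(entry: Dict[str, Any]) -> Tuple[int, int, int]:
    return (
        _ORIGIN_RANK.get(entry.get("origin"), len(_ORIGIN_RANK)),
        0 if "dissemination" in entry.get("access_ids", ()) else 1,
        entry["id"],  # lowest param ID wins any remaining tie
    )


def pick_preferred(candidates: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Choose one entry among several that share a shortname."""
    return min(candidates, key=_rank)
```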
Symlink share/metkit/parameter_metadata.yaml directly into the package
directory so pymetkit can find it at development time. Setuptools follows
the symlink when building wheels, including the real file in production
installs.
awarde96 added 3 commits June 16, 2026 09:33
…and docs

- PatchedLib.__init__ now converts AttributeError (missing symbols) into
  CFFIModuleLoadFailed for a clean error path
- Module-level lib load failure now sets lib=None + emits ImportWarning
  instead of raising ImportError, so ParamDB can be used without the C lib
- MarsRequest.__init__ and parse_mars_request() guard with lib-is-None check
  and raise ImportError with an actionable message
- Add lazy-loading tests, ParameterEntry model tests (135 tests total)
- Add benchmark_paramdb.py, custom_param_example.yaml, and full README
awarde96 and others added 2 commits June 17, 2026 11:35
… no longer do pydantic validation on bundled version as not needed
feat: symlink parameter_metadata.yaml from share/metkit into package

4 participants