fix: collect launch properties when some cluster lacks Kuadrant #983

Open
silvi-t wants to merge 2 commits into Kuadrant:main from silvi-t:fix-launch-properties

Conversation

Contributor

@silvi-t silvi-t commented May 18, 2026

Description

  • During nightly runs, when the first cluster was missing Kuadrant, none of the Report Portal launch properties were recorded — the info_collector tests were silently skipped by the global has_kuadrant() check, and the
    openshift_version fixture cascaded a failure through the cluster fixture (which requires the kuadrant namespace)
  • Made make collect / info_collector more resilient: properties are now scraped from the first cluster that has the required components, so we still get records from the remaining clusters
  • Clusters missing Kuadrant now appear in the launch description with not installed instead of being omitted

Changes

Bug Fixes

  • Bypass the has_kuadrant() check in pytest_runtest_setup for info_collector.py — it handles missing components gracefully via try/except
  • Decouple the openshift_version fixture from the cluster fixture by connecting to the default namespace instead, so OCP version detection works regardless of the kuadrant namespace
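
The first bug fix can be pictured roughly as follows. This is a minimal sketch rather than the testsuite's actual hook: `has_kuadrant` is passed in as a parameter, `SkipTest` stands in for `pytest.skip()`, and `make_item` / `was_skipped` are hypothetical helpers added so the snippet runs on its own. Only the `info_collector.py` basename check comes from the description above.

```python
from types import SimpleNamespace


class SkipTest(Exception):
    """Stand-in for pytest.skip() so the sketch needs no third-party imports."""


def runtest_setup(item, has_kuadrant):
    """Hypothetical, simplified version of the pytest_runtest_setup hook."""
    # The collector handles missing components itself, so it is never skipped.
    if item.fspath.basename == "info_collector.py":
        return
    if not has_kuadrant():
        raise SkipTest("Kuadrant is not installed on the cluster")


def make_item(basename):
    """Build a fake test item exposing only the attribute the hook reads."""
    return SimpleNamespace(fspath=SimpleNamespace(basename=basename))


def was_skipped(basename, kuadrant_installed):
    """Report whether the hook would skip a test from the given file."""
    try:
        runtest_setup(make_item(basename), lambda: kuadrant_installed)
    except SkipTest:
        return True
    return False
```

With Kuadrant missing, `was_skipped("info_collector.py", False)` is false while any other test file is still skipped, which is the behaviour the fix is after.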

Enhancements

  • Add _first_connected() helper in info_collector.py that iterates all configured clusters and returns the first (cluster_client, project) pair with a connected namespace
  • Update gather_cluster_versions(), test_kube_context, test_kuadrant_properties, and test_istio_properties to fall back to secondary clusters when the first lacks kuadrant-system or istio-system
  • Update _collect_single_cluster in component_metadata.py to fall back to the default namespace and report kuadrant_image: not installed for clusters missing Kuadrant
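
As an illustration of the fallback described for `_first_connected()`, the sketch below walks a list of cluster clients and returns the first pair whose namespace is connected. `FakeCluster` and `FakeProject` are hypothetical stand-ins for the real client objects; only `change_project()` and the `connected` attribute are taken from this PR, and the `(None, None)` result for "nothing found" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeProject:
    """Hypothetical stand-in for the client returned by change_project()."""
    namespace: str
    connected: bool


@dataclass
class FakeCluster:
    """Hypothetical cluster client; `namespaces` lists what exists on it."""
    name: str
    namespaces: frozenset

    def change_project(self, namespace):
        return FakeProject(namespace, namespace in self.namespaces)


def first_connected(clusters, namespace):
    """Return the first (cluster_client, project) pair with a connected namespace."""
    for cluster in clusters:
        project = cluster.change_project(namespace)
        if project.connected:
            return cluster, project
    # Assumption: signal "no cluster has this namespace" with a pair of Nones.
    return None, None


clusters = [
    FakeCluster("cluster1", frozenset({"default"})),
    FakeCluster("cluster2", frozenset({"default", "kuadrant-system"})),
]
cluster, project = first_connected(clusters, "kuadrant-system")
```

Here the first cluster lacks `kuadrant-system`, so the lookup falls through to `cluster2`.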

Verification steps

# 1. Run info_collector against a setup where the first cluster lacks Kuadrant
make collect

# 2. Import results into Report Portal and verify the launch description
rptool write --launch-name <name> junit-00-collect.xml
  • In Report Portal, verify that the launch description shows all clusters, with kuadrant_image: not installed for clusters missing Kuadrant

Summary by CodeRabbit

  • New Features

    • Improved cluster reachability checks and direct OpenShift version detection from each cluster.
    • Component image reporting now includes images without explicit tags (tag shown as empty).
  • Bug Fixes

    • Metadata collection skips unreachable clusters early and records kuadrant as "not installed" when system project is disconnected.
    • Multi-cluster collection now records per-cluster properties consistently.

Review Change Stack

coderabbitai Bot commented May 18, 2026

Walkthrough

Collectors now use public cluster configurations and consult KubernetesClient.is_reachable and ocp_version. component_metadata includes disconnected-project placeholders and includes images without tags. Tests and fixtures were changed to iterate configured clusters/namespaces and record per-cluster properties, skipping missing namespaces.

Changes

Cluster metadata and test routing

Layer / File(s): ReportPortal metadata collection changes (testsuite/component_metadata.py)
Summary: collect_all_clusters uses get_cluster_configurations() and persists per-cluster payloads only when _collect_single_cluster returns a truthy result. _collect_single_cluster returns None if cluster_client.is_reachable is false; for reachable but disconnected projects it returns a payload with placeholder metadata ({"kuadrant_image": "not installed"}), uses cluster_client.api_url for console_url, and sources ocp_version from cluster_client.ocp_version. get_ocp_version was removed. get_component_images now includes images without : by returning tag None.

Layer / File(s): KubernetesClient reachability and version (testsuite/kubernetes/client.py)
Summary: Adds module logger, KubernetesClient.is_reachable property (runs oc api-versions, logs on failure and returns False) and KubernetesClient.ocp_version property (runs oc get clusterversion, returns major.minor or None).

Layer / File(s): Test fixture and setup updates (testsuite/tests/conftest.py)
Summary: pytest_runtest_setup returns early for tests whose file basename is info_collector.py. openshift_version fixture now accepts testconfig and reads cluster.ocp_version, parsing major.minor or returning None.

Layer / File(s): Info collector tests with multi-cluster iteration (testsuite/tests/info_collector.py)
Summary: Adds _all_cluster_projects(namespace) to iterate configured clusters and yield connected (cluster_name, cluster_client, project). New helpers print and deduplicate per-cluster data. Tests now iterate all configured clusters for namespaces (e.g., system_ns, istio-system), recording per-cluster Kubernetes/OpenShift versions (OpenShift from cluster.ocp_version) and recording namespace '<ns>' not found placeholders when a namespace is not connected.
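
The `get_component_images` change summarized above (images without `:` are kept, with tag `None`) could look something like the sketch below. `split_image` is a hypothetical helper, not the function in `component_metadata.py`, and the guard for registry ports is an added assumption; digest references such as `name@sha256:...` are not handled.

```python
def split_image(image):
    """Split an image reference into (name, tag); tag is None when absent."""
    name, sep, tag = image.rpartition(":")
    # No colon at all, or a "/" after the last colon (the colon then belongs
    # to a registry port such as localhost:5000), means there is no tag.
    if not sep or "/" in tag:
        return image, None
    return name, tag
```

The point of returning `None` instead of dropping the entry is that an untagged image still shows up in the collected metadata.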

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Kuadrant/testsuite#911: Modifies OCP version retrieval and metadata helpers in component_metadata.py, overlapping with version-source and metadata changes in this PR.

Suggested reviewers

  • trepel
  • emmaaroche
  • averevki

Poem

🐰 A hop, I check each cluster's face,
If unreachable, I skip the race.
For projects gone, a placeholder's made,
Images kept even when tags fade.
Hooray — the collector notes each place.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and specifically describes the main change: fixing the collection of launch properties when a cluster lacks Kuadrant, which directly addresses the core problem in this PR.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Description check | ✅ Passed | The PR description comprehensively covers the problem, bug fixes, enhancements, and verification steps with clear organisation matching the template structure.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@testsuite/tests/conftest.py`:
- Around line 429-433: The openshift_version function returns None immediately
if project.change_project("default") is not connected; update it to try a
namespace fallback before returning None: after obtaining project =
testconfig["control_plane"]["cluster"].change_project("default"), if not
project.connected iterate available namespaces (e.g., from
testconfig["control_plane"]["cluster"] or
testconfig["control_plane"]["namespaces"]) and call change_project(namespace)
until you find a connected project; once a connected project is found continue
with the existing version detection logic in openshift_version, otherwise return
None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4b235959-c13d-435e-95f6-9b4fad93e867

📥 Commits

Reviewing files that changed from the base of the PR and between b70abe2 and 0e89998.

📒 Files selected for processing (3)
  • testsuite/component_metadata.py
  • testsuite/tests/conftest.py
  • testsuite/tests/info_collector.py

Comment thread testsuite/tests/conftest.py
@silvi-t silvi-t requested a review from a team May 18, 2026 10:55
@silvi-t silvi-t self-assigned this May 18, 2026
@silvi-t silvi-t added this to Kuadrant May 18, 2026
@silvi-t silvi-t moved this to Ready For Review in Kuadrant May 18, 2026
Comment thread testsuite/tests/info_collector.py Outdated
logger = logging.getLogger(__name__)


def _first_connected(namespace):
Member

Can you please remind me why this collector wasn't collecting information from every cluster, and is using only the first one it finds available?

Contributor Author

In our multicluster pipeline, all clusters share the same configuration. They're deployed with the same kuadrant-operator image, so the component images and versions are identical across clusters. The launch attributes this collector gathers (mostly Kuadrant component images extracted from the operator) would be duplicated if we collected from every cluster. Using only one cluster avoids that redundancy while still capturing all the relevant information.
This approach can change in the future if we need cluster-specific properties, but right now collecting from multiple clusters would only result in duplicate attributes.

Member

I would still prefer collecting the real information from each cluster, even if the collect task takes three times as long.

I don't like that this already confusing process is being refactored with a new, equally confusing algorithm that I now also need to keep in mind.

This might be the moment to change this and assign None values to missing configurations, instead of blindly trusting the install pipelines.

Contributor Author

I'd like to pull @zkraus into this conversation to get his opinion as well. My concern is that collecting attributes from every cluster would result in unnecessary duplication (since all clusters share the same operator image, the attributes would be identical). We'd either have duplicate keys or need cluster-prefixed names (e.g. cluster1_kuadrant-operator, cluster2_kuadrant-operator), which makes the launch attributes harder to read and filter on in Report Portal without adding useful information.

Contributor

I confirm that the reason for collecting from only one cluster was the assumption of a unified environment.
But there should be no harm in collecting from all of them. It will be good preparation if/when we start testing interoperability.

If the versions are equal, I would not set duplicate attributes (if that is even possible), so maybe check whether an equal key and value is already present. This should be simple enough, and we can change it if necessary.

On the other hand, since we will be getting the version and other data from each cluster (which is definitely beneficial), please use the logger and log the information collected per cluster in the collection test. Log as much as you think would be valuable. In the log it is accessible and readable, will not hurt anyone, and will not overload the attributes. I think that is a nearly ideal place to put a large amount of information in case we need it later. The next upgrade would be to attach this data in JSON/YAML format (next time).

TL;DR: I agree, collect info from all clusters, deduplicate attributes, log everything.
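
The "collect from all, deduplicate, log everything" approach agreed on here can be sketched as below. This is an illustration under stated assumptions, not the code in the PR: `record` stands in for whatever callable writes a launch attribute (for example pytest's `record_testsuite_property`), the per-cluster dictionaries and image names are made up, and `collect` is a hypothetical driver.

```python
import logging

logger = logging.getLogger("info_collector_sketch")


def record_unique(record, seen, key, value):
    """Record (key, value) unless the same pair was already recorded."""
    if (key, value) in seen:
        return False
    seen.add((key, value))
    record(key, value)
    return True


def collect(per_cluster, record):
    """Log every property of every cluster, but record each pair only once."""
    seen = set()
    for cluster_name, properties in per_cluster.items():
        for key, value in properties.items():
            # Everything goes to the log, even values that are not recorded.
            logger.info("%s: %s=%s", cluster_name, key, value)
            record_unique(record, seen, key, value)


recorded = []
collect(
    {
        "cluster1": {"kuadrant_image": "quay.io/example/operator:v1"},
        "cluster2": {"kuadrant_image": "quay.io/example/operator:v1"},
        "cluster3": {"kuadrant_image": "not installed"},
    },
    lambda key, value: recorded.append((key, value)),
)
```

Identical values from `cluster1` and `cluster2` collapse into one attribute, while the differing value from `cluster3` is kept, so attributes stay readable without hiding a cluster that diverges.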

Comment thread testsuite/tests/info_collector.py Outdated
Comment thread testsuite/tests/info_collector.py Outdated
Comment thread testsuite/tests/conftest.py Outdated
Comment thread testsuite/component_metadata.py Outdated
Comment thread testsuite/component_metadata.py Outdated
Signed-off-by: Silvia Tarabova <starabov@redhat.com>
@silvi-t silvi-t force-pushed the fix-launch-properties branch from 0e89998 to abad3a7 Compare May 19, 2026 09:19
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
testsuite/tests/conftest.py (1)

48-50: 💤 Low value

Consider using a pytest marker instead of filename matching.

Bypassing skip/fail logic by checking item.fspath.basename == "info_collector.py" works but is fragile—renaming the file or adding similar collector tests would require changes here. A dedicated marker (e.g., @pytest.mark.skip_capability_check) would be more explicit and maintainable.

That said, given the targeted scope of this fix and the likelihood of only one such file, this is acceptable for now.
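
For comparison, a marker-based guard along the lines suggested above might look like this sketch. `FakeItem` is a hypothetical stand-in for a pytest item so the snippet runs without pytest installed; `get_closest_marker()` is the real pytest item method it imitates, and `skip_capability_check` is the marker name proposed in this comment, not one that exists in the testsuite.

```python
class FakeItem:
    """Hypothetical stand-in for a pytest item with a set of marker names."""

    def __init__(self, *markers):
        self._markers = set(markers)

    def get_closest_marker(self, name):
        # pytest returns a Mark object or None; the name is enough for a sketch.
        return name if name in self._markers else None


def bypasses_capability_check(item):
    """True when the test opted out of the capability check via the marker."""
    return item.get_closest_marker("skip_capability_check") is not None
```

In a real suite the marker would be applied in `info_collector.py` (for instance with a module-level `pytestmark`) and registered in the pytest configuration, so renaming the file would no longer affect the guard.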

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testsuite/tests/conftest.py` around lines 48 - 50, Replace the brittle
filename check that looks for item.fspath.basename == "info_collector.py" with a
pytest marker-based guard: check whether the test item has a marker (e.g.,
"skip_capability_check") using item.get_closest_marker or item.iter_markers, and
skip/return when that marker is present; also update the corresponding test
(info_collector.py) to be decorated with `@pytest.mark.skip_capability_check` so
the new marker-based logic in conftest.py (replacing the item.fspath.basename
check) correctly identifies and excludes that test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c82d7df1-68c7-4561-89db-7d3c6a7a74f3

📥 Commits

Reviewing files that changed from the base of the PR and between 0e89998 and abad3a7.

📒 Files selected for processing (4)
  • testsuite/component_metadata.py
  • testsuite/kubernetes/client.py
  • testsuite/tests/conftest.py
  • testsuite/tests/info_collector.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • testsuite/tests/info_collector.py

Signed-off-by: Silvia Tarabova <starabov@redhat.com>
Contributor Author

silvi-t commented May 19, 2026

@averevki, @zkraus

Brief summary of changes

testsuite/kubernetes/client.py

  • Added is_reachable property — checks cluster connectivity using oc api-versions (works on both OCP and Kind)
  • Added ocp_version property — returns OpenShift version (major.minor) using status.history[0].version
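
A rough sketch of the two properties, written as plain functions. It assumes the `oc` binary is called through `subprocess`, which may differ from how `KubernetesClient` actually talks to the cluster, and it returns the version as an `(int, int)` tuple, which is also an assumption; `parse_major_minor` and the injectable `run` parameter are hypothetical.

```python
import subprocess


def parse_major_minor(version):
    """Return (major, minor) from a string like '4.16.3', or None if unparsable."""
    parts = (version or "").split(".")
    if len(parts) < 2 or not (parts[0].isdecimal() and parts[1].isdecimal()):
        return None
    return int(parts[0]), int(parts[1])


def is_reachable(run=subprocess.run):
    """True when `oc api-versions` exits successfully; `run` is injectable for tests."""
    try:
        result = run(
            ["oc", "api-versions"], capture_output=True, timeout=30, check=False
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, timeout, or similar: treat the cluster as unreachable.
        return False
    return result.returncode == 0
```

Returning `None` for an unparsable version keeps callers on non-OpenShift clusters (such as Kind) from failing when no cluster version is available.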

testsuite/component_metadata.py

  • Made _get_cluster_configurations a public @staticmethod (get_cluster_configurations) for reuse
  • Refactored _collect_single_cluster to use is_reachable instead of change_project("default").connected
  • Removed get_ocp_version — replaced by KubernetesClient.ocp_version
  • Removed fallback to default namespace — single dict structure with "not installed" for missing Kuadrant
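
The resulting control flow of `_collect_single_cluster` can be sketched as follows. The attribute names `is_reachable`, `api_url`, `ocp_version`, `change_project()` and `connected` come from this PR; the flat payload layout, the `read_metadata` callback, and `fake_client` are hypothetical and only serve to make the sketch runnable.

```python
from types import SimpleNamespace


def collect_single_cluster(cluster_client, system_project, read_metadata):
    """Return a payload for one cluster, or None when it cannot be reached."""
    if not cluster_client.is_reachable:
        return None
    project = cluster_client.change_project(system_project)
    if project.connected:
        metadata = read_metadata(project)
    else:
        # Reachable cluster without the system project: Kuadrant is absent.
        metadata = {"kuadrant_image": "not installed"}
    return {
        "console_url": cluster_client.api_url,
        "ocp_version": cluster_client.ocp_version,
        **metadata,
    }


def fake_client(reachable, has_project):
    """Hypothetical cluster client with fixed answers for the sketch."""
    return SimpleNamespace(
        is_reachable=reachable,
        api_url="https://api.example.test:6443",
        ocp_version="4.16",
        change_project=lambda namespace: SimpleNamespace(connected=has_project),
    )
```

An unreachable cluster yields `None` and is skipped by the caller, while a reachable cluster without Kuadrant still appears in the launch description with the placeholder value.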

testsuite/tests/info_collector.py

  • Replaced _first_connected with _all_cluster_projects — iterates all clusters, yields None for missing namespaces
  • Added _print_cluster_data helper — standardized per-cluster logging
  • Added _record_unique helper — deduplicates attributes before recording to Report Portal
  • All property tests now collect from every cluster, log per-cluster data, and only record unique attributes
  • Replaced hardcoded "kuadrant-system" with settings["service_protection"]["system_project"]
  • Removed test_kube_context
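
A minimal sketch of the `_all_cluster_projects` idea, following the summary above in yielding `None` for a missing namespace. The mapping of cluster names to clients and the `fake_cluster` helper are hypothetical; the real helper reads the configured clusters from the testsuite settings.

```python
from types import SimpleNamespace


def all_cluster_projects(clusters, namespace):
    """Yield (cluster_name, cluster_client, project) for every configured cluster.

    `project` is None when the namespace is not connected on that cluster.
    """
    for cluster_name, cluster_client in clusters.items():
        project = cluster_client.change_project(namespace)
        yield cluster_name, cluster_client, (project if project.connected else None)


def fake_cluster(*namespaces):
    """Hypothetical cluster client that knows a fixed set of namespaces."""
    return SimpleNamespace(
        change_project=lambda ns: SimpleNamespace(name=ns, connected=ns in namespaces)
    )


clusters = {
    "cluster1": fake_cluster("default"),
    "cluster2": fake_cluster("default", "kuadrant-system"),
}
found = {
    name: project
    for name, _, project in all_cluster_projects(clusters, "kuadrant-system")
}
```

Yielding every cluster, including those without the namespace, lets each test decide whether to log a placeholder or skip that cluster.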

testsuite/tests/conftest.py

  • openshift_version fixture now uses cluster.ocp_version and status.history[0].version

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@testsuite/tests/info_collector.py`:
- Around line 90-107: The loop over _all_cluster_projects never records a
kube_context for the first usable cluster; add logic so that when you encounter
the first cluster where project is not None you append a single ("kube_context",
cluster_name) entry to properties (or call _record_unique with that pair) and
mark it done so it only records once; update the loop that references
cluster_name, cluster, project (and uses
ReportPortalMetadataCollector.get_kubernetes_version, _record_unique,
record_testsuite_property) to set this flag and add the kube_context tuple
before continuing to collect versions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7c7d5c24-adab-4476-ba7b-3adb7a45ae73

📥 Commits

Reviewing files that changed from the base of the PR and between abad3a7 and 18da2c5.

📒 Files selected for processing (2)
  • testsuite/component_metadata.py
  • testsuite/tests/info_collector.py

Comment thread testsuite/tests/info_collector.py
3 participants