ci: run integration tests in 2 batches#391

Open
olebel wants to merge 1 commit into main from ob/test

Conversation

@olebel olebel commented Apr 16, 2026

Summary

Split E2E integration tests into two parallel batches to reduce overall CI wall time.

Previously, all 12 test suites ran sequentially in a single TestIntegrationTestSuite function. Each suite performs a full stack setup (in-process testnet, provider, operators) and teardown independently, with no ordering dependencies between suites — making them safe to split.

Changes

  • integration/e2e_test.go — added TestIntegrationBatch1 and TestIntegrationBatch2 alongside the existing TestIntegrationTestSuite; removed duplicate E2EPersistentStorageDefault entry that was previously run twice

    • Batch 1:
      • E2EContainerToContainer
      • E2EAppNodePort
      • E2EDeploymentUpdate
      • E2EApp
      • E2EMigrateHostname
    • Batch 2:
      • E2EPersistentStorageDefault
      • E2EPersistentStorageBeta2
      • E2EPersistentStorageDeploymentUpdate
      • E2EStorageClassRam
      • E2ECustomCurrency
      • E2EIPAddress
  • make/test-integration.mk — added test-e2e-integration-batch1 and test-e2e-integration-batch2 targets

  • .github/workflows/integration-tests.yaml — added e2e_target input (default: test-e2e-integration) to make the reusable workflow target-parametric

  • .github/workflows/tests.yaml — replaced single integration-tests job with two parallel jobs (integration-tests-batch1, integration-tests-batch2), each passing the respective make target; both are required for dispatch-release
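
A minimal sketch of the partition listed above, not the PR's actual test code. It models the two batches as plain string slices (the suite names come from the lists above; `batch1`, `batch2`, and `checkPartition` are hypothetical names introduced only for illustration) and checks that no suite is listed twice, which is the kind of mistake the removed duplicate E2EPersistentStorageDefault entry represents.

```go
package main

import "fmt"

// Suite names copied from the PR description; the slices are illustrative
// and do not mirror how integration/e2e_test.go registers its suites.
var batch1 = []string{
	"E2EContainerToContainer",
	"E2EAppNodePort",
	"E2EDeploymentUpdate",
	"E2EApp",
	"E2EMigrateHostname",
}

var batch2 = []string{
	"E2EPersistentStorageDefault",
	"E2EPersistentStorageBeta2",
	"E2EPersistentStorageDeploymentUpdate",
	"E2EStorageClassRam",
	"E2ECustomCurrency",
	"E2EIPAddress",
}

// checkPartition returns the number of distinct suites, or an error naming
// the first suite that is listed more than once across the given batches.
func checkPartition(batches ...[]string) (int, error) {
	seen := make(map[string]int) // suite name -> 1-based batch index
	for i, batch := range batches {
		for _, name := range batch {
			if prev, ok := seen[name]; ok {
				return 0, fmt.Errorf("suite %s listed in batch %d and batch %d", name, prev, i+1)
			}
			seen[name] = i + 1
		}
	}
	return len(seen), nil
}

func main() {
	n, err := checkPartition(batch1, batch2)
	if err != nil {
		fmt.Println("invalid partition:", err)
		return
	}
	fmt.Printf("%d distinct suites across 2 batches\n", n)
}
```

A check like this could run as an ordinary unit test without the e2e build tag, so a suite accidentally listed in both batches would be caught before any Kind cluster is started.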

Expected impact

Each parallel job runs its own Kind cluster (full isolation, no shared k8s state). The goal is to reduce the overall CI time of the integration tests.

@olebel olebel requested a review from a team as a code owner April 16, 2026 03:48
coderabbitai Bot commented Apr 16, 2026

Walkthrough

The pull request splits E2E integration tests into two parallel batches by introducing a configurable workflow input parameter, creating separate GitHub Actions jobs for each batch, adding corresponding Go test functions, and defining new Make targets to orchestrate the batch executions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Actions Workflow Configuration: .github/workflows/integration-tests.yaml, .github/workflows/tests.yaml | Made the reusable integration test workflow parameterizable with an e2e_target input. Split the single integration-tests job into two separate batch jobs (integration-tests-batch1, integration-tests-batch2), each passing a distinct Make target to the reusable workflow. Updated job dependencies in dispatch-release to await both batch jobs. |
| E2E Test Suite: integration/e2e_test.go | Added TestIntegrationBatch1 and TestIntegrationBatch2 functions that partition existing E2E test suites into two subsets: Batch1 runs container, node port, deployment, and app tests; Batch2 runs persistent storage, custom currency, and IP address tests. |
| Build Targets: make/test-integration.mk | Added .PHONY targets test-e2e-integration-batch1 and test-e2e-integration-batch2, each invoking Go tests with the e2e build tag, $(INTEGRATION_VARS), $(KIND_VARS), a 1500-second timeout, and targeting the respective batch test functions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 The tests hop into two swift lanes,
Batches racing through their domains,
Workflows reused with inputs so fine,
Suites divided, efficiency divine,
Parallel glory in the CI-CD vine! 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'ci: run integration tests in 2 batches' directly and clearly summarizes the main change: splitting E2E integration tests into parallel batches to reduce CI wall time. |
| Description check | ✅ Passed | The description provides comprehensive context about why the change is needed, how tests are split, lists specific test suites in each batch, and explains the expected impact on CI performance. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/integration-tests.yaml:
- Line 153: The workflow runs user input unsafely via the line containing make
${{ inputs.e2e_target }}, so validate and quote the e2e_target input before
executing: constrain inputs.e2e_target to an allowlist or a strict regex (e.g.,
only letters/numbers/[-_]) and then invoke make with the validated, quoted value
(use the inputs.e2e_target identifier) so the shell executes a single safe token
instead of allowing command injection.

In @.github/workflows/tests.yaml:
- Around line 107-116: The reusable workflow hardcodes KIND_NAME causing
parallel batch collision; add a new input named kind_name to the reusable
workflow (default "kube") and update the crd-e2e job to use this input instead
of the hardcoded KIND_NAME, then pass unique values from the callers
(integration-tests-batch1 and integration-tests-batch2) via the with: block
(e.g., kind_name: kube-batch1 and kind_name: kube-batch2) so each batch uses a
distinct Kind cluster name.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6bcd0b81-c830-4352-be1f-e3396f76e514

📥 Commits

Reviewing files that changed from the base of the PR and between a785772 and 22e45dd.

📒 Files selected for processing (4)
  • .github/workflows/integration-tests.yaml
  • .github/workflows/tests.yaml
  • integration/e2e_test.go
  • make/test-integration.mk

         working-directory: ${{ env.GOPATH }}/src/github.com/akash-network/provider
         run: |
-          make test-e2e-integration
+          make ${{ inputs.e2e_target }}
⚠️ Potential issue | 🟠 Major

Validate and quote e2e_target before shell execution.

Line [153] executes a raw workflow input in bash. That allows command injection if an unexpected value is passed via workflow_call.

🔒 Proposed fix
-      - name: Run E2E Tests
-        working-directory: ${{ env.GOPATH }}/src/github.com/akash-network/provider
-        run: |
-          make ${{ inputs.e2e_target }}
+      - name: Run E2E Tests
+        working-directory: ${{ env.GOPATH }}/src/github.com/akash-network/provider
+        env:
+          # Pass the input through the environment so it is never interpolated into the script text.
+          E2E_TARGET: ${{ inputs.e2e_target }}
+        run: |
+          case "$E2E_TARGET" in
+            test-e2e-integration|test-e2e-integration-batch1|test-e2e-integration-batch2) ;;
+            *) echo "invalid e2e_target: $E2E_TARGET" >&2; exit 1 ;;
+          esac
+          make "$E2E_TARGET"

Comment on lines +107 to 116
integration-tests-batch1:
  uses: ./.github/workflows/integration-tests.yaml
  with:
    e2e_target: test-e2e-integration-batch1

integration-tests-batch2:
  uses: ./.github/workflows/integration-tests.yaml
  with:
    e2e_target: test-e2e-integration-batch2


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and inspect the integration-tests.yaml workflow
find .github -name "*.yaml" -o -name "*.yml" | head -20

Repository: akash-network/provider

Length of output: 314


🏁 Script executed:

# Check if integration-tests.yaml exists and read relevant sections
if [ -f ".github/workflows/integration-tests.yaml" ]; then
  echo "=== integration-tests.yaml exists ==="
  wc -l ".github/workflows/integration-tests.yaml"
  echo ""
  echo "=== Lines around 24 (KIND_NAME) ==="
  sed -n '15,35p' ".github/workflows/integration-tests.yaml"
  echo ""
  echo "=== Inputs section ==="
  rg -A 20 "^inputs:" ".github/workflows/integration-tests.yaml" | head -40
fi

Repository: akash-network/provider

Length of output: 678


🏁 Script executed:

# Check the tests.yaml file for how it calls the reusable workflow
if [ -f ".github/workflows/tests.yaml" ]; then
  echo "=== Lines 107-116 from tests.yaml ==="
  sed -n '107,116p' ".github/workflows/tests.yaml"
  echo ""
  echo "=== Full context of integration-tests calls ==="
  sed -n '100,140p' ".github/workflows/tests.yaml"
fi

Repository: akash-network/provider

Length of output: 1395


🏁 Script executed:

# Search for any kind_name or KIND_NAME usage across workflows
echo "=== Searching for kind_name or KIND_NAME in workflows ==="
rg -i "kind.?name" .github/workflows/ -A 2 -B 2

Repository: akash-network/provider

Length of output: 904


🏁 Script executed:

# Get full structure of integration-tests.yaml to understand jobs
echo "=== Full integration-tests.yaml with job names ==="
cat -n ".github/workflows/integration-tests.yaml" | grep -E "^\s*[0-9]+\s+(jobs:|[a-z-]+:)" | head -40

Repository: akash-network/provider

Length of output: 1633


🏁 Script executed:

# Get the inputs section and full job list
echo "=== Integration-tests.yaml - inputs and jobs section ==="
sed -n '1,80p' ".github/workflows/integration-tests.yaml" | cat -n

Repository: akash-network/provider

Length of output: 3486


🏁 Script executed:

# Search for all job definitions and their conditions in integration-tests.yaml
echo "=== All jobs in integration-tests.yaml ==="
rg "^  [a-z-]+:" ".github/workflows/integration-tests.yaml" -A 5 | head -80

Repository: akash-network/provider

Length of output: 158


🏁 Script executed:

# Get complete job names from integration-tests.yaml
echo "=== All job definitions in integration-tests.yaml ==="
rg "^  [a-z0-9-]+:$" ".github/workflows/integration-tests.yaml"

Repository: akash-network/provider

Length of output: 137


🏁 Script executed:

# Check the full file to see structure
echo "=== Line count and structure ==="
wc -l ".github/workflows/integration-tests.yaml"
echo ""
echo "=== Section headers (jobs, needs, etc) ==="
grep -n "^jobs:\|^  [a-z-]*:\|^    if:\|^    needs:" ".github/workflows/integration-tests.yaml" | head -50

Repository: akash-network/provider

Length of output: 206


🏁 Script executed:

# Get the lines where e2e_target is used
echo "=== Where e2e_target is used in integration-tests.yaml ==="
rg -n "e2e_target" ".github/workflows/integration-tests.yaml" -B 2 -A 2

Repository: akash-network/provider

Length of output: 481


🏁 Script executed:

# Check if kind_name input exists in integration-tests.yaml
echo "=== Check inputs section of integration-tests.yaml ==="
sed -n '5,15p' ".github/workflows/integration-tests.yaml"
echo ""
echo "=== Search for any kind_name or cluster isolation in integration-tests.yaml ==="
rg -i "kind.name|cluster.name" ".github/workflows/integration-tests.yaml" -B 3 -A 3

Repository: akash-network/provider

Length of output: 934


🏁 Script executed:

# Check if there's any unique naming strategy or runner isolation config
echo "=== Check for runner labels or self-hosted config ==="
rg "runs-on:|self-hosted" ".github/workflows/" -B 2 -A 2 | head -60

Repository: akash-network/provider

Length of output: 2545


🏁 Script executed:

# Check if github.run_id or other unique vars are used anywhere for isolation
echo "=== Check for unique run identifiers used in Kind setup ==="
rg "github\.(run_id|run_number|job)" ".github/workflows/integration-tests.yaml" -B 2 -A 2

Repository: akash-network/provider

Length of output: 125


🏁 Script executed:

# Final confirmation: kind_name does not exist as input
echo "=== All inputs in integration-tests.yaml ==="
sed -n '5,13p' ".github/workflows/integration-tests.yaml"

Repository: akash-network/provider

Length of output: 296


Parallel batches collide on hardcoded Kind cluster name.

Both batch jobs invoke crd-e2e from the same reusable workflow in parallel, and KIND_NAME: kube is hardcoded (.github/workflows/integration-tests.yaml, line 24). On core-e2e self-hosted runners sharing a Docker daemon, concurrent jobs will conflict on the same cluster name, causing flaky test failures.

Add a kind_name input to the reusable workflow and pass unique values per batch (e.g., kube-batch1, kube-batch2) so that each batch creates its own Kind cluster:

Suggested mitigation

In .github/workflows/integration-tests.yaml, add to inputs:

kind_name:
  description: 'Kind cluster name'
  required: false
  default: 'kube'
  type: string

Then update the crd-e2e job to use it instead of the hardcoded value, and in .github/workflows/tests.yaml, pass unique names:

integration-tests-batch1:
  uses: ./.github/workflows/integration-tests.yaml
  with:
    e2e_target: test-e2e-integration-batch1
    kind_name: kube-batch1

integration-tests-batch2:
  uses: ./.github/workflows/integration-tests.yaml
  with:
    e2e_target: test-e2e-integration-batch2
    kind_name: kube-batch2


olebel commented Apr 16, 2026

At this point, the longest job (Integration tests batch) consists of two parts:

  • ~12 minutes setting up the environment, of which 8 min is spent configuring the Kind cluster
  • ~10 minutes running the actual tests

k8s configuration breakdown:

| Phase | Duration | % |
|---|---|---|
| Go binary compilation (goreleaser, 4 variants) | ~3m 27s | 40% |
| Docker image building (4 images) | ~2m 40s | 31% |
| Ingress-nginx rollout wait | ~57s | 11% |
| goreleaser apt-get inside container | ~40s | 8% |
| wasmvm downloads | ~29s | 6% |
| Node image pull (ghcr.io/akash-network/node) | ~9s | 1% |
| CRDs, MetalLB, kustomize applies | ~7s | 1% |

Total: 8m 34s. 71% of the time is spent on Go binary compilation and Docker image building, with the remaining 29% split across cluster setup, dependency downloads, and rollout waits.


olebel commented Apr 16, 2026

Other notes:

  • Some build jobs execute as root. This produces files that can't be overwritten by the runner user, so we keep cleaning up the workspace each time, which eliminates any potential cache on self-hosted runners.
  • The build job keeps downloading artifacts along the way: Go cache, Go dependencies, wasmvm libs. Each download adds 10-60 seconds whenever it occurs and contributes significantly to the total build time at this point.
  • Jobs build provider binaries and Docker images independently. Moving this to a parent job with caching would not save time, since the jobs run in parallel. It still doesn't feel right.
  • There are dozens of makefiles, and their cross-usage in CI makes the build hard to optimize.
