Build: Simplify CI workflow path filters to avoid per-workflow maintenance#16302

Draft
kevinjqliu wants to merge 1 commit into apache:main from kevinjqliu:kevinjqliu/simplify-ci-paths-filter

Conversation

@kevinjqliu
Contributor

@kevinjqliu kevinjqliu commented May 12, 2026

Switches 6 CI workflows from paths-ignore, which listed individual workflow files, to paths with a !.github/** exclusion glob plus a self-allowlist entry.

Previously, adding, renaming, or removing a workflow required updating the ignore list in all 6 CI files. Now each workflow allowlists only its own file, and everything else under .github/ is excluded automatically.

Motivation

Every PR that adds, renames, or removes a workflow file must update the paths-ignore list in all 6 CI workflows, adding boilerplate changes unrelated to the PR's purpose. Recent examples:

Pattern

Following the pattern used by pyiceberg:

pull_request:
  paths:
  - '**'
  - '!.github/**'
  - '.github/workflows/<self>.yml'
  - '!docs/**'
  # ... other exclusions

1. **: include all files
2. !.github/**: exclude everything under .github/
3. .github/workflows/<self>.yml: re-include the workflow's own file (a later matching pattern overrides an earlier one)
4. The remaining ! entries exclude docs, metadata, and unrelated engine modules
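The evaluation order in the list above can be sketched in Python. This is a minimal approximation, not GitHub's actual matcher: patterns are checked in order for each changed file, a plain pattern includes the file, a ! pattern excludes it, and the last matching pattern decides. The glob translation assumes simplified semantics (** crosses /, while * and ? do not), and java-ci.yml is a hypothetical stand-in for <self>.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter glob into a regex (simplified approximation).

    '**' matches any characters including '/', '*' matches any run of
    characters except '/', and '?' matches one non-'/' character.
    """
    parts = []
    # The capturing group keeps the wildcard tokens in the split result.
    for token in re.split(r"(\*\*|\*|\?)", pattern):
        if token == "**":
            parts.append(".*")
        elif token == "*":
            parts.append("[^/]*")
        elif token == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def file_included(path: str, patterns: list[str]) -> bool:
    """Walk the paths list in order; the last pattern that matches decides."""
    included = False  # a file that matches no pattern is not included
    for pat in patterns:
        negated = pat.startswith("!")
        body = pat[1:] if negated else pat
        if glob_to_regex(body).fullmatch(path):
            included = not negated
    return included


def workflow_triggers(changed_files: list[str], patterns: list[str]) -> bool:
    """With a paths filter, the workflow runs if any changed file is included."""
    return any(file_included(f, patterns) for f in changed_files)


# Hypothetical workflow file standing in for <self>.
PATTERNS = [
    "**",
    "!.github/**",
    ".github/workflows/java-ci.yml",
    "!docs/**",
]

if __name__ == "__main__":
    for f in [
        "core/src/main/java/Foo.java",
        ".github/workflows/java-ci.yml",
        ".github/dependabot.yml",
        "docs/index.md",
    ]:
        print(f, "->", "included" if file_included(f, PATTERNS) else "excluded")
```

The four sample paths cover each branch: source code stays included via **, the workflow's own file is excluded by !.github/** and then re-included by the self entry, dependabot.yml stays excluded, and docs are excluded by !docs/**.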

Testing

Wrote a validation script simulating GitHub Actions path matching for both old (paths-ignore) and new (paths) configs across all 6 workflows. Tested 53 representative file paths covering:

  • Source code (core, api, data, parquet, orc, aws, build files)
  • Engine-specific modules (spark, flink, kafka-connect, mr, arrow, delta-lake, hive)
  • Own workflow file vs other workflow files
  • .github/ files not in the old explicit lists (new workflows, dependabot, CODEOWNERS)
  • Docs, metadata, and config files (.gitignore, README, LICENSE, etc.)

Results: All paths match the old behavior except .github/ files that were missing from the old ignore lists (e.g., brand-new-workflow.yml, pr-title-check.yml, dependabot.yml, CODEOWNERS). These correctly change from TRIGGER → SKIP; the old behavior was a maintenance bug in which unlisted files needlessly triggered all CI workflows.
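An old-versus-new comparison like the one described in Testing can be sketched in Python under the same simplified glob semantics. This is not the validation script used for this PR. OLD_IGNORE is a hypothetical stand-in for one workflow's old paths-ignore list (a couple of explicitly listed sibling workflows plus docs/**), not the actual Iceberg configuration, and java-ci.yml is a placeholder name. For paths-ignore, a file is skipped if it matches any pattern; for paths, the last matching pattern wins.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Simplified filter glob: '**' crosses '/', while '*' and '?' do not."""
    parts = []
    for token in re.split(r"(\*\*|\*|\?)", pattern):
        if token == "**":
            parts.append(".*")
        elif token == "*":
            parts.append("[^/]*")
        elif token == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def old_triggers(path: str, ignore: list[str]) -> bool:
    """paths-ignore: a file is skipped if it matches any ignore pattern."""
    return not any(glob_to_regex(p).fullmatch(path) for p in ignore)


def new_triggers(path: str, patterns: list[str]) -> bool:
    """paths: evaluate in order; the last matching pattern wins ('!' excludes)."""
    included = False
    for pat in patterns:
        negated = pat.startswith("!")
        if glob_to_regex(pat[1:] if negated else pat).fullmatch(path):
            included = not negated
    return included


# Hypothetical configs for a single workflow.
OLD_IGNORE = [
    ".github/workflows/python-ci.yml",
    ".github/workflows/docs-ci.yml",
    "docs/**",
]
NEW_PATHS = ["**", "!.github/**", ".github/workflows/java-ci.yml", "!docs/**"]

SAMPLE_PATHS = [
    "core/src/main/java/Foo.java",
    ".github/workflows/java-ci.yml",
    ".github/workflows/python-ci.yml",
    ".github/workflows/brand-new-workflow.yml",
    ".github/dependabot.yml",
    "docs/index.md",
]


def label(triggers: bool) -> str:
    return "TRIGGER" if triggers else "SKIP"


def compare(paths: list[str]) -> dict[str, tuple[str, str]]:
    """Map each path to its (old, new) outcome."""
    return {
        p: (label(old_triggers(p, OLD_IGNORE)), label(new_triggers(p, NEW_PATHS)))
        for p in paths
    }


if __name__ == "__main__":
    for path, (old, new) in compare(SAMPLE_PATHS).items():
        marker = "  <- changed" if old != new else ""
        print(f"{path}: {old} -> {new}{marker}")
```

In this sketch, only the .github/ paths absent from OLD_IGNORE flip from TRIGGER to SKIP; the explicitly listed sibling workflow, the workflow's own file, source code, and docs behave the same under both configs.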

@github-actions github-actions Bot added the INFRA label May 12, 2026
@kevinjqliu kevinjqliu force-pushed the kevinjqliu/simplify-ci-paths-filter branch from 1b092b7 to b7d5778 May 12, 2026 16:35
@kevinjqliu kevinjqliu requested a review from manuzhang May 12, 2026 16:38
@kevinjqliu kevinjqliu marked this pull request as draft May 12, 2026 17:06
@kevinjqliu
Contributor Author

Should also include .github/workflows/cve-scan.yml once it's merged:

https://github.com/apache/iceberg/pull/16287/files#diff-b9a3df5364290d7430a379df61028521a5934e6a5f93b1530a2802f1ed9d7e26R30

@manuzhang
Member

manuzhang commented May 13, 2026

@kevinjqliu I had a similar PR before, but @amogh-jahagirdar had some general concerns about this approach.

