Skip to content

Extend TLS auto-reload to CPM and monitoring service clients#19045

Merged
kaisecheng merged 12 commits intoelastic:mainfrom
kaisecheng:xpack-reload-certs
May 7, 2026
Merged

Extend TLS auto-reload to CPM and monitoring service clients#19045
kaisecheng merged 12 commits intoelastic:mainfrom
kaisecheng:xpack-reload-certs

Conversation

@kaisecheng
Copy link
Copy Markdown
Contributor

@kaisecheng kaisecheng commented Apr 24, 2026

Release notes

Logstash now automatically rebuilds the Elasticsearch clients used by Central Pipeline Management (CPM) and legacy internal monitoring when their TLS certificate files change on disk. This removes the need to restart Logstash after certificate rotation for those services. Requires ssl.reload.automatic: true and config.reload.automatic: true, which is automatically enabled when xpack.management.enabled: true.

What does this PR do?

Extends the SSL file watch infrastructure introduced in #18978 to the x-pack background service clients that live outside the pipeline lifecycle:

  • ElasticsearchSource (CPM's pipeline-config fetcher) rebuilds its ES client when CPM TLS files change.
  • LicenseReader in both LicenseManager instances (CPM's and monitoring's) rebuilds the license-check client on TLS rotation.

Tracking IDs

SslFileTracker keeps one set of registered paths per id. X-pack service ids use a leading . to keep them disjoint from user pipeline ids:

Tracking ID Owner Cert paths from Reload mechanism
:".cpm" ElasticsearchSource (CPM config fetcher) xpack.management.elasticsearch.ssl.* Rebuild ES fetch client (this PR)
:".cpm_license" LicenseReader inside CPM's LicenseManager xpack.management.elasticsearch.ssl.* Rebuild license-check client (this PR)
:".monitoring-logstash" The monitoring pipeline itself (elasticsearch output) xpack.monitoring.elasticsearch.ssl.* Pipeline reload via PR2 mechanism
:".monitoring_license" LicenseReader inside monitoring's LicenseManager xpack.monitoring.elasticsearch.ssl.* Rebuild license-check client (this PR)

Implementation notes

  • Moved the creation of Tracker to Runner, not Agent. CPM's ElasticsearchSource is built during :after_bootstrap_checks before Agent.new, so Agent cannot own the tracker. The exact creation site is between bootstrap_checks.each and :after_bootstrap_checks: CPM's BootstrapCheck flips config.reload.automatic = true during each, and CPM's hook builds its source on :after_bootstrap_checks. Anywhere outside that window either misreads the activation gate or constructs the source before the tracker exists.

  • LogStash::Helpers::ElasticsearchClientHolder owns one lazy-built ES client. The factory create(tracker, id) { build_client } returns either:

    • Lazy (no tracker) — memoise the factory result.
    • SslRebuildable (with tracker) — call tracker.consume_stale on every get; when stale, close the old client and rebuild before returning.

    LicenseReader and ElasticsearchSource just call holder.get. Rebuild lifecycle is invisible to callers.

Why is it important/What is the impact to the user?

Without this PR, rotating certificates used by CPM or internal monitoring forces a full Logstash restart, which interrupts pipeline processing. With this PR, certificate rotation is hot-applied by the next scheduled fetch tick, matching the user-pipeline behavior that PR2 already delivered.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Integration tests cover 3 end-to-end scenarios
    • CPM cert rotation rebuilds both the CPM ES client and the license-reader client; subsequent pipeline-config push proves the rebuilt CPM client still connects (cpm_tls_hot_reload_spec.rb)
    • Monitoring cert rotation with config.reload.automatic: true rebuilds the license-reader client; monitoring docs continue to ship (monitoring_tls_hot_reload_spec.rb)
    • Monitoring cert rotation with config.reload.automatic: false (default) is silently ignored; no error logged (monitoring_tls_hot_reload_spec.rb)

How to test this PR locally

Manual

  1. Start Logstash with CPM enabled and config.reload.automatic: true + ssl.reload.automatic: true
  2. Confirm pipelines fetch from ES via TLS
  3. Trigger a content change on the CA bundle referenced by xpack.management.elasticsearch.ssl.certificate_authority. The tracker hashes regular file content with SHA-256, so any byte change is detected:
    • Quick smoke test: printf '\n' >> /path/to/your/ca.crt. Adds a trailing newline outside the PEM block, so ES connectivity is unaffected.
    • Realistic rotation: generate a second CA with openssl req -x509 ... and append its PEM to the bundle (the connection survives because both the old and the new CA are trusted). See cpm_tls_hot_reload_spec.rb for the full pattern.
  4. Set log.level: debug (the rebuild events are logged at DEBUG). Within ~30s expect two rebuilt elasticsearch client `<id>` on certificate change log lines (one from CPM, one from LicenseReader), no errors.

Related issues

@github-actions
Copy link
Copy Markdown
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 24, 2026

This pull request does not have a backport label. Could you fix it @kaisecheng? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

Move tracker creation from Agent to Runner so x-pack services (CPM and
monitoring) receive it via their constructors at build time. Removes the
post-construction wiring step that previously attached the tracker after
each source was created.
- CPM: CA cert rotation rebuilds both ElasticsearchSource and LicenseReader clients, and the new connection remains functional
- monitoring (auto-reload on): CA cert rotation rebuilds the license reader client and monitoring docs continue to accumulate
- monitoring (auto-reload off): cert rotation is ignored and no error is raised
@kaisecheng
Copy link
Copy Markdown
Contributor Author

run exhaustive tests

Copy link
Copy Markdown
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks like this does what it intends to do.

I would prefer to encapsulate the complexity of rebuilding the client so that the callers can instantiate a client-holder with or without a tracker, and then can use that client-holder to get a current client without needing to manage the client's lifecycle.

I have opened kaisecheng#1 as an alternative.

Comment thread x-pack/lib/helpers/ssl_rebuildable.rb Outdated
Comment on lines +28 to +30
def client
@client ||= @client_factory.call
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we pull the complexity into this method so that the caller doesn't need to concern itself with sending #maybe_rebuild?

One approach is kaisecheng#1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted in 0d77add. Thanks for the cleaner encapsulation.

For symlink paths, every client invocation now calls tracker.consume_stale to refresh the latest mtime. This doubles the calls vs my prior maybe_rebuild in the happy path. For the retry case (retry_handler.try(10.times, ...)), worst case multiplies by 10. For regular files this is negligible. Failure retries are slow by nature anyway, so I'm not too concerned about the doubled calls

Comment thread logstash-core/lib/logstash/ssl_file_tracker.rb
Copy link
Copy Markdown
Member

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments on tests.

Comment thread logstash-core/lib/logstash/ssl_file_tracker.rb
Comment thread logstash-core/spec/logstash/runner_spec.rb Outdated
Comment thread logstash-core/spec/logstash/ssl_file_tracker_spec.rb
kaisecheng and others added 4 commits April 30, 2026 12:09
Introduces a new `LogStash::Helpers::ElasticsearchClientHolder` that callers
can use to get a current elasticsearch client WITHOUT needing to manage the
state of that client.

Provides two implementations, one of which is selected depending on whether
a tracker is provided:

 - `Lazy`: simply creates the client on first use and returns the same
   client instance for all subsequent calls
 - `SslRebuildable`: similarly lazily creates the client on first use, but
   rebuilds clients that have been marked stale by the provided tracker
   before returning them to the caller.
@kaisecheng kaisecheng requested review from andsel and yaauie April 30, 2026 14:39
Copy link
Copy Markdown
Member

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a nitpick on test method name, works as expected.

LGTM 👏

Comment thread logstash-core/spec/logstash/runner_spec.rb Outdated
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
@elasticmachine
Copy link
Copy Markdown

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @kaisecheng

@kaisecheng
Copy link
Copy Markdown
Contributor Author

run exhaustive tests

@kaisecheng kaisecheng merged commit c398724 into elastic:main May 7, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

License checker supports cert reload

4 participants