feat: Add reuse_connections profile option #670

Open
dlouseiro wants to merge 5 commits into ClickHouse:main from dlouseiro:dlouseiro/reuse-connections-option
@dlouseiro

@dlouseiro dlouseiro commented Jun 12, 2026

Contributor

Summary

Resolves #669.

Adds a profile-level boolean reuse_connections (default true, preserving current behaviour). When set to false, dbt closes the underlying connection at the end of each model so the next model opens a fresh TCP connection — useful for multi-replica ClickHouse Cloud services where the load balancer routes by connection and a long-lived pool would otherwise pin a dbt run to a single replica.

Mirrors the equivalent option on dbt-snowflake. The change gates the existing no-op release() on the new flag — when false, it falls through to the base SQLConnectionManager.release(), which calls self.close(conn).
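A minimal sketch of the gating described above, using stand-in classes rather than the real dbt types. `FakeHandle`, `FakeConnection`, `BaseManager` and `ReuseAwareManager` are hypothetical names, and the real `release()` signature in dbt differs; only the control flow (return early when reusing, otherwise fall through to the base class, which closes) is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class FakeHandle:
    """Stand-in for a clickhouse-connect / clickhouse-driver client handle."""
    closed: bool = False

    def close(self) -> None:
        self.closed = True


@dataclass
class FakeConnection:
    """Hypothetical connection wrapper carrying the profile flag."""
    handle: FakeHandle
    reuse_connections: bool = True  # default preserves current behaviour


class BaseManager:
    """Stand-in for the SQL base manager: release() closes the connection."""

    def release(self, conn: FakeConnection) -> None:
        self.close(conn)

    def close(self, conn: FakeConnection) -> None:
        conn.handle.close()


class ReuseAwareManager(BaseManager):
    def release(self, conn: FakeConnection) -> None:
        if conn.reuse_connections:
            return  # existing no-op: keep the connection for the next model
        super().release(conn)  # flag is false: fall through and close


manager = ReuseAwareManager()
kept = FakeConnection(FakeHandle(), reuse_connections=True)
dropped = FakeConnection(FakeHandle(), reuse_connections=False)
manager.release(kept)
manager.release(dropped)
print(kept.handle.closed, dropped.handle.closed)  # False True
```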

Performance: reduced per-model connection startup overhead

With reuse_connections: false a client is created per model, so per-connection setup cost matters much more. Three optimisations were applied:

  • _ensure_database cached process-wide — the EXISTS DATABASE probe now runs at most once per process (invalidated by database_dropped), instead of once per client creation.
  • Lightweight-delete capability probe removed — lw deletes are GA on all supported ClickHouse versions (≥23.3, well below the CI minimum of 25.3); has_lw_deletes is always True and no server round-trip is needed.
  • allow_nondeterministic_mutations via connection settings — applied at client construction via _conn_settings when use_lw_deletes: true, instead of a per-client SET query.

Bug fix: delete_insert/microbatch guard

The validate_incremental_strategy guard was checking has_lw_deletes (server capability, now always True) instead of use_lw_deletes (profile opt-in). This meant a model with incremental_strategy: delete+insert could silently bypass validation when the profile had use_lw_deletes: false, and then fail at runtime on replicated tables because allow_nondeterministic_mutations was not injected. Fixed to check use_lw_deletes.
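To make the difference between the two flags concrete, here is a hedged sketch of the corrected guard. `check_strategy`, `Capabilities` and `StrategyValidationError` are hypothetical names, and normalising `delete+insert` to `delete_insert` is an assumption made for the example; the point taken from the description is that the guard must test the profile opt-in (`use_lw_deletes`), not the server capability (`has_lw_deletes`).

```python
from dataclasses import dataclass

# Strategies that rely on lightweight deletes, per the description above.
LW_DELETE_STRATEGIES = {"delete_insert", "microbatch"}


class StrategyValidationError(Exception):
    """Hypothetical error type for this sketch."""


@dataclass
class Capabilities:
    has_lw_deletes: bool = True   # server capability, now always True
    use_lw_deletes: bool = False  # profile opt-in


def check_strategy(strategy: str, caps: Capabilities) -> str:
    normalized = strategy.replace("+", "_")  # assumed normalisation
    # Checking caps.has_lw_deletes here would never fire, which was the bug.
    if normalized in LW_DELETE_STRATEGIES and not caps.use_lw_deletes:
        raise StrategyValidationError(
            f"{strategy} requires use_lw_deletes: true in the profile"
        )
    return normalized


results = {}
for strategy in ("delete+insert", "microbatch", "append"):
    try:
        check_strategy(strategy, Capabilities(use_lw_deletes=False))
        results[strategy] = "ok"
    except StrategyValidationError:
        results[strategy] = "rejected"
print(results)
```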

Verified

  • Unit tests for release() behaviour, ND mutation settings, DB existence caching.
  • Smoke-tested end-to-end against a local ClickHouse instance with reuse_connections: false.

Checklist

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

Note

Medium Risk
Changes connection lifecycle and removes runtime lightweight-delete capability probing, which could affect edge-case servers or misconfigured profiles; incremental validation behavior is stricter for delete_insert/microbatch without use_lw_deletes.

Overview
Adds profile flag reuse_connections (default true). When false, ClickHouseConnectionManager.release() delegates to the SQL base class so connections close after each model and new ones can land on different ClickHouse Cloud replicas behind sticky load balancing.

To keep per-model reconnects cheap, EXISTS DATABASE is cached process-wide (invalidated on database_dropped), lightweight-delete server probes and runtime SET queries are removed (lw deletes treated as always available on supported versions), and allow_nondeterministic_mutations is injected via connection settings when use_lw_deletes: true.
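The settings injection mentioned above might look something like the sketch below. `build_conn_settings` is a hypothetical helper, and letting a user-supplied value take precedence over the injected one is an assumption of this example, not something the PR states. The resulting dict is what would be handed to the client at construction time instead of issuing a separate SET query.

```python
from typing import Any, Dict, Optional


def build_conn_settings(
    use_lw_deletes: bool,
    custom_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the settings to apply when the client is constructed."""
    settings = dict(custom_settings or {})  # copy so the caller's dict is untouched
    if use_lw_deletes:
        # Assumption: an explicit user value wins over the injected default.
        settings.setdefault("allow_nondeterministic_mutations", 1)
    return settings


with_lw = build_conn_settings(True, {"max_threads": 4})
without_lw = build_conn_settings(False, {"max_threads": 4})
print(with_lw)
print(without_lw)
```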

Bug fix: validate_incremental_strategy now gates delete_insert / microbatch on use_lw_deletes instead of has_lw_deletes, so models cannot pass validation without the profile opt-in. Related lw-delete error strings were removed from errors.py.

Reviewed by Cursor Bugbot for commit d627613. Bugbot is set up for automated code reviews on this repo. Configure here.

@koletzilla

Contributor

Hi @dlouseiro! Thanks for the PR.

I'm checking internally which alternatives we have for these problems, but tbh yours seems like the way to go.

I'm a bit worried about the overhead of recreating the whole client for each node. Have you already tested this in prod? How is it going?

@koletzilla

Contributor

Apart from what I mentioned, could you check whether you can add the following to this PR:

  • Reduce the number of EXISTS DATABASE operations (_ensure_database): this runs once per client creation. Today that means once per dbt thread, which was already too much; with this change it runs once per model, which makes the problem worse. Maybe we can take the same approach as in #653 (perf: Short-circuit exchange check on Shared DB engine and run whole check only in one thread), where the EXCHANGE check is done only once and the result is cached.
  • _check_lightweight_deletes: the lightweight-delete check and SET are no longer needed because all our supported versions already support lightweight deletes, so we can simply keep has_lw_deletes = True and use_lw_deletes = requested.
  • That method also checks and sets allow_nondeterministic_mutations, which is still needed in this context. Maybe we can avoid checking it every time and just do the set part 🤔

With these, recreating the client should be much faster. I'm not sure whether anything else could improve performance, so feel free to suggest other things.

Thanks!

dhtclk pushed a commit to ClickHouse/clickhouse-docs that referenced this pull request Jun 16, 2026
Adds the `reuse_connections` option (default `True`) to the dbt
profile.yml configuration table. When set to `False`, dbt closes the
underlying connection at the end of each model so the next model
opens a fresh TCP connection — useful on multi-replica ClickHouse
Cloud services where the load balancer routes by TCP connection and a
long-lived pool would otherwise pin a `dbt run` to a single replica.

Tracks ClickHouse/dbt-clickhouse#669 / ClickHouse/dbt-clickhouse#670.

@dlouseiro

Contributor Author

Thank you @koletzilla. I'll have a look at your suggestions.

@dlouseiro

Contributor Author

Of course. This is mostly a proposed solution for the issue I created. It might not be the best one, but since it didn't take much code I decided to add it.

We have not tested this in prod yet: ideally we'd have this released in the main dbt-clickhouse distribution rather than using my fork directly. I'd love your help testing this under load, maybe even within this PR somehow, but ideally against a local ClickHouse instance rather than our prod instance.

dlouseiro and others added 2 commits June 18, 2026 11:32
Adds a profile-level boolean `reuse_connections` (default `true`,
preserving current behaviour). When set to `false`, dbt closes the
underlying connection at the end of each model so the next model opens
a fresh TCP connection — useful for multi-replica ClickHouse Cloud
services where the load balancer routes by connection and a long-lived
pool would otherwise pin a `dbt run` to a single replica.

Mirrors the equivalent option on dbt-snowflake. The change gates the
existing no-op `release()` on the new flag and falls through to the
base `SQLConnectionManager.release()` (which calls `self.close(conn)`,
which calls `connection.handle.close()` on the clickhouse-connect /
clickhouse-driver handle).

Closes ClickHouse#669

Addresses review on ClickHouse#670: with reuse_connections=false a client is created
per model, so the per-connection setup work now matters much more.

- Cache the EXISTS DATABASE probe process-wide (invalidated by database_dropped)
- Drop the lightweight-delete capability probe; lw deletes are GA on all
  supported ClickHouse versions (min 23.3 << CI min 25.3)
- Apply allow_nondeterministic_mutations via connection settings instead of a
  per-client query + SET

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@dlouseiro dlouseiro force-pushed the dlouseiro/reuse-connections-option branch from bb2e9d2 to 03f710d Compare June 18, 2026 10:44

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 03f710d. Configure here.

Comment thread dbt/adapters/clickhouse/dbclient.py
@dlouseiro

Contributor Author

@koletzilla I pushed a new commit to tackle your comments above.

Let me know what you think!

dlouseiro and others added 3 commits June 19, 2026 12:20
…lw_deletes

has_lw_deletes is now always True (lw deletes are GA on all supported
ClickHouse versions), so the previous guard never fired. Checking
use_lw_deletes (the profile opt-in) restores the intended behaviour:
delete_insert and microbatch require use_lw_deletes: true, which also
ensures allow_nondeterministic_mutations is injected — necessary for
DELETE with subquery predicates on replicated tables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…calls

Mirrors the pattern in test_exchange_check.py where _exchange_result is
reset by an autouse fixture. Removes the manual .clear() calls inside
individual tests so future cache tests can't accidentally skip cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…y guard

  - test_connections: assert reuse_connections defaults to True
  - test_dbclient: add symmetric case for use_lw_deletes=False (has_lw_deletes
    remains True, use_lw_deletes=False)
  - test_incremental_strategy (new): nine tests for validate_incremental_strategy
    covering the bug fix — delete_insert/microbatch raise when use_lw_deletes=False,
    pass when True, and other strategies are unaffected

Development

Successfully merging this pull request may close these issues.

Add a reuse_connections profile option (parity with dbt-snowflake)

2 participants