Default to Balanced durability mode in cluster mode#4885
Conversation
4c9d612 to
28f67e2
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28f67e2693
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| partition_replication: Some( | ||
| partition_table | ||
| .replication_property(&Metadata::with_current(|m| m.nodes_config_ref())) | ||
| .into(), |
There was a problem hiding this comment.
Preserve Everywhere on configuration reads
For clusters that still have legacy PartitionReplication::Everywhere metadata, this now returns a concrete replication property instead of omitting the field. If a client reads the cluster configuration and writes it back unchanged (for example while changing only bifrost_provider), update_cluster_configuration treats any Some value that is not already Limit(current) as a request to set Limit (crates/admin/src/cluster_controller/service.rs:699-701); previously the protobuf conversion encoded Everywhere as None (crates/types/src/protobuf.rs:134-140), which preserved the legacy dynamic behavior. That round-trip freezes replication to the current worker count, so later added workers are no longer automatically included as they were under Everywhere.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
We really want to get rid of the legacy everywhere.
When attempting to run a cluster without a snapshot repository we now default to using a balanced durability mode. We decide whether we are in a cluster-mode or not based on the number of workers in nodes configuration as well as if partition replication is `> 1`. This means that we will automatically disable trimming and `WARN` the user every minute that they need to configure a snapshot store. This is safer than the previous default which would keep trimming enabled (to save space and to reduce risk of historical nodeset decay) but would mean that partitions will fail to move to other nodes due to absence of the snapshot. Bonus: This takes a couple of small steps in the direction of fully removing the legacy "everywhere" replication from partition table. A todo is left for a future proactive migration for existing clusters with old partition table metadata. Fixes #4824
tillrohrmann
left a comment
There was a problem hiding this comment.
Thanks for changing the default DurabilityMode from ReplicaSetOnly to Balanced @AhmedSoliman. LGTM. +1 for merging.
When attempting to run a cluster without a snapshot repository we now default to using a balanced durability mode.
We decide whether we are in a cluster-mode or not based on the number of workers in nodes configuration as well as
if partition replication is
> 1. This means that we will automatically disable trimming andWARNthe user everyminute that they need to configure a snapshot store. This is safer than the previous default which would keep trimming
enabled (to save space and to reduce risk of historical nodeset decay) but would mean that partitions will fail to
move to other nodes due to absence of the snapshot.
Bonus: This takes a couple of small steps in the direction of fully removing the legacy "everywhere" replication from
partition table. A todo is left for a future proactive migration for existing clusters with old partition table metadata.
Fixes #4824
Stack created with Sapling. Best reviewed with ReviewStack.