HDDS-14989. Delay follower SCM DN server start until Ratis log catch-up#10059

Open
xichen01 wants to merge 1 commit into apache:master from xichen01:HDDS-14989

@xichen01
Contributor

@xichen01 commented Apr 9, 2026

What changes were proposed in this pull request?

Context

Fixes a bug where reading a key could fail with a NO_REPLICA_FOUND error during an SCM restart and leader transfer.
For details, see:
https://issues.apache.org/jira/browse/HDDS-14989
For a detailed reproduction, see testFollowerCatchupAfterContainerClose.

Fix

  • When a follower SCM starts, it now waits until it has caught up with the leader's committed Ratis log entries before starting the DatanodeProtocolServer, which receives full container reports (FCRs) and incremental container reports (ICRs) from Datanodes.
  • Only the leader SCM is allowed to update container state via Ratis by calling updateContainerState.
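
The two changes above can be sketched in Java as follows. This is a minimal, self-contained illustration under stated assumptions, not the patch itself: DatanodeServer, CatchupGate, ContainerStateUpdater, Role and submitContainerStateUpdate are hypothetical names, and the sketch assumes the follower already knows the leader's commit index when it starts (how it obtains that index is out of scope here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class CatchupGateSketch {

  // Hypothetical stand-in for the follower's DatanodeProtocolServer.
  static final class DatanodeServer {
    private final AtomicBoolean running = new AtomicBoolean(false);

    void start() {
      running.set(true);
    }

    boolean isRunning() {
      return running.get();
    }
  }

  // Hypothetical gate: starts the server once the applied index reaches targetIndex.
  static final class CatchupGate {
    private final long targetIndex; // assumption: leader's commit index observed at startup
    private final AtomicLong appliedIndex;
    private final CompletableFuture<Void> caughtUp = new CompletableFuture<>();

    CatchupGate(long targetIndex, long initialAppliedIndex, DatanodeServer server) {
      this.targetIndex = targetIndex;
      this.appliedIndex = new AtomicLong(initialAppliedIndex);
      // The start action runs exactly once, when caughtUp is first completed.
      caughtUp.thenRun(server::start);
      // If the follower is already caught up, start immediately.
      onApplied(initialAppliedIndex);
    }

    // Assumption: called by the state machine after each log entry is applied.
    void onApplied(long index) {
      long applied = appliedIndex.accumulateAndGet(index, Math::max);
      if (applied >= targetIndex) {
        caughtUp.complete(null); // no-op if already completed
      }
    }
  }

  enum Role { LEADER, FOLLOWER }

  // Hypothetical container-state updater with a leader-only guard.
  static final class ContainerStateUpdater {
    private final Role role;
    private final List<String> submitted = new ArrayList<>();

    ContainerStateUpdater(Role role) {
      this.role = role;
    }

    // Returns true if the update was "submitted to Ratis" (simulated by a list).
    boolean submitContainerStateUpdate(long containerId, String event) {
      if (role != Role.LEADER) {
        // Followers do not propose changes; they learn new container
        // states by applying entries from the leader's Ratis log.
        return false;
      }
      submitted.add(containerId + ":" + event);
      return true;
    }

    List<String> submitted() {
      return submitted;
    }
  }

  public static void main(String[] args) {
    DatanodeServer server = new DatanodeServer();
    CatchupGate gate = new CatchupGate(100, 95, server);
    gate.onApplied(98);
    System.out.println("after 98: running=" + server.isRunning());  // after 98: running=false
    gate.onApplied(100);
    System.out.println("after 100: running=" + server.isRunning()); // after 100: running=true

    ContainerStateUpdater follower = new ContainerStateUpdater(Role.FOLLOWER);
    System.out.println(follower.submitContainerStateUpdate(1L, "CLOSE")); // false
  }
}
```

A CompletableFuture is used so that the start action fires exactly once, whichever applied entry first crosses the target index; a real implementation may instead poll the Ratis server or block on a latch.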

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14989

How was this patch tested?

Added a new test.

@adoroszlai requested a review from szetszwo April 9, 2026 18:29
@ivandika3 requested a review from sumitagrawl April 10, 2026 04:22