
fix: enforce concurrency limits for data operations #5754

Open

jakharmonika364 wants to merge 1 commit into fluid-cloudnative:master from jakharmonika364:fix/concurrency-lock

Conversation

@jakharmonika364 (Contributor)

Ⅰ. Describe what this PR does

Ensures that data operations (specifically DataLoad) strictly respect their ParallelTaskNumber limit.

Previously, SetDataOperationInProgress would always append new operation names to a comma-separated string, effectively allowing multiple tasks to run on the same dataset simultaneously. This PR introduces a CanStartDataOperation check in the common locking logic (operation_lock.go) to prevent new tasks from registering if the parallel limit (which is 1 for DataLoad/DataMigrate) has already been reached.
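The gating described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function signature, the map keyed by operation type, and the operation names are all assumptions; the real method lives on `Dataset` and reads `Status.OperationRef`.

```go
package main

import (
	"fmt"
	"strings"
)

// canStartDataOperation is a sketch of the gating logic: registration is
// allowed only while the number of recorded operations of the given type is
// below maxParallel, and a re-entrant sync (the operation already recorded
// under opType) is always allowed so an operation never blocks itself.
func canStartDataOperation(operationRef map[string]string, opType, opName string, maxParallel int) bool {
	ref := operationRef[opType]
	if ref == "" {
		return true // no operation of this type is running on the dataset
	}
	current := strings.Split(ref, ",") // names are stored comma-separated
	for _, name := range current {
		if name == opName {
			return true // re-entrant sync cycle: do not self-block
		}
	}
	return len(current) < maxParallel
}

func main() {
	refs := map[string]string{"DataLoad": "load-a"}
	fmt.Println(canStartDataOperation(refs, "DataLoad", "load-b", 1)) // false: limit 1 already reached
	fmt.Println(canStartDataOperation(refs, "DataLoad", "load-a", 1)) // true: re-entrant sync
	fmt.Println(canStartDataOperation(refs, "DataLoad", "load-b", 2)) // true: still under limit 2
}
```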

Ⅱ. Does this pull request fix one issue?

fixes #1138

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Added unit tests in api/v1alpha1/dataset_types_test.go to cover:

  • Successful lock acquisition on empty datasets.
  • Correct blocking when ParallelTaskNumber is 1 and another operation is present.
  • Support for parallel limits > 1.
  • Handling of re-entrant sync cycles (avoiding self-blocking).
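The four cases above can be expressed as a table-driven sketch. The helper `canStart` here is a hypothetical stand-in for `CanStartDataOperation` (its signature and the operation names are illustrative), shown only to make the test matrix concrete.

```go
package main

import (
	"fmt"
	"strings"
)

// canStart is a stand-in for the gating check: ref is the comma-separated
// list of running operation names, opName the candidate operation.
func canStart(ref, opName string, maxParallel int) bool {
	if ref == "" {
		return true
	}
	names := strings.Split(ref, ",")
	for _, n := range names {
		if n == opName {
			return true // already registered: re-entrant sync
		}
	}
	return len(names) < maxParallel
}

func main() {
	cases := []struct {
		desc        string
		ref, opName string
		maxParallel int
		want        bool
	}{
		{"empty dataset acquires lock", "", "load-a", 1, true},
		{"blocked when limit is 1 and one op present", "load-a", "load-b", 1, false},
		{"parallel limit > 1 admits a second op", "load-a", "load-b", 2, true},
		{"re-entrant sync cycle does not self-block", "load-a", "load-a", 1, true},
	}
	for _, c := range cases {
		got := canStart(c.ref, c.opName, c.maxParallel)
		fmt.Printf("%s: got=%v want=%v\n", c.desc, got, c.want)
	}
}
```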

Ⅳ. Describe how to verify it

  1. Create a DataLoad task for a dataset.
  2. While the first task is Executing, create a second DataLoad for the same dataset.
  3. Verify that the second task remains Pending and the controller logs indicate the dataset is already performing an operation.
  4. Once the first task finishes, the second task should automatically proceed.
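The two competing tasks in steps 1–2 could look like the manifests below. The dataset name `demo` and the DataLoad names are placeholders; field names follow the Fluid DataLoad CRD.

```yaml
# First DataLoad: starts executing against the dataset.
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: dataload-first
spec:
  dataset:
    name: demo
    namespace: default
---
# Second DataLoad on the same dataset: should stay Pending until the
# first one finishes.
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: dataload-second
spec:
  dataset:
    name: demo
    namespace: default
```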

Ⅴ. Special notes for reviews

The implementation is generic and applies to all operations using the OperationInterface, ensuring consistent locking behavior across the project.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a concurrency control mechanism for data operations by adding a CanStartDataOperation method to the Dataset type. This method checks if a new operation can be initiated based on a maximum parallel task limit and the current state of operations. The changes include comprehensive unit tests for this new logic and its integration into the operation locking flow. The review feedback suggests a performance optimization by moving the concurrency check before the resource-intensive DeepCopy operation and recommends enhancing the error message to include the specific concurrency limit for better observability.

Comment on lines +74 to +78

```go
datasetToUpdate := dataset.DeepCopy()

if !datasetToUpdate.CanStartDataOperation(operationTypeName, operation.GetParallelTaskNumber(), dataOpKey) {
	return fmt.Errorf("the dataset %s is already in %s, please wait", targetDataset.Name, operationTypeName)
}
```

Severity: medium

There are two improvement opportunities here:

  1. Performance Optimization: Move the CanStartDataOperation check before calling DeepCopy(). DeepCopy() is an expensive operation, and it is unnecessary to perform it if the concurrency limit has already been reached and the operation cannot start.
  2. Error Message Clarity: The error message is a bit generic. Since this logic now supports ParallelTaskNumber > 1, it would be more helpful to include the limit in the error message to assist with troubleshooting.

Note: CanStartDataOperation is a read-only check on the Status field, so it is safe to call on the original dataset object.

Suggested change

Before:

```go
datasetToUpdate := dataset.DeepCopy()
if !datasetToUpdate.CanStartDataOperation(operationTypeName, operation.GetParallelTaskNumber(), dataOpKey) {
	return fmt.Errorf("the dataset %s is already in %s, please wait", targetDataset.Name, operationTypeName)
}
```

After:

```go
if !dataset.CanStartDataOperation(operationTypeName, operation.GetParallelTaskNumber(), dataOpKey) {
	return fmt.Errorf("the dataset %s has reached the maximum number of parallel %s operations (limit: %d), please wait", targetDataset.Name, operationTypeName, operation.GetParallelTaskNumber())
}
datasetToUpdate := dataset.DeepCopy()
```

@codecov

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.22%. Comparing base (965f661) to head (7abe7f0).
⚠️ Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/ddc/base/operation_lock.go | 0.00% | 4 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #5754      +/-   ##
==========================================
- Coverage   61.22%   61.22%   -0.01%
==========================================
  Files         444      444
  Lines       30557    30561       +4
==========================================
  Hits        18710    18710
- Misses      10307    10311       +4
  Partials     1540     1540
```

☔ View full report in Codecov by Sentry.

Copilot AI left a comment

Pull request overview

This PR tightens dataset-level locking for data operations so that operations (notably DataLoad) respect a configured parallelism limit when registering themselves in Dataset.Status.OperationRef, preventing unbounded concurrent operations on the same dataset.

Changes:

  • Add a gate (CanStartDataOperation) to check whether a new operation can be registered given the current OperationRef entries and a max-parallel limit.
  • Enforce the gate inside the common lock acquisition path (SetDataOperationInTargetDataset) before updating dataset status.
  • Add unit tests covering empty state, re-entrant behavior, and max-parallel enforcement.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| pkg/ddc/base/operation_lock.go | Adds concurrency gating before setting OperationRef during lock acquisition. |
| api/v1alpha1/dataset_types.go | Introduces Dataset.CanStartDataOperation to validate whether an operation can start given current refs and a limit. |
| api/v1alpha1/dataset_types_test.go | Adds unit tests for the new CanStartDataOperation behavior. |


Comment on lines +76 to 78

```go
// set current data operation in the target dataset
datasetToUpdate := dataset.DeepCopy()
```

Copilot AI Apr 1, 2026

The lock limit is derived from operation.GetParallelTaskNumber(). For DataMigrate this returns Spec.Parallelism (parallel workers within a single DataMigrate), so a DataMigrate with Parallelism>1 would allow multiple concurrent DataMigrate CRs on the same Dataset, contradicting the PR description that DataMigrate is limited to 1. Consider separating “max concurrent operations per dataset” from “intra-operation parallelism” (e.g., a dedicated interface method or a hard-coded 1 for DataMigrate).
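One way to realize the suggested separation is a dedicated interface method, sketched below. All names here are illustrative assumptions, not existing Fluid APIs: the point is only that the per-dataset cap is independent of the intra-operation worker count.

```go
package main

import "fmt"

// OperationConcurrency separates "how many operations may run on a dataset
// at once" from "how many workers a single operation uses internally".
type OperationConcurrency interface {
	// GetMaxConcurrentOperations returns the per-dataset concurrency cap.
	GetMaxConcurrentOperations() int
}

// dataMigrate is a hypothetical stand-in for the DataMigrate operation.
type dataMigrate struct {
	parallelism int // intra-operation worker count (Spec.Parallelism)
}

// The per-dataset cap stays hard-coded at 1, regardless of how many
// workers the migration uses internally.
func (d dataMigrate) GetMaxConcurrentOperations() int { return 1 }

func main() {
	m := dataMigrate{parallelism: 4}
	fmt.Println(m.GetMaxConcurrentOperations()) // 1, even with parallelism 4
}
```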

Comment on lines +76 to 78

```go
// set current data operation in the target dataset
datasetToUpdate := dataset.DeepCopy()
```

Copilot AI Apr 1, 2026

Returning a plain fmt.Errorf here causes SetDataOperationInTargetDataset to log at error level on every retry (and the reconciler requeues periodically), even though lock contention is an expected state when parallelism is exceeded. Consider using a typed/sentinel error so callers can log at Info/Debug and/or emit a clearer event, and include the dataset namespace/current OperationRef in the message to help users diagnose what is blocking.

Signed-off-by: Monika Jhakar <jakharmonika364@gmail.com>
Signed-off-by: Monika Jakhar <jakharmonika364@gmail.com>
@sonarqubecloud

sonarqubecloud bot commented Apr 1, 2026

@fluid-e2e-bot

fluid-e2e-bot bot commented Apr 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yangyuliufeng for approval by writing /assign @yangyuliufeng in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

2 similar comments

@fluid-e2e-bot

fluid-e2e-bot bot commented Apr 3, 2026

Hi @jakharmonika364. Thanks for your PR.

I'm waiting for a fluid-cloudnative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


Development

Successfully merging this pull request may close these issues.

what would happen if multiple ml training task read the same fluid dataset without pre-heating?

2 participants