fix(library_tools): duplicate legacy library content synchronously to avoid race#38316
fix(library_tools): duplicate legacy library content synchronously to avoid race#38316kingoftech-v01 wants to merge 1 commit intoopenedx:masterfrom
Conversation
… avoid race LegacyLibraryToolsService.trigger_duplication enqueued duplicate_children via Celery .delay(), but its only caller runs inside store.bulk_operations() in contentstore.utils.duplicate_block. With a real (non-eager) worker, the Celery task executes in another process before the bulk op commits, so store.get_item(dest_block_id) either raises ItemNotFoundError or returns a dest block with no children. The duplicated library_content block then renders the validation warning "There are no problems in the specified library of type any." Switch the call to duplicate_children.apply(kwargs=...), mirroring the identical fix applied to sync_from_library in openedx#34029 / TNL-11339. The task now runs in the calling thread and observes the in-flight bulk operation. Also fix a self-comparison typo that made the cross-learning-context guard in trigger_duplication a silent no-op. Closes openedx#36544
|
Thanks for the pull request, @kingoftech-v01! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Submit a signed contributor agreement (CLA)
If you've signed an agreement in the past, you may need to re-sign. Once you've signed the CLA, please allow 1 business day for it to be processed. 🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. DetailsWhere can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
Description
Duplicating a unit containing a
LegacyLibraryContentBlock(randomized content / V1 library content) produces a duplicate block with no children: "There are no problems in the specified library of type any. Select another problem type." The bug only appears in production — CI'sCELERY_ALWAYS_EAGER=Truehides it.User impact (Course Author): Duplicating a unit with randomized content now produces a functional duplicate with the same children as the source.
Root Cause
Race condition.
duplicate_block()(cms/djangoapps/contentstore/utils.py:1149-1219) wraps the whole operation instore.bulk_operations(course_key). Inside, it callsstore.create_item(...)anddest_block.studio_post_duplicate(store, source_item), which forLegacyLibraryContentBlockcallsself.get_tools().trigger_duplication(...), which fireslibrary_tasks.duplicate_children.delay(...).bulk_operationson split modulestore buffers writes thread-locally and only flushes on exit. The Celery worker runs in a separate process — itsstore.get_item(BlockUsageLocator.from_string(dest_block_id))executes before the parent request ends its bulk op, so the worker sees eitherItemNotFoundErroror an empty dest block._sync_childrenthen operates on a block with no children, and the duplicatedlibrary_contentblock ends up withlen(children) == 0.The identical race was fixed for
sync_from_libraryin commit2b47b8a379(PR #34030, TNL-11339, 2024-01-09) by switching to.apply().trigger_duplicationwas overlooked.Typo (same file, line 124): The cross-learning-context guard compares
source_blockto itself — it's always False and silently disabled. Has been broken since the original 2023-11-20 commite800ae7622.Fix
Two surgical changes in
xmodule/library_tools.py:source_block.scope_ids.usage_id.context_key != source_block.scope_ids.usage_id.context_key→source_block.scope_ids.usage_id.context_key != dest_block.scope_ids.usage_id.context_key.library_tasks.duplicate_children.delay(...)→library_tasks.duplicate_children.apply(kwargs=...)to run in-process. This mirrors thesync_from_library.apply(...)call just above and is the same fix applied to that sibling task in feat: make course -> lib import synchronous #34030.A TODO comment points at #36544 and #34029 for future readers.
Supporting Information
2b47b8a379("feat: make course -> lib import synchronous")e800ae7622(PR feat: provisionally support V2 libraries in LibraryContentBlock (randomized only) #33263, 2023-11-20)CELERY_ALWAYS_EAGER=Trueinopenedx/envs/test.pyruns tasks in-thread, hiding the bulk-op race.Testing Instructions
Randomized Content Blockpointing at that library.Three new tests in
xmodule/tests/test_library_tools.py:test_unit_trigger_duplication_does_not_enqueue_async_task— Unit: asserts.apply()called, not.delay()test_integration_trigger_duplication_inside_bulk_operations— Integration: asserts duplicate has children inside an outerbulk_operationstest_bug_36544_regression_cross_context_guard— Regression: asserts the typo fix rejects cross-course duplicationDeadline
None
Other Information
studio_post_duplicategates onis_migrated_to_v2).apply()blocks the CMS request thread for the duration of the sync, matching the already-shippedsync_from_library.apply()path. Duplicate is a user action; blocking until completion is the correct semantics.Closes #36544