[feat] integrate dynamicemb table fusion (wheel 20260407.97b80bf) #466

Open
tiankongdeguiji wants to merge 2 commits into alibaba:master from
tiankongdeguiji:feat/dynamicemb-table-fusion

Conversation

@tiankongdeguiji
Collaborator

Summary

  • Bump dynamicemb wheel to the fused-storage build from NVIDIA recsys-examples PR #343 (commit 97b80bf). Multiple dynamicemb tables in the same TBE group are now backed by a single fused DynamicEmbStorage / cache / MultiTableKVCounter automatically.
  • Adapt tzrec/utils/dynamicemb_util.py to the new API: the upstream _get_dynamicemb_options_per_table became a pass-through validator, so _to_sharding_plan now populates dim, per-shard max_capacity, embedding_dtype, and index_type directly. Remove the obsolete num_aligned_embedding_per_rank alignment shim and the matching monkey-patch — the field no longer exists on DynamicEmbTableOptions.
  • Expose two new proto knobs on DynamicEmbedding:
    • bucket_capacity (default 128) — tunes hash table load factor vs. probe cost.
    • score_strategy = "NO_EVICTION" — keeps table capacity fixed once full, pairs with init_capacity_per_rank for growable tables.
  • Doc updates for the wheel version and the new options; note that table fusion is automatic within a FeatureGroup.
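
For reviewers, here is a rough sketch of the kind of per-table option population described above. It is not the code in tzrec/utils/dynamicemb_util.py: `TableOptionsSketch` is a hypothetical stand-in for `DynamicEmbTableOptions`, `per_shard_capacity` and `build_options_sketch` are made-up helper names, and the rule of splitting capacity evenly across ranks and rounding up to whole buckets is an assumption for illustration only. The dtype strings are placeholders as well.

```python
from dataclasses import dataclass


@dataclass
class TableOptionsSketch:
    """Hypothetical stand-in for DynamicEmbTableOptions (not the real class)."""

    dim: int
    max_capacity: int  # per-shard capacity, not the global table size
    embedding_dtype: str
    index_type: str
    score_strategy: str
    bucket_capacity: int = 128  # default stated in this PR


def per_shard_capacity(total_capacity: int, world_size: int, bucket_capacity: int) -> int:
    """Split capacity evenly across ranks, rounded up to whole buckets.

    ASSUMPTION: this rounding rule is illustrative, not taken from the wheel.
    """
    per_rank = -(-total_capacity // world_size)  # ceiling division
    buckets = -(-per_rank // bucket_capacity)  # ceiling division
    return buckets * bucket_capacity


def build_options_sketch(
    dim: int,
    total_capacity: int,
    world_size: int,
    score_strategy: str,
    bucket_capacity: int = 128,
) -> TableOptionsSketch:
    """Populate every field directly, since upstream now only validates."""
    return TableOptionsSketch(
        dim=dim,
        max_capacity=per_shard_capacity(total_capacity, world_size, bucket_capacity),
        embedding_dtype="float32",  # placeholder value
        index_type="int64",  # placeholder value
        score_strategy=score_strategy,
        bucket_capacity=bucket_capacity,
    )
```

Under these assumptions, `build_options_sketch(dim=16, total_capacity=1_000_000, world_size=2, score_strategy="NO_EVICTION", bucket_capacity=256)` gives each shard a `max_capacity` that is a whole multiple of 256.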

Test plan

  • python tzrec/tests/run.py for test_multi_tower_din_with_dynamicemb_train_eval — trains + evals a DIN model with dynamicemb features end-to-end on 2 GPUs.
  • python -m unittest tzrec.tools.dynamicemb.create_dynamicemb_init_ckpt_test — verifies the init-ckpt workflow still produces a valid checkpoint that the trainer can consume.
  • pre-commit run --files … — ruff, ruff-format, license header, mdformat all green.
  • Smoke-tested build_dynamicemb_constraints with bucket_capacity=256 and score_strategy="NO_EVICTION" against the new wheel (yields a DynamicEmbTableOptions with the configured values).

🤖 Generated with Claude Code

tiankongdeguiji and others added 2 commits April 7, 2026 14:45
Upgrades the dynamicemb wheel to the fused-storage build from NVIDIA
recsys-examples PR #343 and adapts tzrec's shim layer to the new API.
Multiple dynamicemb tables in the same TBE group are now backed by a
single fused storage/cache/KV-counter automatically.

The upstream `_get_dynamicemb_options_per_table` is now a pass-through
validator, so tzrec's `_to_sharding_plan` populates `dim`,
`max_capacity` (per-shard), `embedding_dtype`, and `index_type`
directly. The obsolete `num_aligned_embedding_per_rank` shim and the
matching monkey-patch are removed.

Also exposes two new proto knobs on `DynamicEmbedding`:

- `bucket_capacity` (default 128) — tune hash table load factor
- `score_strategy = "NO_EVICTION"` — keep table capacity fixed once full

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>