UN-3479 [FEAT] CLI restructure with user-group cloning + share-state replication#19
UN-3479 [FEAT] CLI restructure with user-group cloning + share-state replication#19chandrasekharan-zipstack wants to merge 3 commits into
Conversation
Replace the 'unstract-clone' console script with a top-level 'unstract'
Click group ('unstract clone' is now the canonical invocation), promote
click/rich from the 'clone' extra into core dependencies (the extra is
removed), and document the CLI — usage, env promotion use case, main
options and exit codes — in the README.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
New 'group' phase (runs first): clones org user groups by name with
idempotent merge — a like-named target group is always reused, never
renamed or errored — and records int-pk remaps for downstream phases.
Opt-in --clone-group-members matches members to target users by email,
bulk-adds the hits and reports the misses as warnings; service accounts
(@platform.internal) never migrate.
After each shareable resource lands (adapter, connector, workflow,
pipeline, api_deployment, custom_tool), its source share state is
mirrored via POST /{resource}/{id}/share/: shared_groups through the
group remap (axis omitted with a warning when the group phase is
excluded), shared_to_org as-is, shared_users via id -> email -> target
id (missing users skipped + warned). shared_users stays stripped from
create POSTs (SERVER_MANAGED). Users/groups listings are memoised per
run; dry-run never POSTs. Reports gain a non-fatal warnings channel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Empty Prompt Studio projects cannot be exported (backend guard), so they have no source registry entry and no workflow references. The phase previously republished unconditionally, turning these into counted failures on every run. Now the source registry lookup runs first and the export + remap are skipped when no entry exists. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
| Filename | Overview |
|---|---|
| src/unstract/clone/sharing.py | New module for share-state replication; cache helper has a race where _FETCH_FAILED from a failing thread can permanently shadow a successful concurrent result |
| src/unstract/clone/client.py | New user/group/share API methods added; list_groups lacks the or{} None guard present in list_users, could raise AttributeError on empty-body response |
| src/unstract/clone/phases/group.py | New GroupPhase: idempotent name-matched group cloning with optional email-matched member migration; dry-run, adopt, and service-account exclusion all handled correctly |
| src/unstract/clone/phases/base.py | Added share_path_template class attribute and apply_share helper that dispatches to sharing.apply_share_state; clean delegation pattern |
| src/unstract/clone/phases/custom_tool.py | Added empty-project registry guard and share replication via apply_share with src_detail_fn |
| src/unstract/clone/context.py | Added clone_group_members option, share_cache dict, and share_cache_lock; threading.Lock in a dataclass field is appropriate here |
| src/unstract/clone/report.py | Added warnings list to PhaseResult and _render_warnings_summary; warnings surface in plain, rich, and JSON outputs correctly |
| src/unstract/cli.py | New top-level unstract Click group registering clone_cmd; clean entry-point split from python -m unstract.clone |
| pyproject.toml | click and rich promoted from optional [clone] extra to core dependencies; entry point renamed from unstract-clone to unstract |
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
src/unstract/clone/sharing.py:45-58
**`_FETCH_FAILED` sentinel can permanently shadow a successful response**
The code comment says "one result wins — the listings are read-only so that's harmless", but a `_FETCH_FAILED` value winning is not harmless: once it is cached via `setdefault`, every subsequent call for that key returns `None` and all share-state replication for the entire run is silently skipped (only warnings, no errors). The race is: Thread A makes the HTTP call and gets an exception → `value = _FETCH_FAILED`. Thread B makes the HTTP call at the same moment and succeeds → `value = valid dict`. If Thread A's `setdefault` executes first, the failure is cached permanently and Thread B's valid result is thrown away.
The fix is to prefer a successful value over `_FETCH_FAILED` when updating the cache under the lock.
```suggestion
with ctx.share_cache_lock:
# Prefer a successful result over a cached failure if we raced.
if key not in ctx.share_cache or (
ctx.share_cache[key] is _FETCH_FAILED and value is not _FETCH_FAILED
):
ctx.share_cache[key] = value
return ctx.share_cache[key]
```
### Issue 2 of 2
src/unstract/clone/client.py:125-126
Use the same `or {}` guard as `list_users` to avoid an `AttributeError` when the endpoint returns `None`.
```suggestion
result = self._request("GET", "groups/")
return result if isinstance(result, list) else (result or {}).get("results", [])
```
Reviews (1): Last reviewed commit: "fix: skip registry republish for tools n..." | Re-trigger Greptile
| def _cached(ctx: CloneContext, key: str, build: Callable[[], Any]) -> Any: | ||
| with ctx.share_cache_lock: | ||
| if key in ctx.share_cache: | ||
| return ctx.share_cache[key] | ||
| # Build outside the lock (HTTP call); worst case two threads race and | ||
| # one result wins — the listings are read-only so that's harmless. | ||
| try: | ||
| value = build() | ||
| except Exception as e: | ||
| logger.warning("share replication: %s listing failed: %s", key, e) | ||
| value = _FETCH_FAILED | ||
| with ctx.share_cache_lock: | ||
| ctx.share_cache.setdefault(key, value) | ||
| return ctx.share_cache[key] |
There was a problem hiding this comment.
_FETCH_FAILED sentinel can permanently shadow a successful response
The code comment says "one result wins — the listings are read-only so that's harmless", but a _FETCH_FAILED value winning is not harmless: once it is cached via setdefault, every subsequent call for that key returns None and all share-state replication for the entire run is silently skipped (only warnings, no errors). The race is: Thread A makes the HTTP call and gets an exception → value = _FETCH_FAILED. Thread B makes the HTTP call at the same moment and succeeds → value = valid dict. If Thread A's setdefault executes first, the failure is cached permanently and Thread B's valid result is thrown away.
The fix is to prefer a successful value over _FETCH_FAILED when updating the cache under the lock.
| def _cached(ctx: CloneContext, key: str, build: Callable[[], Any]) -> Any: | |
| with ctx.share_cache_lock: | |
| if key in ctx.share_cache: | |
| return ctx.share_cache[key] | |
| # Build outside the lock (HTTP call); worst case two threads race and | |
| # one result wins — the listings are read-only so that's harmless. | |
| try: | |
| value = build() | |
| except Exception as e: | |
| logger.warning("share replication: %s listing failed: %s", key, e) | |
| value = _FETCH_FAILED | |
| with ctx.share_cache_lock: | |
| ctx.share_cache.setdefault(key, value) | |
| return ctx.share_cache[key] | |
| with ctx.share_cache_lock: | |
| # Prefer a successful result over a cached failure if we raced. | |
| if key not in ctx.share_cache or ( | |
| ctx.share_cache[key] is _FETCH_FAILED and value is not _FETCH_FAILED | |
| ): | |
| ctx.share_cache[key] = value | |
| return ctx.share_cache[key] |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/unstract/clone/sharing.py
Line: 45-58
Comment:
**`_FETCH_FAILED` sentinel can permanently shadow a successful response**
The code comment says "one result wins — the listings are read-only so that's harmless", but a `_FETCH_FAILED` value winning is not harmless: once it is cached via `setdefault`, every subsequent call for that key returns `None` and all share-state replication for the entire run is silently skipped (only warnings, no errors). The race is: Thread A makes the HTTP call and gets an exception → `value = _FETCH_FAILED`. Thread B makes the HTTP call at the same moment and succeeds → `value = valid dict`. If Thread A's `setdefault` executes first, the failure is cached permanently and Thread B's valid result is thrown away.
The fix is to prefer a successful value over `_FETCH_FAILED` when updating the cache under the lock.
```suggestion
with ctx.share_cache_lock:
# Prefer a successful result over a cached failure if we raced.
if key not in ctx.share_cache or (
ctx.share_cache[key] is _FETCH_FAILED and value is not _FETCH_FAILED
):
ctx.share_cache[key] = value
return ctx.share_cache[key]
```
How can I resolve this? If you propose a fix, please make it concise.| result = self._request("GET", "groups/") | ||
| return result if isinstance(result, list) else result.get("results", []) |
There was a problem hiding this comment.
Use the same
or {} guard as list_users to avoid an AttributeError when the endpoint returns None.
| result = self._request("GET", "groups/") | |
| return result if isinstance(result, list) else result.get("results", []) | |
| result = self._request("GET", "groups/") | |
| return result if isinstance(result, list) else (result or {}).get("results", []) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/unstract/clone/client.py
Line: 125-126
Comment:
Use the same `or {}` guard as `list_users` to avoid an `AttributeError` when the endpoint returns `None`.
```suggestion
result = self._request("GET", "groups/")
return result if isinstance(result, list) else (result or {}).get("results", [])
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Summary
unstractClick group withcloneas first subcommand;click/richmoved from optional extra to core (fixes broken-by-defaultImportError)groupphase (runs first) clones org user groups by name with reuse-on-match; optional--clone-group-membersflag (email-matched, missing users skipped and reported)is_service_accountfield with email-suffix fallback for older backendsTest Plan
uv run pytest→ 184 passed (167 pre-existing + 17 new)Dependencies
feat/org-migration-platform-api-gaps(service-account group management)README
unstract cloneusage and optional--clone-group-membersflag🤖 Generated with Claude Code