FEAT: Round Robin Target by jsong468 · Pull Request #1761 · microsoft/PyRIT

jsong468 · 2026-05-19T20:57:30Z

Round Robin Target

Description and design decisions

New RoundRobinTarget class (pyrit/prompt_target/round_robin_target.py): a PromptTarget that wraps multiple inner targets and distributes requests across them using weighted round-robin selection. Intended for load-balancing across multiple deployments of the same model (e.g., Azure OpenAI endpoints in different regions).
Per-call distribution, not per-conversation: requests are distributed on every call to _send_prompt_to_target_async, not pinned to a conversation. This is safe because PyRIT's conversation history is managed at the conversation_id level in shared memory — not by the target itself. When any inner target handles a request, the base class _get_normalized_conversation_async fetches the full conversation from memory by conversation_id, appends the current message, and passes the complete history to the inner target. The inner target never needs to "remember" prior turns; it receives them in full every time. This architecture means switching inner targets mid-conversation has no effect on correctness.
Requires multi-turn + editable history: all inner targets must declare both the supports_multi_turn and supports_editable_history capabilities. This is enforced at construction using the existing CHAT_TARGET_REQUIREMENTS validation infrastructure. These capabilities guarantee the target rebuilds its state from the provided conversation rather than relying on server-side state.
Same concrete class required: all inner targets must be the same Python class (e.g., all OpenAIChatTarget). This prevents mixing fundamentally different target types that happen to share the same interface.
Behavioral parameter consistency: inner targets must have matching underlying_model_name (with model_name fallback), temperature, and top_p. This ensures scoring results are comparable across targets. The validation uses the same (newly introduced) constants (TARGET_BEHAVIORAL_PARAMS, TARGET_BEHAVIORAL_PARAM_FALLBACKS) as the eval hash computation, so they cannot drift.
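A minimal sketch of the consistency check described above, under stated assumptions: BEHAVIORAL_PARAMS and PARAM_FALLBACKS are illustrative stand-ins for TARGET_BEHAVIORAL_PARAMS and TARGET_BEHAVIORAL_PARAM_FALLBACKS (their real shape may differ), and each target is represented by a plain dict of parameters rather than a real PyRIT object.

```python
# Assumed stand-ins for the shared constants; the real definitions may differ.
BEHAVIORAL_PARAMS = ("underlying_model_name", "temperature", "top_p")
PARAM_FALLBACKS = {"underlying_model_name": "model_name"}


def behavioral_signature(params: dict) -> tuple:
    """Collect the behavioral params of one target, applying fallbacks when a value is missing."""
    signature = []
    for name in BEHAVIORAL_PARAMS:
        value = params.get(name)
        if value is None and name in PARAM_FALLBACKS:
            value = params.get(PARAM_FALLBACKS[name])
        signature.append(value)
    return tuple(signature)


def validate_consistent(all_params: list) -> None:
    """Raise ValueError if any target's behavioral params differ from the first target's."""
    reference = behavioral_signature(all_params[0])
    for position, params in enumerate(all_params[1:], start=1):
        if behavioral_signature(params) != reference:
            raise ValueError(f"inner target {position} has mismatched behavioral params")


east = {"underlying_model_name": "gpt-4o", "temperature": 0.7, "top_p": 1.0, "endpoint": "east"}
west = {"model_name": "gpt-4o", "temperature": 0.7, "top_p": 1.0, "endpoint": "west"}
validate_consistent([east, west])  # passes: endpoints differ, behavioral params match via fallback
```

Because validation and hashing would both read the same tuple of parameter names, adding a new behavioral parameter in one place updates both.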
Capability intersection: the round-robin's capabilities are the intersection (lower bound) of all inner targets' capabilities. Boolean capability flags are AND-ed; modality frozensets are intersected. If the intersection of input or output modalities is empty, construction fails.
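The intersection rule can be sketched as follows. This is an illustration only: the Capabilities dataclass and the supports_json_output field are hypothetical, and the real capability type in PyRIT may carry different fields.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical capability record: boolean flags plus modality sets."""
    supports_multi_turn: bool
    supports_editable_history: bool
    supports_json_output: bool  # illustrative extra flag
    input_modalities: frozenset
    output_modalities: frozenset


def intersect_capabilities(all_caps: list) -> Capabilities:
    """AND the boolean flags and intersect the modality sets of every inner target."""
    merged = {}
    for field in fields(Capabilities):
        values = [getattr(caps, field.name) for caps in all_caps]
        if isinstance(values[0], bool):
            merged[field.name] = all(values)
        else:
            merged[field.name] = frozenset.intersection(*values)
    if not merged["input_modalities"] or not merged["output_modalities"]:
        raise ValueError("inner targets share no common input or output modality")
    return Capabilities(**merged)


east = Capabilities(True, True, True, frozenset({"text", "image_path"}), frozenset({"text"}))
west = Capabilities(True, True, False, frozenset({"text"}), frozenset({"text"}))
print(intersect_capabilities([east, west]))
```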
Optional integer weights: weights=[2, 1] expands into a rotation list [0, 0, 1] that cycles, sending roughly 2x traffic to the first target. Default is equal weight.
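A minimal sketch of the weight expansion and cycling described above. The Rotation class and build_rotation function are hypothetical names used for illustration; they are not the names used in round_robin_target.py.

```python
def build_rotation(weights: list) -> list:
    """Expand integer weights into a list of target indices, e.g. [2, 1] -> [0, 0, 1]."""
    if any(weight <= 0 for weight in weights):
        raise ValueError("weights must be positive integers")
    return [index for index, weight in enumerate(weights) for _ in range(weight)]


class Rotation:
    """Hypothetical helper that cycles through the expanded rotation list."""

    def __init__(self, weights: list) -> None:
        self._rotation = build_rotation(weights)
        self._counter = 0

    def next_index(self) -> int:
        index = self._rotation[self._counter % len(self._rotation)]
        self._counter += 1
        return index


rotation = Rotation([2, 1])
picks = [rotation.next_index() for _ in range(6)]
print(picks)  # [0, 0, 1, 0, 0, 1]
```

With this expansion the first target receives two of every three requests, in consecutive pairs rather than interleaved.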
Memory entries use the round-robin's identifier: the prompt_target_identifier on request and response pieces is the RoundRobinTarget's own ComponentIdentifier. This keeps memory entries consistent — a single conversation shows one identifier throughout. The hash of the inner target that actually handled each request is recorded in prompt_metadata["inner_target_identifier"] for traceability.
Eval hash unwrap mechanism (pyrit/identifiers/evaluation_identifier.py): added unwrap_child field to ChildEvalRule. When set, the eval hash computation "sees through" wrapper targets by substituting the first inner child before applying param filtering. This ensures scorer(round_robin([t1_east, t1_west])) produces the same eval hash as scorer(t1_east), making scoring results comparable regardless of whether a round-robin was used. Applied to ScorerEvaluationIdentifier (prompt_target child) and AtomicAttackEvaluationIdentifier (objective_target child).
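The unwrap idea can be sketched with plain dicts. Everything here is an assumption made for illustration: identifiers are modeled as nested dicts, unwrap_child is taken to be the name of the child slot holding the inner targets, and the hashing scheme (SHA-256 over sorted JSON) is not necessarily what evaluation_identifier.py does.

```python
import hashlib
import json

# Assumed set of behavioral params kept by the eval hash.
BEHAVIORAL_PARAMS = ("underlying_model_name", "temperature", "top_p")


def unwrap(identifier: dict, unwrap_child: str) -> dict:
    """Return the first inner child if the identifier wraps any; otherwise return it unchanged."""
    inner = identifier.get("children", {}).get(unwrap_child)
    return inner[0] if inner else identifier


def eval_hash(identifier: dict, unwrap_child=None) -> str:
    """Hash only the behavioral params, optionally seeing through a wrapper first."""
    if unwrap_child is not None:
        identifier = unwrap(identifier, unwrap_child)
    params = identifier.get("params", {})
    filtered = {name: params.get(name) for name in BEHAVIORAL_PARAMS}
    payload = json.dumps({"class_name": identifier["class_name"], "params": filtered}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"underlying_model_name": "gpt-4o", "temperature": 0.7, "top_p": 1.0}
t1_east = {"class_name": "OpenAIChatTarget", "params": dict(base, endpoint="east")}
t1_west = {"class_name": "OpenAIChatTarget", "params": dict(base, endpoint="west")}
round_robin = {
    "class_name": "RoundRobinTarget",
    "params": {"weights": [1, 1]},
    "children": {"targets": [t1_east, t1_west]},
}

print(eval_hash(round_robin, "targets") == eval_hash(t1_east))  # True
```

Substituting only the first child is sufficient in this sketch because construction already guarantees that all inner targets share the same class and behavioral parameters.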
Why round-robin identifier on memory entries but unwrap in eval hash: these serve different purposes and operate at different layers. The prompt_target_identifier on memory entries answers "what component was responsible for this request?" which is the RoundRobinTarget, since that's what the caller passed to the normalizer or scorer. Stamping inner target identifiers would create inconsistency within a single conversation (different turns showing different identifiers) and would require overriding _get_normalized_conversation_async to mutate message pieces, adding complexity for no functional gain. The inner target that actually handled each request is still traceable via prompt_metadata["inner_target_identifier"]. The eval hash, by contrast, answers a completely different question: "are these two scorer configurations behaviorally equivalent for grouping evaluation results?" For that purpose, what matters isn't the wrapper but rather the underlying model, temperature, and top_p. The unwrap mechanism lives entirely in the eval hash computation layer and doesn't touch memory entries, identifiers, or runtime behavior. Keeping these two concerns separate means the memory layer stays simple (no hook overrides, no mutation) while the eval layer correctly groups results regardless of whether a round-robin was used.
Prompt caching trade-off: switching targets mid-conversation defeats provider-side prompt prefix caching. For multi-turn attacks like Crescendo with 5+ turns across thousands of objectives, this can significantly increase API cost compared to pinning each conversation to a single target. This is a throughput vs. cost trade-off: round-robin avoids per-endpoint rate limits at the expense of caching efficiency. Users who need cache-efficient multi-turn conversations should assign individual targets at the attack or scenario level rather than using round-robin for those workloads. Conversation-to-target pinning was intentionally not added at the target level because it would couple conversation management with pure prompt sending — a responsibility that belongs to a higher level. A user who wants one target per conversation can simply pass the target directly to the attack without a round-robin.
Concurrency safety: the only shared mutable state is self._counter (the rotation index), which is only mutated in the synchronous _next_target() method. Under Python's asyncio cooperative concurrency model, this is safe — no two coroutines can interleave within a synchronous method. Crucially, because the target is selected synchronously (as a local variable) before the await call to _send_prompt_to_target_async, even if another coroutine advances _counter while the first is waiting on the network call, the already-selected target reference cannot be affected. Not safe for multi-threaded use, consistent with the rest of PyRIT's target classes.
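The argument above can be illustrated with a small asyncio sketch. FakeTarget and RoundRobin are hypothetical stand-ins, not PyRIT classes; the only point is that selection happens in a synchronous method before the first await.

```python
import asyncio


class FakeTarget:
    """Hypothetical inner target that simulates a network call."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def send(self, prompt: str) -> str:
        await asyncio.sleep(0.01)  # simulated network latency
        return f"{self.name}:{prompt}"


class RoundRobin:
    def __init__(self, targets: list) -> None:
        self._targets = targets
        self._counter = 0  # the only shared mutable state

    def _next_target(self) -> FakeTarget:
        # Synchronous: no await here, so no other coroutine can run in between.
        target = self._targets[self._counter % len(self._targets)]
        self._counter += 1
        return target

    async def send(self, prompt: str) -> str:
        target = self._next_target()  # bound to a local before yielding control
        return await target.send(prompt)


async def main() -> list:
    round_robin = RoundRobin([FakeTarget("east"), FakeTarget("west")])
    return await asyncio.gather(*(round_robin.send(f"p{i}") for i in range(4)))


results = asyncio.run(main())
print(results)  # ['east:p0', 'west:p1', 'east:p2', 'west:p3']
```

Each task picks its target before it first suspends, so later increments of the counter by other tasks cannot change which target an in-flight request uses.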
Minimal override surface: only _send_prompt_to_target_async and _build_identifier are overridden. No override of _get_normalized_conversation_async or set_system_prompt — the base class handles both correctly since all memory operations are keyed by conversation_id and stamped with self.get_identifier().

Tests and Documentation

Unit tests (tests/unit/prompt_target/test_round_robin_target.py): 24 tests covering:
- Construction validation: rejects < 2 targets, mixed classes, mismatched weights, zero/negative weights
- Capability intersection: boolean AND, modality intersection, empty modality rejection
- Capability requirements: rejects targets without multi-turn, rejects targets without editable history
- Round-robin selection: FIFO rotation, weighted rotation
- Delegation: _send_prompt_to_target_async delegates to correct inner target, records inner_target_identifier in metadata, round-robins across calls
- set_system_prompt: uses round-robin identifier (verified via memory lookup)
- Identifier: includes children and weights
- End-to-end: full send_prompt_async flow keeps round-robin identifier on entries
- Behavioral validation: rejects mismatched underlying_model_name, rejects mismatched temperature, accepts matching params with different endpoints, uses model_name fallback
Eval hash unwrap tests (tests/unit/identifiers/test_evaluation_identifier.py): 3 tests added:
- test_unwrap_substitutes_first_inner_child: verifies the unwrap produces the same hash as the direct target
- test_unwrap_no_op_when_child_has_no_matching_subchild: verifies non-wrapper targets are unaffected
- test_scorer_eval_hash_matches_with_and_without_round_robin: end-to-end ScorerEvaluationIdentifier equivalence
Documentation notebook (doc/code/targets/round_robin_target.ipynb and round_robin_target.py): 5 sections demonstrating:
- Basic usage with alternation printing showing which target handled each request
- Weighted distribution with count summary
- Drop-in usage with PromptSendingAttack
- Multi-turn attack (Crescendo) with round-robin objective target
- Batch scoring with round-robin scorer target, printing which scorer target scored each prompt

Next Step

Enable RoundRobinTarget selection in the GUI frontend

hannahwestra25 · 2026-05-20T16:39:41Z

Copilot noted that "OpenAI prefix caching can give 50%+ cost reduction on long conversations. Switching targets every turn means every target pays full price for the entire conversation prefix on every turn. For a Crescendo attack with 5+ turns across thousands of objectives, you could be doubling your API cost compared to pinning conversations to targets." At the very least, it's something to document, but we may also want to give users the option to pin a conversation to a given target: if you have multiple conversations and multiple targets, you either assign each conversation to a target, or you do true round-robin as this PR sets up.

jsong468 · 2026-05-20T18:02:16Z

Good point! I can definitely elaborate more in the documentation, but I think 1) this should ultimately be up to the user and what they want to trade off (higher cost vs. hitting rate limits), and 2) giving users the option to pin a conversation to a specific target would couple conversation state management with the target's prompt-sending functionality (simply receiving a normalized conversation and adding a message), which is something we tried to avoid here by requiring editable history.

If a user just wants to run an elaborate attack against one target (one conversation, one target), they can use that target directly instead. Configuring a different endpoint per attack in something like a scenario seems like something we could do later at the scenario or AttackExecutor level, not at the target level. (It also wouldn't be difficult for a user to set up a loop today that executes attacks alternating between targets. We show somewhat similar examples in our notebooks for looping through objectives for attacks, and a user could just loop through targets as well on each new iteration.)
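A rough sketch of the kind of loop meant here, assuming each objective corresponds to one conversation. run_attack is a hypothetical placeholder for executing a full attack against a single target, and targets are represented by names rather than real target objects.

```python
from itertools import cycle


def pin_conversations(objectives: list, target_names: list) -> list:
    """Pair each objective (one conversation) with a single target, cycling through the targets."""
    return list(zip(objectives, cycle(target_names)))


def run_attack(objective: str, target_name: str) -> str:
    """Hypothetical placeholder for running one complete multi-turn attack."""
    return f"{objective} -> {target_name}"


assignments = pin_conversations(["obj-1", "obj-2", "obj-3"], ["east", "west"])
results = [run_attack(objective, target) for objective, target in assignments]
print(results)  # ['obj-1 -> east', 'obj-2 -> west', 'obj-3 -> east']
```

Every turn of a given conversation then goes to the same endpoint, which keeps provider-side prefix caching effective while still spreading load across targets.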

hannahwestra25

just a few small remaining nits!

round robin target v1

eb32e47