Skip to content

MIWI e2e - role assignment scope over resources required, rather than entire resource group#4751

Draft
cadenmarchese wants to merge 2 commits intomasterfrom
cadenmarchese/e2e-should-test-minimal-role-scope
Draft

MIWI e2e - role assignment scope over resources required, rather than entire resource group#4751
cadenmarchese wants to merge 2 commits intomasterfrom
cadenmarchese/e2e-should-test-minimal-role-scope

Conversation

@cadenmarchese
Copy link
Copy Markdown
Member

Which issue this PR addresses:

Fixes No JIRA

What this PR does / why we need it:

  • In the customer documentation, we recommend that MIWI clusters are created with role assignments over specific resources needed rather than entire resource group to maintain minimal permissions scope.
  • MIWI e2e currently performs the role assignments over the entire resource group, which means we are not e2e testing this minimal scope.
  • This code change brings the e2e code in alignment with what's actually documented so that we're testing the same role assignment flow that the customer is using.

Test plan for issue:

  • MIWI e2e should pass.

Is there any documentation that needs to be updated for this PR?

How do you know this will function as expected in production?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates MIWI e2e cluster provisioning to assign workload identity roles at minimal, resource-level scopes (matching the customer-documented flow) instead of assigning roles at the entire resource group scope.

Changes:

  • Refactors SetupWorkloadIdentity to compute and apply role assignment scopes per role, rather than assigning at the resource group level.
  • Adds federated credential role assignments from the aro-Cluster identity to operator managed identities (and optionally to a mock MSI).
  • Threads diskEncryptionSetID into workload identity setup and standardizes the aro-Cluster operator name via a constant.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +592 to +605
// Get the role definition to check its permissions
subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID)
filter := fmt.Sprintf("roleName eq '%s'", roleGUID)
roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, filter)
if err != nil {
return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err)
}

if len(roleDefs) == 0 {
return nil, fmt.Errorf("role definition %s not found", roleDefinitionID)
}

roleDef := roleDefs[0]

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RoleDefinitions list filter is using roleName eq '<guid>', but roleName is the human-readable role name (e.g. "Reader"), not the role definition GUID from the ID. With the current code this will typically return an empty list and cause WI setup to fail with "role definition ... not found". Consider retrieving the role definition by ID (RoleDefinitionsClient.Get) or listing without this filter and matching by the role definition name/ID suffix.

Suggested change
// Get the role definition to check its permissions
subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID)
filter := fmt.Sprintf("roleName eq '%s'", roleGUID)
roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, filter)
if err != nil {
return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err)
}
if len(roleDefs) == 0 {
return nil, fmt.Errorf("role definition %s not found", roleDefinitionID)
}
roleDef := roleDefs[0]
// Get the role definition to check its permissions. The roleDefinitionID contains
// the role definition resource ID / GUID, while the RoleDefinitions list filter's
// roleName field expects the human-readable role name (for example, "Reader").
// List without that filter and match the returned role definition by ID instead.
subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID)
roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, "")
if err != nil {
return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err)
}
var roleDef *mgmtauthorization.RoleDefinition
for i := range roleDefs {
if roleDefs[i].ID == nil {
continue
}
id := *roleDefs[i].ID
if strings.EqualFold(id, roleDefinitionID) || strings.HasSuffix(strings.ToLower(id), "/"+strings.ToLower(roleGUID)) {
roleDef = &roleDefs[i]
break
}
}
if roleDef == nil {
return nil, fmt.Errorf("role definition %s not found", roleDefinitionID)
}

Copilot uses AI. Check for mistakes.
scopes, err := c.determineRequiredPlatformWorkloadIdentityScopes(ctx, wi.RoleDefinitionID, vnetResourceGroup, diskEncryptionSetID)
if err != nil {
return err
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If determineRequiredPlatformWorkloadIdentityScopes returns an empty slice, the subsequent loop will skip role assignment entirely (no error), leaving the managed identity without required permissions and likely causing later auth failures. It would be safer to either return an error when no scopes can be derived, or fall back to a known-safe scope set (e.g. the documented minimal resource scopes).

Suggested change
}
}
if len(scopes) == 0 {
return fmt.Errorf("no platform workload identity scopes could be determined for operator %s and role %s", wi.OperatorName, wi.RoleDefinitionID)
}

Copilot uses AI. Check for mistakes.
Comment on lines +617 to +621
if strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") {
// Assign to both master and worker subnets
masterSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-master", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName)
workerSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-worker", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName)
scopes = append(scopes, masterSubnet, workerSubnet)
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scope-building logic can append duplicates (e.g. if multiple permission blocks include subnet-related actions), leading to redundant role assignment calls. Consider de-duplicating scopes before returning/assigning to reduce API calls and avoid relying on conflict handling for idempotency.

Copilot uses AI. Check for mistakes.
Comment on lines +431 to 433
// Determine required scopes based on role permissions
scopes, err := c.determineRequiredPlatformWorkloadIdentityScopes(ctx, wi.RoleDefinitionID, vnetResourceGroup, diskEncryptionSetID)
if err != nil {
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetupWorkloadIdentity and determineRequiredPlatformWorkloadIdentityScopes introduce new logic for computing scopes and performing per-scope role assignments, but there are no unit tests covering scope determination and the resulting role assignment behavior. Consider adding tests (e.g., mocking RoleDefinitionsClient and asserting the expected scopes for representative roles) to prevent regressions as role definitions evolve.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@SudoBrendan SudoBrendan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix E2E - a few other cleanups if you'd like. Thanks Caden!


// Get the role definition to check its permissions
subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID)
filter := fmt.Sprintf("roleName eq '%s'", roleGUID)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug (E2E caught it too): need to use the Display Name rather than GUID here.

Alternatively, use Get or GetByID

Comment on lines +482 to +492
// Assign federated credential role from aro-Cluster identity to each operator identity
_, ok := operatorIdentities[aroClusterIdentityOperatorName]
if !ok {
return fmt.Errorf("%s identity not found", aroClusterIdentityOperatorName)
}

// Get the aro-Cluster identity to get its principalID
aroClusterIdentity, err := c.msiClient.Get(ctx, vnetResourceGroup, aroClusterIdentityOperatorName, nil)
if err != nil {
return fmt.Errorf("failed to get %s identity: %w", aroClusterIdentityOperatorName, err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: instead of making a call to CREATE this identity above, then GET it here - can we preserve the data on CREATE? We should have it above as resp.Properties.PrincipalID, right?

},
)

operatorIdentities[wi.OperatorName] = *resp.ID
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resp is a struct, but its ID/Properties/Properties.PrincipalID are all pointers. Do we need to bother with a nil check on these?

// Assign role to each determined scope
for _, scope := range scopes {
c.log.Infof("assigning role %s to scope %s for principal %s", wi.RoleDefinitionID, scope, *resp.Properties.PrincipalID)
for i := 0; i < 5; i++ {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this loop duplicated a few times - can we make it a function?

Comment on lines +615 to +641
for _, action := range *perm.Actions {
// Check for subnet-specific permissions
if strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") {
// Assign to both master and worker subnets
masterSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-master", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName)
workerSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-worker", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName)
scopes = append(scopes, masterSubnet, workerSubnet)
break
}

// Check for VNet-level permissions
if strings.Contains(action, "Microsoft.Network/virtualNetworks/") && !strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") {
vnetID := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet", c.Config.SubscriptionID, vnetResourceGroup)
scopes = append(scopes, vnetID)
break
}

// Check for DES permissions
if strings.Contains(action, "Microsoft.Compute/diskEncryptionSets") {
scopes = append(scopes, diskEncryptionSetID)
break
}
}
}
}

return scopes, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if none of the cases in the for _, action loop land, and we return [], nil here? Is that possible? Should we log it or return an error instead?

@cadenmarchese cadenmarchese marked this pull request as draft April 7, 2026 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chainsaw Pull requests or issues owned by Team Chainsaw

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants