MIWI e2e - role assignment scope over resources required, rather than entire resource group#4751
MIWI e2e - role assignment scope over resources required, rather than entire resource group#4751cadenmarchese wants to merge 2 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates MIWI e2e cluster provisioning to assign workload identity roles at minimal, resource-level scopes (matching the customer-documented flow) instead of assigning roles at the entire resource group scope.
Changes:
- Refactors
SetupWorkloadIdentityto compute and apply role assignment scopes per role, rather than assigning at the resource group level. - Adds federated credential role assignments from the
aro-Clusteridentity to operator managed identities (and optionally to a mock MSI). - Threads
diskEncryptionSetIDinto workload identity setup and standardizes thearo-Clusteroperator name via a constant.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Get the role definition to check its permissions | ||
| subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID) | ||
| filter := fmt.Sprintf("roleName eq '%s'", roleGUID) | ||
| roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, filter) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err) | ||
| } | ||
|
|
||
| if len(roleDefs) == 0 { | ||
| return nil, fmt.Errorf("role definition %s not found", roleDefinitionID) | ||
| } | ||
|
|
||
| roleDef := roleDefs[0] | ||
|
|
There was a problem hiding this comment.
The RoleDefinitions list filter is using roleName eq '<guid>', but roleName is the human-readable role name (e.g. "Reader"), not the role definition GUID from the ID. With the current code this will typically return an empty list and cause WI setup to fail with "role definition ... not found". Consider retrieving the role definition by ID (RoleDefinitionsClient.Get) or listing without this filter and matching by the role definition name/ID suffix.
| // Get the role definition to check its permissions | |
| subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID) | |
| filter := fmt.Sprintf("roleName eq '%s'", roleGUID) | |
| roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, filter) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err) | |
| } | |
| if len(roleDefs) == 0 { | |
| return nil, fmt.Errorf("role definition %s not found", roleDefinitionID) | |
| } | |
| roleDef := roleDefs[0] | |
| // Get the role definition to check its permissions. The roleDefinitionID contains | |
| // the role definition resource ID / GUID, while the RoleDefinitions list filter's | |
| // roleName field expects the human-readable role name (for example, "Reader"). | |
| // List without that filter and match the returned role definition by ID instead. | |
| subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID) | |
| roleDefs, err := c.roledefinitions.List(ctx, subscriptionScope, "") | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to list role definition %s: %w", roleDefinitionID, err) | |
| } | |
| var roleDef *mgmtauthorization.RoleDefinition | |
| for i := range roleDefs { | |
| if roleDefs[i].ID == nil { | |
| continue | |
| } | |
| id := *roleDefs[i].ID | |
| if strings.EqualFold(id, roleDefinitionID) || strings.HasSuffix(strings.ToLower(id), "/"+strings.ToLower(roleGUID)) { | |
| roleDef = &roleDefs[i] | |
| break | |
| } | |
| } | |
| if roleDef == nil { | |
| return nil, fmt.Errorf("role definition %s not found", roleDefinitionID) | |
| } |
| scopes, err := c.determineRequiredPlatformWorkloadIdentityScopes(ctx, wi.RoleDefinitionID, vnetResourceGroup, diskEncryptionSetID) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
If determineRequiredPlatformWorkloadIdentityScopes returns an empty slice, the subsequent loop will skip role assignment entirely (no error), leaving the managed identity without required permissions and likely causing later auth failures. It would be safer to either return an error when no scopes can be derived, or fall back to a known-safe scope set (e.g. the documented minimal resource scopes).
| } | |
| } | |
| if len(scopes) == 0 { | |
| return fmt.Errorf("no platform workload identity scopes could be determined for operator %s and role %s", wi.OperatorName, wi.RoleDefinitionID) | |
| } |
| if strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") { | ||
| // Assign to both master and worker subnets | ||
| masterSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-master", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName) | ||
| workerSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-worker", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName) | ||
| scopes = append(scopes, masterSubnet, workerSubnet) |
There was a problem hiding this comment.
The scope-building logic can append duplicates (e.g. if multiple permission blocks include subnet-related actions), leading to redundant role assignment calls. Consider de-duplicating scopes before returning/assigning to reduce API calls and avoid relying on conflict handling for idempotency.
| // Determine required scopes based on role permissions | ||
| scopes, err := c.determineRequiredPlatformWorkloadIdentityScopes(ctx, wi.RoleDefinitionID, vnetResourceGroup, diskEncryptionSetID) | ||
| if err != nil { |
There was a problem hiding this comment.
SetupWorkloadIdentity and determineRequiredPlatformWorkloadIdentityScopes introduce new logic for computing scopes and performing per-scope role assignments, but there are no unit tests covering scope determination and the resulting role assignment behavior. Consider adding tests (e.g., mocking RoleDefinitionsClient and asserting the expected scopes for representative roles) to prevent regressions as role definitions evolve.
SudoBrendan
left a comment
There was a problem hiding this comment.
Fix E2E - a few other cleanups if you'd like. Thanks Caden!
|
|
||
| // Get the role definition to check its permissions | ||
| subscriptionScope := fmt.Sprintf("/subscriptions/%s", c.Config.SubscriptionID) | ||
| filter := fmt.Sprintf("roleName eq '%s'", roleGUID) |
There was a problem hiding this comment.
bug (E2E caught it too): need to use the Display Name rather than GUID here.
Alternatively, use Get or GetByID
| // Assign federated credential role from aro-Cluster identity to each operator identity | ||
| _, ok := operatorIdentities[aroClusterIdentityOperatorName] | ||
| if !ok { | ||
| return fmt.Errorf("%s identity not found", aroClusterIdentityOperatorName) | ||
| } | ||
|
|
||
| // Get the aro-Cluster identity to get its principalID | ||
| aroClusterIdentity, err := c.msiClient.Get(ctx, vnetResourceGroup, aroClusterIdentityOperatorName, nil) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get %s identity: %w", aroClusterIdentityOperatorName, err) | ||
| } |
There was a problem hiding this comment.
idea: instead of making a call to CREATE this identity above, then GET it here - can we preserve the data on CREATE? We should have it above as resp.Properties.PrincipalID, right?
| }, | ||
| ) | ||
|
|
||
| operatorIdentities[wi.OperatorName] = *resp.ID |
There was a problem hiding this comment.
resp is a struct, but its ID/Properties/Properties.PrincipalID are all pointers. Do we need to bother with a nil check on these?
| // Assign role to each determined scope | ||
| for _, scope := range scopes { | ||
| c.log.Infof("assigning role %s to scope %s for principal %s", wi.RoleDefinitionID, scope, *resp.Properties.PrincipalID) | ||
| for i := 0; i < 5; i++ { |
There was a problem hiding this comment.
I see this loop duplicated a few times - can we make it a function?
| for _, action := range *perm.Actions { | ||
| // Check for subnet-specific permissions | ||
| if strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") { | ||
| // Assign to both master and worker subnets | ||
| masterSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-master", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName) | ||
| workerSubnet := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet/subnets/%s-worker", c.Config.SubscriptionID, vnetResourceGroup, c.Config.ClusterName) | ||
| scopes = append(scopes, masterSubnet, workerSubnet) | ||
| break | ||
| } | ||
|
|
||
| // Check for VNet-level permissions | ||
| if strings.Contains(action, "Microsoft.Network/virtualNetworks/") && !strings.Contains(action, "Microsoft.Network/virtualNetworks/subnets/") { | ||
| vnetID := fmt.Sprintf("/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/dev-vnet", c.Config.SubscriptionID, vnetResourceGroup) | ||
| scopes = append(scopes, vnetID) | ||
| break | ||
| } | ||
|
|
||
| // Check for DES permissions | ||
| if strings.Contains(action, "Microsoft.Compute/diskEncryptionSets") { | ||
| scopes = append(scopes, diskEncryptionSetID) | ||
| break | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return scopes, nil |
There was a problem hiding this comment.
What happens if none of the cases in the for _, action loop land, and we return [], nil here? Is that possible? Should we log it or return an error instead?
Which issue this PR addresses:
Fixes No JIRA
What this PR does / why we need it:
Test plan for issue:
Is there any documentation that needs to be updated for this PR?
How do you know this will function as expected in production?