feat: Enhance Keycloak SCIM Provider with Group Filtering, Mapping Existing Users and Groups #164

Open
dsvh wants to merge 60 commits into mitodl:main from dsvh:main

dsvh commented Nov 25, 2025

Enhance Keycloak SCIM Provider with Group Filtering

Overview

This pull request introduces significant enhancements to the Keycloak SCIM (System for Cross-domain Identity Management) provider, focusing on improved group filtering and more robust integration with the Databricks SCIM API. The changes address multiple issues related to user and group synchronization, error handling, and performance.

Key Changes

🚀 New Features

  • Configurable Group Filtering: Added support for filtering user and group synchronization based on configurable patterns, allowing selective sync of resources
  • Enhanced Username Source Configuration: Introduced username-source config option to control how SCIM userName fields are populated (email, username, or custom attribute)
  • Detailed Sync Response: Enhanced synchronization results to include comprehensive lists of affected users and groups for better visibility
  • Force Overwrite Functionality: Added capability to force overwrite existing resources during sync operations

🐛 Bug Fixes

  • SCIM Client Timeouts: Increased the default connection, request, and socket timeouts from 5 to 30 seconds to handle operations on large datasets
  • Mapping Logic Improvements: Fixed user mapping to handle existing Databricks users by email matching and improved error handling for 4xx responses
  • Group Membership Sync: Corrected group member values to use appropriate identifiers (user IDs vs emails) based on Databricks SCIM compliance
  • Filter Handling: Fixed server-side vs client-side filtering logic to avoid pagination and performance issues
  • HTTP Request Issues: Resolved URL construction problems in SCIM list requests and import operations
  • Databricks API Compatibility: Addressed PATCH operation limitations by performing single operations per request
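
To illustrate the group membership and PATCH fixes listed above, here is a minimal sketch of building a SCIM PatchOp body that carries exactly one operation and uses remote user IDs as member values. The class name, method name, and the `remoteIdByKeycloakId` map are hypothetical stand-ins for the provider's mapping table, not code from this PR, and the sketch assumes the IDs contain no characters that need JSON escaping.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical helper, not taken from the PR's source. */
final class GroupMemberPatchSketch {

    private GroupMemberPatchSketch() {}

    /**
     * Builds one PatchOp body containing a single "add" operation on "members".
     * Member values are the remote (e.g. Databricks) user IDs, looked up from
     * the Keycloak user IDs. Users without a mapping are skipped (assumption).
     */
    static String buildAddMembersPatch(List<String> keycloakUserIds,
                                       Map<String, String> remoteIdByKeycloakId) {
        String values = keycloakUserIds.stream()
                .map(remoteIdByKeycloakId::get)
                .filter(Objects::nonNull)
                .map(id -> "{\"value\":\"" + id + "\"}")
                .collect(Collectors.joining(","));
        return "{\"schemas\":[\"urn:ietf:params:scim:api:messages:2.0:PatchOp\"],"
                + "\"Operations\":[{\"op\":\"add\",\"path\":\"members\","
                + "\"value\":[" + values + "]}]}";
    }
}
```

A display name change would then go into a separate request with its own single operation rather than being appended to the same `Operations` array.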

🔧 Technical Improvements

  • Logging Enhancements: Added comprehensive debug and info logging throughout the sync process for better troubleshooting
  • Error Handling: Improved exception handling with full stack traces for fetch failures
  • Gradle Updates: Upgraded to Gradle 9.2.1 and Shadow plugin 9.2.2 for better build performance
  • Code Quality: Applied static analysis and code cleanup using OpenRewrite recipes

📝 Configuration Options

  • group-filter: Regex pattern for filtering groups during sync
  • username-source: Source for SCIM userName field (email/username/custom)
  • map-existing-users: Enable mapping to existing users on sync conflicts
  • force-overwrite: Force overwrite of existing resources
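
As a rough illustration of how a `group-filter` regex could select groups, here is a small sketch. The helper class is hypothetical, and two behaviours are assumptions rather than facts from this PR: a blank pattern means "sync everything", and the pattern must match the whole group name (`matches()` rather than `find()`).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper, not taken from the PR's source. */
final class GroupFilterSketch {

    private GroupFilterSketch() {}

    /**
     * Returns the group names accepted by the configured regex.
     * A null or blank pattern is treated as "no filter" (assumption).
     */
    static List<String> selectGroups(List<String> groupNames, String groupFilter) {
        if (groupFilter == null || groupFilter.isBlank()) {
            return List.copyOf(groupNames);
        }
        Pattern pattern = Pattern.compile(groupFilter);
        return groupNames.stream()
                // Full-name match; partial matches need an explicit ".*" in the pattern.
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

With this sketch, a filter of `databricks-.*` would keep `databricks-admins-stage` and drop a group named `hr-team`.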

Files Modified

  • Core SCIM classes: Adapter.java, GroupAdapter.java, ScimClient.java, UserAdapter.java
  • Storage and sync: ScimStorageProviderFactory.java, ScimSynchronizationResult.java
  • Build and CI: build.gradle, Gradle wrapper, GitHub workflows
  • Database schema: scim-resource-changelog.xml

Testing

  • Verified group filtering functionality with various regex patterns
  • Tested user mapping with existing Databricks users
  • Validated sync operations with large datasets
  • Confirmed compatibility with Databricks SCIM API limitations

Version

Bumped version to 1.4 for this release.

Related Issues

  • Fixes issues with Databricks SCIM integration
  • Addresses group synchronization performance problems
  • Resolves user mapping conflicts during sync operations

This PR significantly improves the reliability and flexibility of SCIM-based identity synchronization between Keycloak and Databricks, making it production-ready for enterprise deployments.

dsvh and others added 30 commits November 18, 2025 13:28
- Handle 405 Method Not Allowed errors by falling back to PATCH for groups
- Handle 404 Not Found errors by creating the resource instead of updating
- Properly update existing mappings when recreating deleted resources
- Handle 405 Method Not Allowed errors by falling back to PATCH for groups
- Handle 404 Not Found errors by creating resources instead of updating
- Properly update existing mappings when recreating deleted resources
- Fix compilation issues with return statements
- Enhanced error handling in ScimClient.java to properly handle 400/404 responses for missing resources, with fallback from PUT to PATCH for groups and resource creation when needed
- Updated UI configuration labels from 'Use patchOp for groups/users' to 'Use PATCH for groups/users' for better clarity
- Improved help text to explicitly explain PATCH vs PUT operations
- Changed use-email-as-username boolean config to username-source string config with options 'username' or 'email'
- Updated UserAdapter to use email as userName when configured, falling back to username
- Made getModel() method protected in Adapter class for access in subclasses
- Add getResourceInfo() method to provide detailed logging for users (username/email) and groups (name/id)
- Update refreshResources() to log specific resources being processed
- Update importResources() to log detailed information about each resource being imported
- Include email information in sync logs for better debugging
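
The fallback rules in the commits above (405 on a group PUT falls back to PATCH, 404/400 leads to creating the resource) can be sketched as a small decision function. The enum and method names are hypothetical, and treating every 400 as "resource missing" is an assumption based on the commit message, not a general SCIM rule.

```java
/** Hypothetical decision helper, not taken from ScimClient.java. */
final class ReplaceFallbackSketch {

    enum NextStep { DONE, RETRY_WITH_PATCH, CREATE_RESOURCE, FAIL }

    private ReplaceFallbackSketch() {}

    /** Decides what to do after a PUT (replace) request returned the given status. */
    static NextStep afterPut(int status, boolean isGroup) {
        if (status >= 200 && status < 300) {
            return NextStep.DONE;
        }
        if (status == 405 && isGroup) {
            // The server rejects PUT for groups: retry the update as a PATCH.
            return NextStep.RETRY_WITH_PATCH;
        }
        if (status == 404 || status == 400) {
            // The remote resource is gone: create it and update the stored mapping.
            return NextStep.CREATE_RESOURCE;
        }
        return NextStep.FAIL;
    }
}
```
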
… groups

- Add ScimSynchronizationResult class that extends SynchronizationResult with lists for added/updated/removed/failed users and groups
- Update ScimClient sync methods to track detailed information instead of just counts
- Maintain backward compatibility by falling back to standard counters when ScimSynchronizationResult is not used
- Enhanced logging shows specific users/groups being processed with usernames/emails and group names
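
For readers who want a feel for the detailed result tracking described above, here is a simplified, standalone sketch. The real `ScimSynchronizationResult` extends Keycloak's `SynchronizationResult`; this stand-in does not, so that it compiles without Keycloak on the classpath, and all method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in; the PR's class extends Keycloak's SynchronizationResult. */
final class DetailedSyncResultSketch {

    private final List<String> addedUsers = new ArrayList<>();
    private final List<String> failedUsers = new ArrayList<>();
    private final List<String> addedGroups = new ArrayList<>();

    void userAdded(String userName) { addedUsers.add(userName); }
    void userFailed(String userName) { failedUsers.add(userName); }
    void groupAdded(String groupName) { addedGroups.add(groupName); }

    // Plain counters stay derivable, so callers that only read counts keep working.
    int getAdded() { return addedUsers.size() + addedGroups.size(); }
    int getFailed() { return failedUsers.size(); }

    List<String> getAddedUsers() { return Collections.unmodifiableList(addedUsers); }
    List<String> getFailedUsers() { return Collections.unmodifiableList(failedUsers); }
    List<String> getAddedGroups() { return Collections.unmodifiableList(addedGroups); }
}
```
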
dsvh and others added 26 commits November 20, 2025 13:31
- Simplified the 405 fallback logic to use single PATCH operation for members
- Removed complex multi-PATCH logic that was causing compilation issues
- Members are still updated correctly, addressing the main sync issue
- DisplayName and externalId updates can be handled via full PUT on creation
- Updated GroupAdapter.toSCIM() to use Databricks user ID from mapping as member value
- Updated GroupAdapter.toPatchBuilder() to use Databricks user ID for PATCH operations
- Updated GroupAdapter.apply() to map Databricks user IDs back to Keycloak user IDs
- This fixes the 'Error finding group member' issue in Databricks SCIM sync
- Add UI config flags for map-existing-users and map-existing-groups
- Implement 409 error handling in ScimClient.create() with SCIM list queries
- Add client-side filtering fallback when server-side filtering fails
- Enhance ScimSynchronizationResult to track mapped resources
- Add debug logging for config values and list request behavior
- Add pagination support when fetching all users for client-side email matching
- Prevent missing existing users on later pages during map-existing-users sync
- Add necessary imports for ArrayList and List
- Changed mapping logic to trigger on any 4xx status code when map-existing-users is enabled
- Previously only attempted mapping on 409 (conflict), but Databricks may return other 4xx codes for duplicate users
- This ensures existing users are mapped even if create fails for reasons other than explicit conflict
- Check for existing users before creation when map-existing-users is enabled
- Implement pagination to fetch all users for complete matching
- Prevent duplicate user creation and ensure proper linking
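
The paginated email lookup described in the commits above might look roughly like this. It is a sketch under assumptions: `PageFetcher` is a hypothetical abstraction over the SCIM list request, and the loop relies on the standard SCIM `startIndex` (1-based), `count`, and `totalResults` paging fields.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical lookup helper, not taken from the PR's source. */
final class ExistingUserLookupSketch {

    record RemoteUser(String id, String email) {}

    record Page(List<RemoteUser> resources, int totalResults) {}

    /** Stand-in for a SCIM list call such as GET /Users?startIndex=..&count=.. */
    interface PageFetcher {
        Page fetch(int startIndex, int count);
    }

    private ExistingUserLookupSketch() {}

    /** Walks every page until a user with the given email is found. */
    static Optional<RemoteUser> findByEmail(PageFetcher fetcher, String email, int pageSize) {
        int startIndex = 1; // SCIM paging is 1-based
        while (true) {
            Page page = fetcher.fetch(startIndex, pageSize);
            for (RemoteUser user : page.resources()) {
                if (user.email() != null && user.email().equalsIgnoreCase(email)) {
                    return Optional.of(user);
                }
            }
            startIndex += page.resources().size();
            // Stop on an empty page or once every reported result has been seen.
            if (page.resources().isEmpty() || startIndex > page.totalResults()) {
                return Optional.empty();
            }
        }
    }
}
```

Stopping on an empty page guards against a server that reports a `totalResults` larger than what it actually returns.
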
…Refresh resources', adapter type logs, and HTTP status codes
feat: Enhance Keycloak SCIM Provider with Group Filtering, Mapping Existing Users and Groups
@gemini-code-assist

Summary of Changes

Hello @dsvh, I'm Gemini Code Assist [1]. I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Keycloak SCIM provider by introducing advanced group filtering capabilities and more robust user and group synchronization logic. It improves compatibility with external SCIM providers like Databricks by addressing specific API behaviors and introduces new configuration options for mapping existing resources and controlling username sources. The changes aim to make the SCIM provider more flexible, reliable, and easier to troubleshoot, especially in enterprise environments with complex synchronization requirements.

Highlights

  • Configurable Group Filtering: Introduced a new configuration option group-filter that allows administrators to specify regex patterns to filter which groups are synchronized. This enables selective synchronization of groups and their members, improving control and performance.
  • Enhanced Username Source Configuration: Added a username-source configuration option, allowing the SCIM userName field to be populated from either the Keycloak username or email, providing greater flexibility for integration with various SCIM providers.
  • Improved User and Group Mapping: Implemented new configuration options (map-existing-users, map-existing-groups) and logic to automatically map to existing remote users (by email) or groups (by display name) if a creation attempt fails due to conflicts, preventing duplicate entries and improving robustness.
  • Detailed Synchronization Results: The synchronization process now provides more granular results, distinguishing between added, updated, removed, failed, and newly mapped users and groups, offering better visibility into sync operations.
  • SCIM Client Timeout Increase: Increased default connection, request, and socket timeouts for the SCIM client from 5 to 30 seconds to better accommodate operations involving large datasets and prevent premature timeouts.
  • Databricks API Compatibility Fixes: Addressed specific compatibility issues with the Databricks SCIM API, including handling 405 (Method Not Allowed) responses for group PUT operations by falling back to PATCH, and handling 404/400 responses during resource replacement by attempting to create the resource instead.
  • Gradle and Build System Updates: Upgraded the project to use Gradle 9.2.1 and the Shadow plugin 9.2.2, along with updating the Gradle wrapper scripts, enhancing build performance and maintaining compatibility with modern build tools.
  • Enhanced Logging and Error Handling: Added comprehensive debug and info logging throughout the synchronization process and improved error handling with full stack traces for fetch failures, making troubleshooting easier.
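
As a small illustration of the `username-source` behaviour summarized in the highlights, the selection could be sketched as follows. The helper is hypothetical; the fallback to the Keycloak username when no email is set follows the commit message, while the case-insensitive comparison is an assumption.

```java
/** Hypothetical helper, not taken from UserAdapter.java. */
final class UsernameSourceSketch {

    private UsernameSourceSketch() {}

    /**
     * Picks the value for the SCIM userName field.
     * "email" prefers the email address and falls back to the username;
     * any other setting uses the username.
     */
    static String resolveUserName(String usernameSource, String username, String email) {
        boolean preferEmail = "email".equalsIgnoreCase(usernameSource);
        if (preferEmail && email != null && !email.isBlank()) {
            return email;
        }
        return username;
    }
}
```
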
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/release-keycloak-scim.yaml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot left a comment

Code Review

This pull request introduces significant enhancements to the Keycloak SCIM provider, including group filtering, mapping of existing resources, and more robust error handling. The changes are extensive and well-structured, improving both functionality and reliability. My review focuses on a few key areas for improvement. I've identified a critical issue regarding the lack of pagination handling when fetching resources from the SCIM server, which could lead to incomplete data synchronization with larger datasets. Additionally, I've provided suggestions to improve logging practices and optimize resource fetching logic for better performance. Overall, these are excellent enhancements, and addressing the feedback will make the provider even more robust.

dsvh and others added 2 commits November 25, 2025 15:32
- Add fallback logic to create groups if PATCH update fails with 404/400
- Ensures groups are created automatically when they don't exist in Databricks
- Fixes issue where groups like databricks-admins-stage were not being created
BR-485: Fix group creation when missing in Databricks during sync