chore(auth): Address remaining Regional Access Boundary feedback#12867
chore(auth): Address remaining Regional Access Boundary feedback#12867vverman wants to merge 10 commits intogoogleapis:regional-access-boundariesfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the RegionalAccessBoundary and RegionalAccessBoundaryManager classes to improve resource management and testability. Key changes include replacing environment-variable-based feature toggling with a ThreadLocal mechanism for tests, migrating from manual thread creation to a bounded ExecutorService for asynchronous refresh tasks to prevent resource exhaustion, and ensuring HTTP responses are properly disconnected in a finally block. Additionally, the test suite has been migrated to JUnit 5, and several tests were cleaned up by removing deprecated environment provider mocks. I have no feedback to provide.
| // Unbounded thread creation is discouraged in library code to avoid resource | ||
| // exhaustion. A shared, bounded executor service ensures a hard limit (5) | ||
| // on concurrent refresh tasks, while threadCount provides unique names | ||
| // for easier debugging. | ||
| private static final AtomicInteger threadCount = new AtomicInteger(0); | ||
| private static final ExecutorService EXECUTOR = | ||
| Executors.newFixedThreadPool( | ||
| 5, | ||
| r -> { | ||
| Thread t = new Thread(r, "RAB-refresh-" + threadCount.getAndIncrement()); | ||
| t.setDaemon(true); | ||
| return t; | ||
| }); |
There was a problem hiding this comment.
The tradeoff we are making is between:
- Executor - Possible memory leak where the executor and the threads cannot be released as the auth library does not have a lifecycle.
- Thread - Possible unbounded threads created (note: we aim to limit the amount of threads spawning per credential, but it may be possible to spawn an unbounded number of credentials).
I don't think there will be perfect solution. For a middle ground, is it possible to add allowCoreThreadTimeOut() and set a keep-alive value of something like ~1ish hour(s)? Single credentials would end up just invoking new Thread() in the pool as the RAB call should be idle for the other ~5 hours. If there are multiple credentials that need RAB calls, the executor should be able to use the existing threads as they'll no longer be marked as idle. WDYT?
There was a problem hiding this comment.
I think this is a good idea to manage memory usage, especially in situations where the application is redeployed.
In the present implementation, since we don't shutdown the threads within the pool, they would not be garbage collected and the reference would live on while the server doesn't use the old application.
With this thread shutdown suggestion, we avoid that pitfall.
I agree that the threads which are idle for an hour can be timed-out and removed from the pool.
|
@vverman The sdk-platform-java-ci CI issues are not relevant for this PR (existing issues as part of the monorepo migration). PTAL at the conventialcommit CI complaints: https://github.com/googleapis/google-cloud-java/pull/12867/checks?check_run_id=72739487063 |
|
Conventional commits addressed. |
lqiu96
left a comment
There was a problem hiding this comment.
Changes LGTM overall. Added a few comments if you could take a look. I'm not entirely sure what the DISABLE_RAB_TESTS_ENV is for
|
Added per-test disablement of RAB refresh and bounded queue. |
| transportFactory.transport.setServiceAccountEmail("SA_CLIENT_EMAIL"); | ||
| ComputeEngineCredentials credentials = | ||
| ComputeEngineCredentials.newBuilder().setHttpTransportFactory(transportFactory).build(); | ||
| credentials.regionalAccessBoundaryManager.setCachedRAB( |
There was a problem hiding this comment.
This seeds a fake cached RAB, but that isn't the same as disabling RAB.
We should replace the fake cached RAB setup with a real test-only disable path.
For non-RAB tests, we want two things:
- no background RAB refresh
- no
x-allowed-locationsheader added to request metadata
Adding fake cached RAB only solves the first part, and changes the headers/logs under test.
There was a problem hiding this comment.
For non-RAB tests we want ... no x-allowed-locations header added to request metadata
For those test that do check the presence of the headers, they check for the presence of the authorization header. Which is rightly present, thereby not affecting the existing testing logic.
It is not that the x-allowed-locations header is incorrect, that we're 'disabling RAB' rather it is because async refreshes might non-deterministically consume mocks. That problem is avoided here.
Additionally, this method doesn't add any production logic exclusively for testing.
There was a problem hiding this comment.
The downside with your approach is you need to make sure to update all the assertions. There are now 5 log events, not 3, so the test is failing. Please make sure the necessary places are updated.
There was a problem hiding this comment.
Thank you for the suggestion.
I checked the failure in LoggingTest and found the test which was failing "getRequestMetadata_hasAccessToken" was caused by the lazy loading of the universe domain, triggered by the new RAB check, rather than the RAB refresh itself."
For ComputeEngineCredentials, the first call to getUniverseDomain() fetches it from the metadata server, generating 2 extra logs. Since the RAB check now calls this lazily during getRequestMetadata(), these logs were captured in the test.
So I bumped the count for the test to 5. The rest of the tests weren't updated because they don't trigger a lazy load of the universe domain and aren't at risk of triggering the increased logs.
| /** | ||
| * Checks if the regional access boundary feature is enabled. The feature is enabled if the | ||
| * environment variable or system property "GOOGLE_AUTH_TRUST_BOUNDARY_ENABLE_EXPERIMENT" is set | ||
| * to "true" or "1" (case-insensitive). | ||
| * Checks if the regional access boundary feature is enabled. | ||
| * | ||
| * <p>This method is for internal use only and may be changed or removed in future releases. | ||
| * | ||
| * @return True if the regional access boundary feature is enabled, false otherwise. | ||
| */ |
There was a problem hiding this comment.
Since we deleted the isEnabled() method, can we remove the associated method?
|
|
||
| // Note: this is for internal testing use use only. | ||
| // TODO: Fix unit test mocks so this can be removed | ||
| // Refer -> https://github.com/googleapis/google-auth-library-java/issues/1898 |
There was a problem hiding this comment.
can we close this ticket created here? Or we want to wait until this is merged into main?
| LOGGER_PROVIDER, | ||
| java.util.logging.Level.WARNING, | ||
| null, | ||
| "Regional Access Boundary background refresh failed to schedule: " + e.getMessage()); |
There was a problem hiding this comment.
I forgot to ask about this earlier. Do we plan to have any public docs regarding this to explain this behavior? I wonder if something like this may be confusing to the user and they may be asking what RAB is.
If not, can we use something that either has a bit softer wording or adding something that assures this is nothing to worry about
| * This duplicates tests setups, but centralizes logging test setup in this class. | ||
| */ | ||
| class LoggingTest { | ||
| public class LoggingTest { |
There was a problem hiding this comment.
nit: is this needed? Or just slipped in
| awsCredential.regionalAccessBoundaryManager.setCachedRAB( | ||
| new RegionalAccessBoundary( | ||
| "dummy-locations", Arrays.asList("dummy-loc"), awsCredential.clock)); |
There was a problem hiding this comment.
nit: changes make sense. thanks I like this more than the env var idea.
Is possible just to create one constant RegionalAccessBoundary object per file?
| new Exception("Regional Access Boundary background refresh failed to schedule", e)); | ||
| LoggingUtils.log( | ||
| LOGGER_PROVIDER, | ||
| java.util.logging.Level.WARNING, |
There was a problem hiding this comment.
nit: can you use the short name and import instead?
lqiu96
left a comment
There was a problem hiding this comment.
added a few small nits. Changes look fine on my end. PTAL at the nits and see if the make sense/ should be addressed. Feel free to merge afterwards
The RAB refresh uses a direct executor with a fixed thread pool as opposed to instantiating a new thread each time.
The RAB env gate -> GOOGLE_AUTH_TRUST_BOUNDARY_ENABLE_EXPERIMENT has been removed. This means RAB refresh triggers by default.
Added other fixes/suggestions made in the previous Java PR.