feat(workers): implement Reddit OAuth client credentials flow to bypa… by kunal-rathore-111 · Pull Request #2887 · karakeep-app/karakeep

kunal-rathore-111 · 2026-06-14T14:58:10Z

Description

Reddit recently updated its crawler policies, and our unauthenticated .json requests are now frequently blocked.

To resolve this, this PR implements the official Reddit OAuth Client Credentials flow for the metascraper-reddit plugin.

Specifically:

Added optional REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET environment variables to serverConfig.
The scraper now fetches an access token and uses an in-memory cache (with a 5-minute safety buffer before expiration) to avoid rate limits on the authentication endpoint.
If the credentials are provided, requests are routed through oauth.reddit.com using the Bearer token and a custom User-Agent.
If credentials are not configured, the scraper falls back to the previous unauthenticated .json requests, so the change is fully backward-compatible.
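For reference, here is a minimal sketch of the token request, based on Reddit's documented application-only OAuth flow. It sends a POST to https://www.reddit.com/api/v1/access_token with HTTP Basic auth (client ID as username, secret as password) and the body grant_type=client_credentials. The function name, the injected `fetchFn`, and the error handling are illustrative assumptions, not the PR's actual code.

```typescript
// Minimal shape of the fetch function we depend on, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface RedditToken {
  accessToken: string;
  expiresInSec: number;
}

const REDDIT_TOKEN_URL = "https://www.reddit.com/api/v1/access_token";

// Hypothetical helper: requests an application-only token via the
// OAuth2 client credentials grant.
async function requestRedditToken(
  clientId: string,
  clientSecret: string,
  userAgent: string,
  fetchFn: FetchLike,
): Promise<RedditToken> {
  const res = await fetchFn(REDDIT_TOKEN_URL, {
    method: "POST",
    headers: {
      // HTTP Basic auth: client ID as username, client secret as password.
      Authorization: "Basic " + btoa(`${clientId}:${clientSecret}`),
      "Content-Type": "application/x-www-form-urlencoded",
      "User-Agent": userAgent,
    },
    body: new URLSearchParams({ grant_type: "client_credentials" }).toString(),
  });
  if (!res.ok) {
    throw new Error(`Reddit token request failed with HTTP ${res.status}`);
  }
  // Validate the two fields the cache relies on before trusting them.
  const data = (await res.json()) as { access_token?: unknown; expires_in?: unknown };
  if (typeof data.access_token !== "string" || typeof data.expires_in !== "number") {
    throw new Error("Unexpected Reddit token response");
  }
  return { accessToken: data.access_token, expiresInSec: data.expires_in };
}
```

In production `fetchFn` could simply be the global `fetch`. Injecting it keeps the request testable without network access.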

How Has This Been Tested?

Verified that if REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET are not present, it correctly falls back to unauthenticated .json requests.
Verified that when credentials are provided, the system correctly fetches a token, routes the request to oauth.reddit.com, and successfully fetches the Reddit metadata without being blocked.
Verified that the token is cached in memory and reused until it expires.
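The routing and fallback behaviour verified above comes down to one branch. The hypothetical `buildRedditRequest` helper below sketches it under two assumptions. First, oauth.reddit.com returns JSON for a post path without the `.json` suffix. Second, the unauthenticated path targets www.reddit.com and sends the same User-Agent.

```typescript
interface RedditRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: decides where to fetch a post's JSON from.
function buildRedditRequest(
  postUrl: string,
  accessToken: string | null,
  userAgent: string,
): RedditRequest {
  // Keep only the path (query string and fragment dropped), minus any trailing slash.
  const path = new URL(postUrl).pathname.replace(/\/+$/, "");
  if (accessToken) {
    // Authenticated: Bearer token against the OAuth host, which serves JSON directly.
    return {
      url: `https://oauth.reddit.com${path}`,
      headers: { Authorization: `Bearer ${accessToken}`, "User-Agent": userAgent },
    };
  }
  // Unauthenticated fallback: the public ".json" endpoint
  // (assumption: same User-Agent, www.reddit.com host).
  return {
    url: `https://www.reddit.com${path}.json`,
    headers: { "User-Agent": userAgent },
  };
}
```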

Screenshots (if appropriate)

Checklist:

I have carefully read CONTRIBUTING.md
I have performed a self-review of my own code
I have made corresponding changes to the documentation if applicable
I have no unrelated changes in the PR.
I have confirmed that any new dependencies are strictly necessary.
I have written tests for new code (if applicable)

Please describe to which degree, if any, an LLM was used in creating this pull request.

I collaborated with an AI coding assistant to help design the caching logic and integrate the standard OAuth Client Credentials flow.


greptile-apps · 2026-06-14T15:00:12Z

Greptile Summary

This PR adds Reddit OAuth client-credentials support to the metascraper-reddit plugin to bypass the recent API blocking of unauthenticated .json requests, with a graceful fallback when credentials are absent.

Introduces getRedditAccessToken with an in-memory token cache; fetches from oauth.reddit.com when credentials are present, otherwise falls back to the existing unauthenticated path.
Adds optional REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET env vars to serverConfig following existing patterns.
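The sketch below shows one way to keep the new variables optional. It uses a hypothetical `readRedditCredentials` helper, and the real serverConfig parsing may look different. OAuth is enabled only when both values are non-empty, and a half-configured pair is treated as "not configured" so the unauthenticated path is used.

```typescript
interface RedditCredentials {
  clientId: string;
  clientSecret: string;
}

// Hypothetical helper: returns credentials only when both values are set.
function readRedditCredentials(
  env: Record<string, string | undefined>,
): RedditCredentials | null {
  const clientId = env["REDDIT_CLIENT_ID"]?.trim();
  const clientSecret = env["REDDIT_CLIENT_SECRET"]?.trim();
  if (clientId && clientSecret) {
    return { clientId, clientSecret };
  }
  // Missing, empty, or half-configured: use the unauthenticated .json path.
  return null;
}
```

In the worker this would be called with `process.env`, or with the already-parsed config object.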

Confidence Score: 3/5

The fallback path is safe, but the OAuth token cache has two bugs that need fixing before enabling credentials in production.

The buffer subtraction in redditAccessTokenExpiresAt can produce a negative offset if Reddit ever returns a short-lived token, permanently bypassing the cache and hammering the auth endpoint. Separately, the token-refresh function has no in-flight deduplication, so concurrent scrape workers will each issue their own refresh request when the token expires — the URL-level cache already solves this correctly with a stored Promise, but that pattern wasn't applied to the token refresh. Both issues are in the OAuth path only; the unauthenticated fallback is unaffected.

apps/workers/metascraper-plugins/metascraper-reddit.ts — specifically the token caching logic around lines 109–152

Important Files Changed

Filename	Overview
apps/workers/metascraper-plugins/metascraper-reddit.ts	Adds OAuth client-credentials token caching and oauth.reddit.com routing; has an underflow bug in the buffer calculation and a concurrent-refresh race condition.
packages/shared/config.ts	Adds optional REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET env vars following existing patterns; no issues.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/workers/metascraper-plugins/metascraper-reddit.ts:144-146
If `expires_in` is less than 300 (a short-lived token or unexpected server response), `(data.expires_in - 300)` is negative, making `redditAccessTokenExpiresAt` a timestamp in the past. Every subsequent call would skip the cache and issue a new token request, likely triggering Reddit's rate limit on the auth endpoint.

```suggestion
    redditAccessToken = data.access_token;
    // Expire 5 minutes before the actual expiration to be safe
    redditAccessTokenExpiresAt = now + Math.max(0, data.expires_in - 300) * 1000;
```
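A hypothetical 120-second token makes the arithmetic concrete. The original line yields now + (120 - 300) * 1000 = now - 180000 ms, which is already in the past. The suggested clamp yields now + 0, which still equals the fetch time, so the `redditAccessTokenExpiresAt > now` check also fails on the very next call. The sketch below compares the clamp with a variant that caps the buffer at half the token lifetime. The helper names are hypothetical. For long-lived tokens the two agree.

```typescript
const SAFETY_BUFFER_SEC = 300;

// The suggested fix: clamp the remaining lifetime at zero.
function clampedExpiresAt(nowMs: number, expiresInSec: number): number {
  return nowMs + Math.max(0, expiresInSec - SAFETY_BUFFER_SEC) * 1000;
}

// Variant: the buffer never exceeds half of the token's lifetime,
// so even a short-lived token stays cached for part of its validity.
function cappedBufferExpiresAt(nowMs: number, expiresInSec: number): number {
  const bufferSec = Math.min(SAFETY_BUFFER_SEC, expiresInSec / 2);
  return nowMs + (expiresInSec - bufferSec) * 1000;
}
```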

### Issue 2 of 2
apps/workers/metascraper-plugins/metascraper-reddit.ts:112-152
**Concurrent token refresh race condition**

`getRedditAccessToken` has no concurrency guard. When multiple scrape jobs run in parallel and the cached token has just expired, every one of them finds the `redditAccessTokenExpiresAt > now` check false before any of them has written the new token. Each then issues its own token-refresh request to Reddit's auth endpoint, potentially triggering rate limiting.

The existing URL-level cache in `fetchRedditPostData` avoids this correctly by storing the `Promise` before it resolves. The same pattern should be applied here — store a single in-flight `Promise<string | null>` and return it to all concurrent callers until it resolves.
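Here is a minimal sketch of that pattern. It uses a hypothetical `RedditTokenCache` class rather than the PR's module-level variables. The first caller to find the token missing or stale starts the refresh and stores its Promise, and every concurrent caller receives that same Promise. The token fetcher and clock are injected, an assumption made for testability. The expiry caps the 5-minute buffer at half the token lifetime, so a short-lived token is still cached briefly.

```typescript
interface FetchedToken {
  accessToken: string;
  expiresInSec: number;
}

class RedditTokenCache {
  private token: string | null = null;
  private expiresAtMs = 0;
  private inFlight: Promise<string | null> | null = null;
  private readonly fetchToken: () => Promise<FetchedToken>;
  private readonly now: () => number;
  private readonly bufferSec: number;

  constructor(fetchToken: () => Promise<FetchedToken>, now: () => number = Date.now, bufferSec = 300) {
    this.fetchToken = fetchToken;
    this.now = now;
    this.bufferSec = bufferSec;
  }

  getToken(): Promise<string | null> {
    // Fast path: a cached token that is still inside its validity window.
    if (this.token !== null && this.expiresAtMs > this.now()) {
      return Promise.resolve(this.token);
    }
    // Slow path: start at most one refresh and share its Promise with every caller.
    if (this.inFlight === null) {
      this.inFlight = this.refresh().then((token) => {
        // Promise callbacks run asynchronously, so this clears the slot only
        // after the assignment above has completed.
        this.inFlight = null;
        return token;
      });
    }
    return this.inFlight;
  }

  private async refresh(): Promise<string | null> {
    try {
      const { accessToken, expiresInSec } = await this.fetchToken();
      // Keep the 5-minute buffer, but never more than half the token lifetime.
      const bufferSec = Math.min(this.bufferSec, expiresInSec / 2);
      this.token = accessToken;
      this.expiresAtMs = this.now() + (expiresInSec - bufferSec) * 1000;
      return accessToken;
    } catch {
      // Resolve to null so callers can use the unauthenticated fallback.
      return null;
    }
  }
}
```

A failed refresh resolves to `null` rather than rejecting, which lets callers fall back to the unauthenticated path. The next call simply retries.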

Reviews (1): Last reviewed commit: "feat(workers): implement Reddit OAuth cl..."


feat(workers): implement Reddit OAuth client credentials flow to bypa…

a399192

…ss crawler block

greptile-apps Bot reviewed Jun 14, 2026


Comment thread apps/workers/metascraper-plugins/metascraper-reddit.ts Outdated

Comment thread apps/workers/metascraper-plugins/metascraper-reddit.ts

fix(workers): address reddit oauth token underflow and refresh race c…

8106dea

…ondition

kunal-rathore-111 mentioned this pull request Jun 14, 2026

[Crawler] Reddit crawling is now getting blocked #2885

Open

kunal-rathore-111 wants to merge 2 commits into karakeep-app:main from kunal-rathore-111:fix/2885-reddit-crawling-auth
