fix: allow GitHub scraper to work without a token #2264
Public repository scraping can run without a token, but unauthenticated GitHub clients only get a 60-request-per-hour core limit. The fixed 100-request pause threshold made tokenless scrapes pause immediately, even though requests remained. Compute the pause threshold from the reported core limit, bounded by a minimum and a maximum, and skip pausing when rate-limit data is missing.
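A minimal Go sketch of the threshold logic described above. It assumes the helper reserves roughly 10% of the reported core limit; that divisor is a hypothetical choice, since the PR text gives only the bounds (1 and 100), not the formula. `coreRate`, `pauseThreshold`, and `shouldPause` are illustrative names, not the identifiers in `scrapers/github/client.go`, and a non-positive limit is treated the same as missing data.

```go
package main

import "fmt"

// coreRate is a hypothetical stand-in for the core rate-limit fields this
// sketch needs; it is not the type used in scrapers/github/client.go.
type coreRate struct {
	Limit     int
	Remaining int
}

const (
	minPauseThreshold = 1   // lower bound on the threshold
	maxPauseThreshold = 100 // the previous fixed threshold, kept as the upper bound
	thresholdDivisor  = 10  // hypothetical: reserve about 10% of the limit
)

// pauseThreshold derives the threshold from the reported core limit and
// clamps it to [minPauseThreshold, maxPauseThreshold].
func pauseThreshold(limit int) int {
	t := limit / thresholdDivisor
	if t < minPauseThreshold {
		return minPauseThreshold
	}
	if t > maxPauseThreshold {
		return maxPauseThreshold
	}
	return t
}

// shouldPause reports whether the scraper should wait for a reset.
// Missing rate-limit data (nil, or a non-positive limit) never pauses.
func shouldPause(core *coreRate) bool {
	if core == nil || core.Limit <= 0 {
		return false
	}
	return core.Remaining < pauseThreshold(core.Limit)
}

func main() {
	// Tokenless client: limit 60 gives threshold 6, so a fresh window does not pause.
	fmt.Println(pauseThreshold(60), shouldPause(&coreRate{Limit: 60, Remaining: 60})) // 6 false
	// Authenticated client: limit 5000 gives 500, clamped to 100.
	fmt.Println(pauseThreshold(5000), shouldPause(&coreRate{Limit: 5000, Remaining: 50})) // 100 true
	// No rate-limit data: do not pause.
	fmt.Println(shouldPause(nil)) // false
}
```

With this assumed divisor, a tokenless client pauses only once fewer than 6 requests remain, while an authenticated client (GitHub's usual 5,000-request limit) keeps the previous 100-request threshold.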
Benchstat (base): 📊 1 minor regression(s), all within the 5% threshold; ✅ 2 improvement(s)
Walkthrough
GitHub rate-limit pause checks now compute their threshold from the API limit, clamp it to …
Changes
GitHub rate-limit pause logic
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
Inline comments:
In `@scrapers/github/client.go`:
- Around lines 184-186: The pause check in the GitHub client rate-limit logic returns a possibly negative wait duration from `time.Until(core.Reset.Time)`, which can make downstream reset timing invalid. Update the rate-limit helper that compares `core.Remaining` against `threshold` so it only returns `shouldPause=true` when the computed `waitDuration` is positive; if the duration is zero or negative, return no pause instead and avoid passing a past reset time onward.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 73098b4a-5427-449b-a63e-50875e2adaaf
📒 Files selected for processing (2)
- scrapers/github/client.go
- scrapers/github/client_test.go
Public GitHub repository scraping can run without a token, but unauthenticated clients only get 60 core requests/hour.
The scraper used a fixed 100-request pause threshold, so tokenless clients were treated as rate-limited immediately.
Compute the pause threshold from the actual core limit, clamped between 1 and 100, and skip pausing when rate-limit data is missing.
Summary by CodeRabbit
Bug Fixes
Tests