rss-cli-agent is a small daily pipeline for collecting paper candidates from
journal RSS feeds and Google Scholar alert emails, screening titles with an AI
model, and exporting the selected entries as JSON.
The project is intentionally narrow. It keeps only the fields needed by the downstream literature workflow: title, DOI, timestamp, URL, and pipeline state.
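As a rough illustration of how small the record is, the five fields could be modelled as below. This is a sketch only: the class name `Entry`, the helper `to_json`, and the choice of an ISO 8601 string for the timestamp are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical record holding the five fields listed above."""

    title: str
    doi: Optional[str]  # best-effort, so it may be None
    timestamp: str      # assumed to be an ISO 8601 string
    url: str
    state: str          # for example "pending_filter" or "selected"


def to_json(entries: list[Entry]) -> str:
    """Serialize entries as a JSON array with stable key order."""
    return json.dumps([asdict(e) for e in entries], indent=2, sort_keys=True)
```

Calling `to_json([Entry("A title", None, "2024-01-01T00:00:00Z", "https://example.org/a", "selected")])` yields a one-element JSON array in which `doi` is `null`.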
Pipeline scripts:
- sync_scholar.py reads Google Scholar alert emails from Gmail and writes candidate entries into the temporary source cache.
- refresh_feeds.py refreshes configured journal RSS feeds into storage/db/source_cache.sqlite.
- sync_entries.py compares the temporary source cache with storage/db/rss_entries.sqlite and inserts or updates changed entries.
- filter_titles.py reads entries with state = "pending_filter", writes AI decisions to storage/db/title_filtered.sqlite, and updates the persistent RSS entry state.
- export_daily.py exports selected entries to storage/exports.
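The compare-and-upsert step performed by sync_entries.py might look roughly like the sketch below. The table name `entries`, the column names, the use of the URL as the key, and the choice to start new rows in `pending_filter` are all assumptions made for illustration; the real schema may differ. The sketch also leaves the state of updated rows untouched, which the project may or may not do.

```python
import sqlite3


def sync_entries(cache: sqlite3.Connection, store: sqlite3.Connection) -> tuple[int, int]:
    """Copy new or changed rows from the cache into the persistent store.

    Returns (inserted, updated). Both databases are assumed to have an
    `entries` table keyed by `url`.
    """
    inserted = updated = 0
    for url, title, doi in cache.execute("SELECT url, title, doi FROM entries"):
        row = store.execute(
            "SELECT title, doi FROM entries WHERE url = ?", (url,)
        ).fetchone()
        if row is None:
            # Unknown entry: insert it and queue it for title filtering.
            store.execute(
                "INSERT INTO entries (url, title, doi, state) "
                "VALUES (?, ?, ?, 'pending_filter')",
                (url, title, doi),
            )
            inserted += 1
        elif row != (title, doi):
            # Known entry whose metadata changed: update it in place.
            store.execute(
                "UPDATE entries SET title = ?, doi = ? WHERE url = ?",
                (title, doi, url),
            )
            updated += 1
    store.commit()
    return inserted, updated
```

Running the function a second time against an unchanged cache returns `(0, 0)`, so the step is safe to repeat.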
There is no Crossref stage, metadata-cleaning stage, or missing_doi state.
DOI is best-effort; entries without DOI still go through title filtering and
export.
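One way to treat the DOI as best-effort is to look for a DOI-shaped substring in the entry URL or text and accept that nothing may be found. The sketch below is an assumption about how such a lookup could work, not the project's implementation; the function name `find_doi` and the regular expression are illustrative.

```python
import re
from typing import Optional

# A DOI starts with "10.", a 4-9 digit registrant code, a slash, and a suffix.
# The suffix match stops at whitespace and at URL query or fragment markers.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s?#]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like substring, or None when there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trim punctuation that often trails a DOI in running text.
    return match.group(0).rstrip(".,;)")
```

An entry for which `find_doi` returns `None` would simply carry an empty DOI and continue through title filtering and export.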
Entry states:
- pending_filter: entry is waiting for AI title filtering. If the AI call fails, the entry stays here and is retried on the next run.
- selected: entry passed title filtering and is ready for export.
- filtered_out: entry was rejected by title filtering.
- exported: selected entry has been included in a daily export.
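The state transitions described above can be sketched as two small functions. The names `filter_entry`, `mark_exported`, and the `decide` callback (standing in for the AI call) are hypothetical; only the four state names come from this README.

```python
from typing import Callable


def filter_entry(state: str, title: str, decide: Callable[[str], bool]) -> str:
    """Return the next state after attempting title filtering."""
    if state != "pending_filter":
        return state  # only pending entries are filtered
    try:
        keep = decide(title)
    except Exception:
        # A failed AI call leaves the entry pending so the next run retries it.
        return "pending_filter"
    return "selected" if keep else "filtered_out"


def mark_exported(state: str) -> str:
    """Move a selected entry to exported; leave every other state unchanged."""
    return "exported" if state == "selected" else state
```

Because a failure returns `pending_filter` unchanged, retrying needs no extra bookkeeping: the next run picks the entry up again.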
Tracked daily state:
- storage/db/rss_entries.sqlite
- storage/db/title_filtered.sqlite
- storage/exports/*.selected.json
- storage/exports/*.manifest.json
Temporary or local-only files:
- storage/db/source_cache.sqlite
- config/settings.toml
- config/cache/log/
source_cache.sqlite is a per-run comparison cache. It should not be kept as a
long-term database.
Local runs use config/settings.toml. Do not use .env for this project.
The expected settings sections are:
- [paths]
- [deepseek]
- [ai_title_filter]
- [google_scholar_alerts]
- [google_oauth]
- [rss_feeds]
For GitHub Actions, configure these repository secrets:
- DEEPSEEK_API_KEY
- GWS_CLIENT_SECRET_JSON_B64
- GWS_TOKEN_JSON_B64
The two GWS secrets are base64-encoded JSON files for client_secret.json and
token.json.
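Producing and reading such a secret could look like the sketch below. The function names are hypothetical, and how the workflow actually decodes the secrets is not described here; this only shows the base64 round trip for a JSON file's bytes.

```python
import base64
import json


def encode_secret(json_bytes: bytes) -> str:
    """Validate that the bytes are JSON, then return them base64-encoded."""
    json.loads(json_bytes)  # raises ValueError if the file is not valid JSON
    return base64.b64encode(json_bytes).decode("ascii")


def decode_secret(value: str) -> dict:
    """Decode a base64 secret back into the parsed JSON object."""
    return json.loads(base64.b64decode(value))
```

For example, `encode_secret(Path("token.json").read_bytes())` gives the string to paste into the GWS_TOKEN_JSON_B64 secret.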
Use the wrapper for normal daily runs:

    pwsh -NoProfile -ExecutionPolicy Bypass -File tools\run-daily-rss-gmail.ps1

For CI-style local validation in the current terminal:

    pwsh -NoProfile -ExecutionPolicy Bypass -File tools\run-daily-rss-gmail.ps1 -RunInCurrentWindow

Pipeline modules should be run with:

    --settings-toml config/settings.toml

Run the lint, type, and test checks with:

    uv run ruff check .
    uv run ty check .
    uv run python -m pytest

Current release line: v1.2.0.
License: MIT.