A lightweight Go server that generates RSS feeds for blog sites that don't provide their own. It scrapes the blog listing pages, extracts post metadata, and serves standard RSS 2.0 XML.
Running on an exe.dev VM.
| Blog | Feed URL | Source |
|---|---|---|
| MotherDuck Blog | /feed/motherduck | https://motherduck.com/blog/ |
| Sprites Blog | /feed/sprites | https://sprites.dev/blog/ |
| Archil Blog | /feed/archil | https://archil.com/blog/ |
Full URLs: https://rss-feed.exe.xyz:8000/feed/{name}
- When an RSS reader requests a feed, the server checks its in-memory cache (15-minute TTL)
- On cache miss, it fetches the blog's listing page via HTTP
- A site-specific scraper extracts post titles, URLs, descriptions, authors, and dates from the HTML
- The posts are rendered as RSS 2.0 XML using `gorilla/feeds` and cached
There is no database. No background jobs. The server only fetches upstream when a reader asks and the cache is stale.
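
The cache-then-fetch flow above can be sketched roughly as follows. This is a minimal illustration rather than the server's actual code: `feedCache`, `cacheEntry`, and `get` are hypothetical names, and the cached value is assumed to be the rendered XML bytes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry pairs a rendered feed with the time it was fetched.
type cacheEntry struct {
	body      []byte
	fetchedAt time.Time
}

// feedCache is a hypothetical in-memory cache keyed by feed name.
type feedCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time // injectable clock
	entries map[string]cacheEntry
}

func newFeedCache(ttl time.Duration) *feedCache {
	return &feedCache{ttl: ttl, now: time.Now, entries: make(map[string]cacheEntry)}
}

// get returns the cached body for name and calls fetch only when the
// entry is missing or at least ttl old.
func (c *feedCache) get(name string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.entries[name]; ok && c.now().Sub(e.fetchedAt) < c.ttl {
		return e.body, nil
	}
	body, err := fetch()
	if err != nil {
		return nil, err // failures are not cached, so the next request retries
	}
	c.entries[name] = cacheEntry{body: body, fetchedAt: c.now()}
	return body, nil
}

func main() {
	cache := newFeedCache(15 * time.Minute)
	calls := 0
	fetch := func() ([]byte, error) {
		calls++
		return []byte("<rss/>"), nil
	}
	cache.get("motherduck", fetch)
	cache.get("motherduck", fetch) // within the TTL, so no second upstream fetch
	fmt.Println("upstream fetches:", calls)
}
```

Holding a single mutex across the fetch keeps the sketch short but serializes concurrent cache misses; a per-feed lock would avoid that.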
Each blog site has a different HTML structure, so each needs its own scraper:
- MotherDuck: server-rendered HTML, but `<a>` tags contain block elements (`<div>`, `<h2>`), which Go's `net/html` parser splits into sibling fragments per the HTML spec. The scraper groups fragments by `href` and collects title/description/date from siblings.
- Sprites: clean server-rendered HTML with `<article>` elements, `<h3>` titles, `<time>` elements, and `<span>` authors. Straightforward DOM traversal.
- Archil: a Next.js client-rendered app. The HTML contains no visible blog content, but the React Server Components (RSC) payload is embedded in `<script>` tags as JSON. The scraper extracts post data by pattern-matching the RSC JSON for `"href":"/post/..."` entries and their nearby `"children"` values.
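
The pattern-matching approach described for Archil might look something like the sketch below. It is not the project's actual scraper: the `post` type, the regular expression, and the 200-character window are assumptions, and the payload is assumed to have been unescaped already so that quotes appear literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// post holds the two fields this sketch recovers from the payload.
type post struct {
	Path  string
	Title string
}

// postRe matches an "href":"/post/<slug>" entry followed, within the same
// JSON object and at most 200 characters later, by a string-valued "children".
var postRe = regexp.MustCompile(`"href":"(/post/[^"]+)"[^{}]{0,200}?"children":"([^"]+)"`)

// extractPosts returns one post per distinct path, in order of first appearance.
func extractPosts(payload string) []post {
	seen := make(map[string]bool)
	var posts []post
	for _, m := range postRe.FindAllStringSubmatch(payload, -1) {
		path, title := m[1], m[2]
		if seen[path] {
			continue // the same link can appear more than once in the payload
		}
		seen[path] = true
		posts = append(posts, post{Path: path, Title: title})
	}
	return posts
}

func main() {
	// Hypothetical, already-unescaped payload fragment.
	payload := `["$","a",null,{"href":"/post/hello","className":"card","children":"Hello World"}]`
	for _, p := range extractPosts(payload) {
		fmt.Printf("%s -> %s\n", p.Path, p.Title)
	}
}
```

A regex like this is brittle by design: titles containing escaped quotes or nested `"children"` arrays would need a real JSON decode of the payload instead.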
Visit the blog in a browser and inspect the HTML structure. Key questions:
- Is the content server-rendered (visible in `curl` output) or client-rendered (JavaScript-only)?
- What elements contain post titles, URLs, dates, descriptions, and authors?
# Quick check: does curl see the blog posts?
curl -sL https://example.com/blog/ | grep -o '<a[^>]*href[^>]*>' | head -20

Add a new scraper in `scraper/scraper.go`:
func scrapeExample(body io.Reader) ([]Post, error) {
	doc, err := html.Parse(body)
	if err != nil {
		return nil, err
	}
	// Walk the DOM tree, extract posts...
	var posts []Post
	// ...
	return posts, nil
}

For client-rendered (React/Next.js) sites, you may need to parse the data from embedded JSON payloads rather than the HTML DOM. See the Archil scraper for an example.
Add an entry to the Sources slice in scraper/scraper.go:
var Sources = []FeedSource{
	// ...existing sources...
	{
		Name:    "Example Blog",
		SiteURL: "https://example.com/blog/",
		Scrape:  scrapeExample,
	},
}

In `srv/server.go`:
- Add a route in the `Serve` method:
  mux.HandleFunc("GET /feed/example", s.handleFeed(3)) // index matches Sources slice
- Add a card to `indexHTML`:
  <div class="feed-card">
    <div class="feed-name">Example Blog <span class="rss-icon">RSS</span></div>
    <a class="feed-url" href="/feed/example">/feed/example</a>
  </div>
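
Because the route is tied to a position in `Sources`, an off-by-one index would silently serve the wrong feed. The sketch below shows one way an index-bound handler could guard against an out-of-range value. The `feedSource` type, `sources` slice, and standalone `handleFeed` function are simplified stand-ins, not the real `s.handleFeed` method.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// feedSource and sources stand in for the real FeedSource type and Sources slice.
type feedSource struct{ Name string }

var sources = []feedSource{
	{Name: "MotherDuck Blog"}, // index 0
	{Name: "Sprites Blog"},    // index 1
	{Name: "Archil Blog"},     // index 2
	{Name: "Example Blog"},    // index 3, the newly added entry
}

// handleFeed returns a handler bound to sources[i]; an out-of-range index yields 404.
func handleFeed(i int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if i < 0 || i >= len(sources) {
			http.NotFound(w, r)
			return
		}
		// The real handler would render RSS here; the sketch only echoes the name.
		fmt.Fprintln(w, sources[i].Name)
	}
}

// statusAndBody invokes a handler in-process and reports what it wrote.
func statusAndBody(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /feed/example", handleFeed(3)) // method-prefixed patterns need Go 1.22+

	code, body := statusAndBody(mux, "/feed/example")
	fmt.Print(code, " ", body)
}
```
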
go build -o rss-feed ./cmd/srv
sudo systemctl restart srv

cmd/srv/main.go      Entry point: parses flags, starts server
srv/server.go        HTTP handlers, caching, RSS generation, index page
scraper/scraper.go   Fetch + parse logic for each blog site
srv.service          systemd unit file
# Build
go build -o rss-feed ./cmd/srv
# Run directly
./rss-feed # listens on :8000
./rss-feed -listen :3000 # custom port
# Or via systemd (production)
sudo cp srv.service /etc/systemd/system/srv.service
sudo systemctl daemon-reload
sudo systemctl enable --now srv
# Check status / logs
systemctl status srv
journalctl -u srv -f

Create a new VM named rss-feed, then give Shelley this prompt:
Create an RSS feed service in Go that scrapes blog listing pages and serves RSS 2.0 feeds. The blogs to support are:
- https://motherduck.com/blog/ → serve at `/feed/motherduck`
- https://sprites.dev/blog/ → serve at `/feed/sprites`
- https://archil.com/blog/ → serve at `/feed/archil`

Requirements:
- Use the Go project template (`shelley unpack-template go`)
- Scrape each blog's listing page for post title, URL, description, author, and date
- Generate RSS 2.0 XML using `gorilla/feeds`
- Cache feeds in memory for 15 minutes
- Cap upstream response reads at 10 MB with `io.LimitReader`
- Serve an index page at `/` listing all available feeds
- Run as a systemd service on port 8000
- No database needed

Note: MotherDuck is server-rendered but Go's HTML parser splits `<a>` tags containing block elements; group fragments by href. Archil is a Next.js client-rendered app; extract data from the RSC JSON payload in `<script>` tags. Sprites is clean server-rendered HTML with `<article>` elements.
After Shelley builds it, run set-public to make the feeds accessible to RSS readers without authentication.
- Read-only service: no database, no user input, no file uploads
- Hardcoded scrape targets: no SSRF risk; the server only fetches from the URLs defined in `Sources`
- Response size limit: upstream responses are capped at 10 MB via `io.LimitReader` to prevent OOM
- Static index page: no user input is interpolated into HTML
- Scraped content is XML-escaped by `gorilla/feeds` before inclusion in RSS output
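
The response size limit can be illustrated with a short sketch. The helper names `readCapped` and `cappedLen` and the constant `maxBodyBytes` are hypothetical; only the use of `io.LimitReader` and the 10 MB figure come from the description above.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// maxBodyBytes mirrors the 10 MB cap described above.
const maxBodyBytes = 10 << 20

// readCapped reads at most limit bytes from r. Anything beyond the limit
// is left unread, and no error is returned for the truncation.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	return io.ReadAll(io.LimitReader(r, limit))
}

// cappedLen is a demo helper: it reports how many bytes of input survive the cap.
func cappedLen(input string, limit int64) (int, error) {
	data, err := readCapped(strings.NewReader(input), limit)
	return len(data), err
}

func main() {
	// A small limit makes the truncation visible without allocating 10 MB.
	n, err := cappedLen(strings.Repeat("x", 100), 16)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("bytes kept:", n)

	// In a fetch path the call would presumably be readCapped(resp.Body, maxBodyBytes).
	fmt.Println("production cap:", maxBodyBytes)
}
```

Since the truncation is silent, a scraper that needs to tell "exactly at the limit" from "cut off" could read `limit+1` bytes and treat the extra byte as a signal.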