kylelundstedt/rss-feed

RSS Feed Service

A lightweight Go server that generates RSS feeds for blog sites that don't provide their own. It scrapes the blog listing pages, extracts post metadata, and serves standard RSS 2.0 XML.

The service runs on an exe.dev VM.

Feeds

Blog             Feed URL          Source
MotherDuck Blog  /feed/motherduck  https://motherduck.com/blog/
Sprites Blog     /feed/sprites     https://sprites.dev/blog/
Archil Blog      /feed/archil      https://archil.com/blog/

Full URLs: https://rss-feed.exe.xyz:8000/feed/{name}

How it works

  1. When an RSS reader requests a feed, the server checks its in-memory cache (15-minute TTL).
  2. On a cache miss, it fetches the blog's listing page over HTTP.
  3. A site-specific scraper extracts post titles, URLs, descriptions, authors, and dates from the HTML.
  4. The posts are rendered as RSS 2.0 XML with gorilla/feeds, and the result is cached.

There is no database and there are no background jobs. The server fetches upstream only when a reader asks and the cached copy is stale.
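The cache-then-fetch flow can be sketched roughly as follows. This is a minimal illustration, not the code in srv/server.go: the names feedCache, newFeedCache, and get are hypothetical, and the render callback stands in for the fetch, scrape, and RSS rendering steps.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry holds rendered feed XML and the time it was stored.
type cacheEntry struct {
	xml      string
	storedAt time.Time
}

// feedCache is a hypothetical in-memory cache keyed by feed name.
type feedCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func newFeedCache(ttl time.Duration) *feedCache {
	return &feedCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// get returns the cached XML for name. It calls render only when there is
// no entry for name or the entry is at least ttl old.
func (c *feedCache) get(name string, render func() (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[name]; ok && time.Since(e.storedAt) < c.ttl {
		return e.xml, nil
	}
	xml, err := render()
	if err != nil {
		return "", err // errors are not cached; the next request retries
	}
	c.entries[name] = cacheEntry{xml: xml, storedAt: time.Now()}
	return xml, nil
}

func main() {
	cache := newFeedCache(15 * time.Minute)
	fetches := 0
	render := func() (string, error) {
		fetches++ // stands in for fetch + scrape + RSS rendering
		return "<rss/>", nil
	}
	cache.get("example", render)
	cache.get("example", render)
	fmt.Println("upstream fetches:", fetches) // upstream fetches: 1
}
```

Holding the mutex while rendering keeps the sketch short and means concurrent requests for a stale feed trigger only one upstream fetch, at the cost of serializing requests across all feeds.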

Scraper strategies

Each blog site has different HTML structure, so each needs its own scraper:

  • MotherDuck: Server-rendered HTML, but its <a> tags contain block elements (<div>, <h2>), which Go's HTML parser (golang.org/x/net/html) splits into sibling fragments per the HTML spec. The scraper groups the fragments by href and collects the title, description, and date from the siblings.
  • Sprites: Clean server-rendered HTML with <article> elements, <h3> titles, <time> elements, and <span> authors. Straightforward DOM traversal.
  • Archil: A Next.js client-rendered app. The HTML contains no visible blog content, but the React Server Components (RSC) payload is embedded in <script> tags as JSON. The scraper extracts post data by pattern-matching the RSC JSON for "href":"/post/..." entries and their nearby "children" values.
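The Archil approach of pattern-matching the embedded payload might look something like the sketch below. The payload shape, the regular expression, and the names link and extractLinks are assumptions made for illustration; the real RSC payload is typically escaped inside a JavaScript string and would need unescaping first.

```go
package main

import (
	"fmt"
	"regexp"
)

// link is a hypothetical post reference: a path plus the text found near it.
type link struct {
	Href  string
	Title string
}

// postRe matches an "href":"/post/..." pair followed, inside the same JSON
// object, by a string-valued "children" key. [^{}]*? stops the match from
// crossing into a neighbouring object.
var postRe = regexp.MustCompile(`"href":"(/post/[^"]+)"[^{}]*?"children":"([^"]+)"`)

// extractLinks returns one link per distinct post path, in document order.
func extractLinks(payload string) []link {
	seen := make(map[string]bool)
	var out []link
	for _, m := range postRe.FindAllStringSubmatch(payload, -1) {
		if seen[m[1]] {
			continue // the same post is often linked more than once
		}
		seen[m[1]] = true
		out = append(out, link{Href: m[1], Title: m[2]})
	}
	return out
}

func main() {
	// Made-up payload fragment, already unescaped.
	payload := `[{"href":"/post/hello","className":"card","children":"Hello World"},` +
		`{"href":"/about","children":"About"},` +
		`{"href":"/post/hello","children":"Read more"}]`
	for _, l := range extractLinks(payload) {
		fmt.Println(l.Href, "|", l.Title) // /post/hello | Hello World
	}
}
```

Pattern matching like this is brittle by nature: it depends on key order and on "children" being a plain string, so it needs rechecking whenever the upstream site changes its markup.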

Adding a new feed

1. Inspect the blog page

Visit the blog in a browser and inspect the HTML structure. Key questions:

  • Is the content server-rendered (visible in curl output) or client-rendered (JavaScript-only)?
  • What elements contain post titles, URLs, dates, descriptions, and authors?

# Quick check: does curl see the blog posts?
curl -sL https://example.com/blog/ | grep -o '<a[^>]*href[^>]*>' | head -20

2. Write a scraper function

Add a new scraper in scraper/scraper.go:

func scrapeExample(body io.Reader) ([]Post, error) {
    doc, err := html.Parse(body)
    if err != nil {
        return nil, err
    }
    // Walk the DOM tree, extract posts...
    var posts []Post
    // ...
    return posts, nil
}
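To make the elided "walk the DOM tree" step more concrete, here is a self-contained sketch for Sprites-style markup (<article>, <h3>, <time>). To stay within the standard library it swaps in encoding/xml's non-strict tokenizer instead of the html.Parse call used above, so it only copes with reasonably well-formed markup. The Post fields, the sample HTML, and the helper names are assumptions, not the project's actual definitions.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Post holds the fields this sketch extracts; the real Post type may differ.
type Post struct {
	Title, URL, Date string
}

const sampleHTML = `
<article>
  <h3><a href="/blog/first-post">First &amp; foremost</a></h3>
  <time datetime="2025-01-15">Jan 15, 2025</time>
</article>
<article>
  <h3><a href="/blog/second-post">Second post</a></h3>
  <time datetime="2025-02-01">Feb 1, 2025</time>
</article>`

// attr returns the value of the named attribute, or "" if it is absent.
func attr(e xml.StartElement, name string) string {
	for _, a := range e.Attr {
		if a.Name.Local == name {
			return a.Value
		}
	}
	return ""
}

// scrapeArticles collects one Post per <article>: title text from <h3>,
// URL from the first <a href>, date from <time datetime>.
func scrapeArticles(body io.Reader) ([]Post, error) {
	dec := xml.NewDecoder(body)
	dec.Strict = false // tolerate HTML-isms such as void elements
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var posts []Post
	var cur *Post
	inTitle := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return posts, nil
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch t.Name.Local {
			case "article":
				cur = &Post{}
			case "h3":
				inTitle = cur != nil
			case "a":
				if cur != nil && cur.URL == "" {
					cur.URL = attr(t, "href")
				}
			case "time":
				if cur != nil {
					cur.Date = attr(t, "datetime")
				}
			}
		case xml.CharData:
			if inTitle && cur != nil {
				cur.Title += string(t)
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "h3":
				inTitle = false
			case "article":
				if cur != nil && cur.Title != "" {
					cur.Title = strings.TrimSpace(cur.Title)
					posts = append(posts, *cur)
				}
				cur, inTitle = nil, false
			}
		}
	}
}

// scrapeString is a convenience wrapper for trying the scraper on a string.
func scrapeString(s string) ([]Post, error) {
	return scrapeArticles(strings.NewReader(s))
}

func main() {
	posts, err := scrapeString(sampleHTML)
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	for _, p := range posts {
		fmt.Printf("%s | %s | %s\n", p.Date, p.URL, p.Title)
	}
}
```

Real-world pages are rarely this tidy, which is why a forgiving HTML parser is the safer choice for an actual scraper; the state-machine shape of the loop carries over either way.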

For client-rendered (React/Next.js) sites, you may need to parse the data from embedded JSON payloads rather than the HTML DOM. See the Archil scraper for an example.

3. Register the source

Add an entry to the Sources slice in scraper/scraper.go:

var Sources = []FeedSource{
    // ...existing sources...
    {
        Name:    "Example Blog",
        SiteURL: "https://example.com/blog/",
        Scrape:  scrapeExample,
    },
}

4. Add the route and index card

In srv/server.go:

  1. Add a route in the Serve method:

    mux.HandleFunc("GET /feed/example", s.handleFeed(3)) // index matches Sources slice

  2. Add a card to indexHTML:

    <div class="feed-card">
      <div class="feed-name">Example Blog <span class="rss-icon">RSS</span></div>
      <a class="feed-url" href="/feed/example">/feed/example</a>
    </div>

5. Build and deploy

go build -o rss-feed ./cmd/srv
sudo systemctl restart srv

Project structure

cmd/srv/main.go      Entry point — parses flags, starts server
srv/server.go        HTTP handlers, caching, RSS generation, index page
scraper/scraper.go   Fetch + parse logic for each blog site
srv.service          systemd unit file

Building and running

# Build
go build -o rss-feed ./cmd/srv

# Run directly
./rss-feed                    # listens on :8000
./rss-feed -listen :3000      # custom port

# Or via systemd (production)
sudo cp srv.service /etc/systemd/system/srv.service
sudo systemctl daemon-reload
sudo systemctl enable --now srv

# Check status / logs
systemctl status srv
journalctl -u srv -f

Recreating this VM on exe.dev

Create a new VM named rss-feed, then give Shelley this prompt:

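Create an RSS feed service in Go that scrapes blog listing pages and serves RSS 2.0 feeds. The blogs to support are:

  • MotherDuck Blog: https://motherduck.com/blog/
  • Sprites Blog: https://sprites.dev/blog/
  • Archil Blog: https://archil.com/blog/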

Requirements:

  • Use the Go project template (shelley unpack-template go)
  • Scrape each blog's listing page for post title, URL, description, author, and date
  • Generate RSS 2.0 XML using gorilla/feeds
  • Cache feeds in memory for 15 minutes
  • Cap upstream response reads at 10 MB with io.LimitReader
  • Serve an index page at / listing all available feeds
  • Run as a systemd service on port 8000
  • No database needed

Note: MotherDuck is server-rendered but Go's HTML parser splits <a> tags containing block elements — group fragments by href. Archil is a Next.js client-rendered app — extract data from the RSC JSON payload in <script> tags. Sprites is clean server-rendered HTML with <article> elements.

After Shelley builds it, run set-public to make the feeds accessible to RSS readers without authentication.

Security notes

  • Read-only service — no database, no user input, no file uploads
  • Hardcoded scrape targets — no SSRF risk; the server only fetches from the URLs defined in Sources
  • Response size limit — upstream responses are capped at 10 MB via io.LimitReader to prevent OOM
  • Static index page — no user input is interpolated into HTML
  • Scraped content is XML-escaped by gorilla/feeds before inclusion in RSS output
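The response size limit can be illustrated with a short sketch. The helper name readCapped and the endless reader are hypothetical; in the service, the reader would be the upstream HTTP response body.

```go
package main

import (
	"fmt"
	"io"
)

const maxBody = 10 << 20 // 10 MB cap, matching the limit described above

// endless stands in for a misbehaving upstream that never stops sending.
type endless struct{}

func (endless) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 'x'
	}
	return len(p), nil
}

// readCapped reads at most limit bytes from r. Anything beyond the limit is
// never read, so memory use stays bounded no matter what upstream sends.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	return io.ReadAll(io.LimitReader(r, limit))
}

func main() {
	data, err := readCapped(endless{}, maxBody)
	fmt.Println(len(data), err) // 10485760 <nil>
}
```

Note that io.LimitReader truncates silently: an oversized page yields a cut-off document, not an error. A caller that wants to detect truncation could read limit+1 bytes and check the resulting length.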
