SEO/GEO improvements: structured data, llms.txt, sitemap, expanded robots#1

Merged
ampedraszewska merged 1 commit into main from geo-improvements on Apr 29, 2026

@ampedraszewska

Owner

Why

When founders ask LLMs (ChatGPT, Claude, Perplexity, Gemini) "which VC funds invest at seed in Poland / CEE?", Vastpoint doesn't appear in the answers — even though we have strong launch press (MamStartup, MyCompany Polska, Sifted, EU-Startups, Vestbee, AIN.ua).

The reason is simple: vastpoint.vc is a client-side React SPA, so when an LLM crawler (GPTBot, ClaudeBot, PerplexityBot) hits the page, it sees a nearly empty HTML document — only <title>vastpoint ventures</title> and <div id="root"></div>. All the actual content (team, thesis, portfolio) lives behind a JS bundle that crawlers don't execute reliably.

This PR doesn't fix the SPA-rendering issue (that comes in a follow-up — likely react-snap or vite-react-ssg to pre-render static HTML). What it does is give crawlers maximum information from the initial HTML payload right now, with zero functional or visual changes to the live site.

What changed

index.html — expanded <head>:

  • Rich <title> and <meta name="description"> covering fund stage, sectors, team
  • Canonical URL
  • Expanded Open Graph and Twitter Card meta
  • Two JSON-LD blocks (Organization + WebSite) following schema.org spec, including:
    • Founders (Aleksandra, Karolina, Zuzanna) with LinkedIn sameAs links
    • Addresses (Warsaw, New York)
    • sameAs links to LinkedIn, X, Dealroom, Substack, and press coverage on Vestbee, EU-Startups, Sifted
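For illustration, an Organization block of this shape could be assembled as below. This is a minimal sketch, not the markup actually committed to index.html: the helper names (`buildOrganizationJsonLd`, `toJsonLdScriptTag`) are hypothetical, and the founder name and all LinkedIn URLs are placeholders.

```javascript
// Sketch only: helper names are hypothetical and the values passed in below
// are placeholders, not the real content of index.html.
function buildOrganizationJsonLd({ name, url, founders, addresses, sameAs }) {
  return {
    "@context": "https://schema.org",
    "@type": "Organization",
    name,
    url,
    // schema.org models founders as Person nodes on the `founder` property.
    founder: founders.map((f) => ({
      "@type": "Person",
      name: f.name,
      sameAs: f.sameAs,
    })),
    address: addresses.map((a) => ({
      "@type": "PostalAddress",
      addressLocality: a.locality,
      addressCountry: a.country,
    })),
    sameAs,
  };
}

function toJsonLdScriptTag(data) {
  // Escape "<" so a string value can never close the script element early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const organization = buildOrganizationJsonLd({
  name: "Vastpoint",
  url: "https://vastpoint.vc/",
  founders: [
    {
      name: "Founder Name (placeholder)",
      sameAs: ["https://www.linkedin.com/in/placeholder"],
    },
  ],
  addresses: [
    { locality: "Warsaw", country: "PL" },
    { locality: "New York", country: "US" },
  ],
  sameAs: ["https://www.linkedin.com/company/placeholder"],
});

const organizationTag = toJsonLdScriptTag(organization);
console.log(organizationTag);
```

Generating the block from data rather than hand-editing JSON keeps it syntactically valid by construction; whether that is worth it for a single static page is a judgment call.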

public/llms.txt (new), following the proposed llms.txt convention (llmstxt.org): a Markdown summary of the site aimed at LLMs, which crawlers may or may not consume:

  • Fund details (size, stage, ticket, geography, sectors, LPs)
  • Full team bios
  • Portfolio: Replenit, Howie, Polyvia (with rounds, valuations, customers)
  • Press links
  • Contact info
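As a rough sketch of the file's layout, the llms.txt proposal asks for an H1 title, a blockquote summary, and H2 sections containing lists. The function name `buildLlmsTxt` and every value passed to it below are assumptions for illustration, not the contents of the committed file.

```javascript
// Sketch only: `buildLlmsTxt` is a hypothetical helper and the section
// contents are placeholders, not the real public/llms.txt.
function buildLlmsTxt({ title, summary, sections }) {
  const lines = [`# ${title}`, "", `> ${summary}`];
  for (const { heading, items } of sections) {
    lines.push("", `## ${heading}`, "");
    for (const item of items) {
      // Plain strings become bullets; { label, url } objects become links.
      lines.push(
        typeof item === "string"
          ? `- ${item}`
          : `- [${item.label}](${item.url})`
      );
    }
  }
  return lines.join("\n") + "\n";
}

const llmsTxt = buildLlmsTxt({
  title: "Vastpoint",
  summary: "One-sentence fund summary (placeholder).",
  sections: [
    { heading: "Fund", items: ["Stage: (placeholder)", "Geography: (placeholder)"] },
    { heading: "Press", items: [{ label: "Example article", url: "https://example.com/article" }] },
  ],
});

console.log(llmsTxt);
```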

public/sitemap.xml (new) — discovery for crawlers.
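For a single-page site the sitemap is tiny. A minimal sketch of generating one in the sitemaps.org format follows; `buildSitemapXml` and `escapeXml` are hypothetical helpers, and the `lastmod` value is an assumption based on the merge date.

```javascript
// Sketch only: helper names are hypothetical; lastmod is an assumed value.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(urls) {
  const entries = urls.map(({ loc, lastmod }) => {
    const lines = ["  <url>", `    <loc>${escapeXml(loc)}</loc>`];
    if (lastmod) lines.push(`    <lastmod>${escapeXml(lastmod)}</lastmod>`);
    lines.push("  </url>");
    return lines.join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
    "",
  ].join("\n");
}

const sitemapXml = buildSitemapXml([
  { loc: "https://vastpoint.vc/", lastmod: "2026-04-29" },
]);

console.log(sitemapXml);
```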

public/robots.txt (updated) — explicit Allow: / for the bots that matter:

  • LLM training: GPTBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Google-Extended, CCBot, cohere-ai, Applebot-Extended, meta-externalagent
  • LLM retrieval/answering: ChatGPT-User, OAI-SearchBot, Perplexity-User
  • Existing bots (Googlebot, Bingbot, Twitterbot, facebookexternalhit) preserved
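The per-bot groups are repetitive, so they could be generated from a list. The sketch below uses the user-agent tokens named above; the helper name `buildRobotsTxt` and the trailing `Sitemap:` line are assumptions, not a copy of the committed robots.txt.

```javascript
// User-agent tokens taken from the list in this PR description.
const LLM_USER_AGENTS = [
  "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web", "PerplexityBot",
  "Google-Extended", "CCBot", "cohere-ai", "Applebot-Extended",
  "meta-externalagent", "ChatGPT-User", "OAI-SearchBot", "Perplexity-User",
];

// Sketch only: `buildRobotsTxt` is hypothetical, and emitting a Sitemap
// line is an assumption about what the real file contains.
function buildRobotsTxt(userAgents, sitemapUrl) {
  // One group per bot, each explicitly allowing the whole site.
  const groups = userAgents.map((ua) => `User-agent: ${ua}\nAllow: /`);
  return `${groups.join("\n\n")}\n\nSitemap: ${sitemapUrl}\n`;
}

const robotsTxt = buildRobotsTxt(
  LLM_USER_AGENTS,
  "https://vastpoint.vc/sitemap.xml"
);

console.log(robotsTxt);
```

Crawlers that follow the Robots Exclusion Protocol already treat an unlisted path as allowed, so these groups mainly document intent and guard against a later blanket `Disallow`.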

Tests

  • npm run build passes locally (Vite 5.4, all 1665 modules transformed)
  • dist/ contains all new files (llms.txt, sitemap.xml, updated robots.txt, expanded index.html)
  • 404.html SPA fallback still generated by existing copy404Plugin
  • JSON-LD validates against schema.org Organization spec

Test after merge

After GitHub Actions deploys, verify crawler view:

curl -A "GPTBot/1.0" https://vastpoint.vc/ | head -100      # rich head, JSON-LD
curl https://vastpoint.vc/llms.txt                          # full fund profile
curl https://vastpoint.vc/sitemap.xml                       # sitemap
curl https://vastpoint.vc/robots.txt | grep -c GPTBot       # ≥1

Then submit https://vastpoint.vc/sitemap.xml to Google Search Console, and run a Rich Results test on https://search.google.com/test/rich-results — JSON-LD should validate as Organization + WebSite.
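The JSON-LD part of this check can also be scripted. The sketch below pulls every `application/ld+json` block out of an HTML string and reports its `@type`; `extractJsonLdTypes` is a hypothetical helper, the sample HTML is made up, and the regex is a pragmatic shortcut rather than a full HTML parser.

```javascript
// Sketch only: a regex-based extractor is good enough for a quick smoke
// test of our own page, but it is not a general HTML parser.
function extractJsonLdTypes(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const types = [];
  for (const match of html.matchAll(pattern)) {
    const data = JSON.parse(match[1]); // throws if a block is malformed
    for (const node of Array.isArray(data) ? data : [data]) {
      types.push(node["@type"]);
    }
  }
  return types;
}

// Made-up sample standing in for the HTML returned by the curl command.
const sampleHtml = `
<head>
  <title>vastpoint ventures</title>
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Organization", "name": "Vastpoint"}
  </script>
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "WebSite", "url": "https://vastpoint.vc/"}
  </script>
</head>`;

const foundTypes = extractJsonLdTypes(sampleHtml);
console.log(foundTypes); // [ 'Organization', 'WebSite' ]
```

Fed the saved output of the first curl command, the function should return the same two types if both blocks deployed intact.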

What's NOT in this PR

  • No React component changes
  • No visual changes
  • No build pipeline changes (Vite + GitHub Actions deploy unchanged)
  • No prerender plugin (intentionally separate — bigger change, separate PR)

…bots

Goal: make the site discoverable by LLM crawlers (GPTBot, ClaudeBot,
PerplexityBot, etc.) and improve search engine visibility. The site is
a client-side React SPA, so the static HTML LLM crawlers see was nearly
empty — these changes give them rich metadata even before the React
bundle hydrates.

Changes:
- index.html: rich title/description, canonical URL, expanded OG/Twitter
  meta, two JSON-LD blocks (Organization with founders, addresses,
  sameAs links to LinkedIn/Twitter/Dealroom/Substack/press; WebSite)
- public/llms.txt: emerging standard for LLM crawlers, full fund
  details (size, stage, ticket, sectors), team bios, portfolio
  (Replenit, Howie, Polyvia), press links, contact
- public/sitemap.xml: discovery for crawlers
- public/robots.txt: explicit allow for GPTBot, ClaudeBot, anthropic-ai,
  Claude-Web, PerplexityBot, Perplexity-User, Google-Extended, CCBot,
  cohere-ai, Applebot-Extended, meta-externalagent, ChatGPT-User,
  OAI-SearchBot

Tests:
- npm run build: passes, dist/ contains all new files
- 404.html SPA fallback still generated
- JSON-LD validates against schema.org Organization spec

No application code changes, no visual changes, no behavior changes.
ampedraszewska merged commit 08bfc3a into main on Apr 29, 2026
2 checks passed
ampedraszewska deleted the geo-improvements branch on April 29, 2026, 18:24
ampedraszewska added a commit that referenced this pull request on Apr 29, 2026
Builds on PR #1 (meta + JSON-LD + llms.txt). This finishes the SEO/GEO
fix: after `vite build`, a Puppeteer headless browser loads the built
SPA, captures the fully-rendered DOM, and writes it back to
dist/index.html (and dist/404.html for the GitHub Pages SPA fallback).

The result: LLM crawlers (GPTBot, ClaudeBot, PerplexityBot) and search
engines that do not reliably execute JS now get the complete Hero
component as static HTML — team, thesis, news ticker with the Replenit
announcement, contact info, partner logos. The React bundle still
hydrates and runs client-side for interactive features (mouse-tracking
effect, mobile menu).

Local build measurement:
- dist/index.html before: 4.96 kB (essentially empty body)
- dist/index.html after:  25.7 kB (full rendered Hero)

Changes:
- scripts/prerender.mjs (new): post-build script using Vite preview
  server plus Puppeteer to snapshot rendered HTML
- package.json: build = "vite build && node scripts/prerender.mjs",
  added build:nossg as escape hatch, puppeteer as devDependency

No changes to React components or routing.

Tests:
- npm install completes (puppeteer pulls Chromium ~150 MB)
- npm run build completes successfully
- dist/index.html grep confirms team names plus Replenit present in markup
- dist/404.html mirrors index.html (SPA fallback preserved)

After merge, GitHub Actions runs the same build pipeline. Verify with:
    curl -A "GPTBot/1.0" https://vastpoint.vc/ | grep "Pedraszewska"

Co-authored-by: Aleksandra Pedraszewska <aleksandra@vastpoint.vc>
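
The prerender step described in the commit message above could look roughly like this. It is a sketch under stated assumptions, not the actual scripts/prerender.mjs: the function names, the port, and the `expected` marker list are hypothetical, it assumes `vite` and `puppeteer` are installed and that the Vite build output directory is `dist`, and CI runners may additionally need sandbox-related launch arguments.

```javascript
// Pure helper: which of the expected strings are absent from the snapshot?
function missingMarkers(html, markers) {
  return markers.filter((marker) => !html.includes(marker));
}

// Sketch only: not the real scripts/prerender.mjs. Assumes `vite build`
// has already produced dist/ and that vite and puppeteer are installed.
async function prerender({ distDir = "dist", port = 4173, expected = [] } = {}) {
  const { preview } = await import("vite");
  const { default: puppeteer } = await import("puppeteer");
  const { writeFile } = await import("node:fs/promises");
  const path = await import("node:path");

  // Serve the built site, then load it in headless Chromium.
  const server = await preview({ preview: { port, strictPort: true } });
  const browser = await puppeteer.launch();
  try {
    const url = server.resolvedUrls?.local?.[0] ?? `http://localhost:${port}/`;
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0" });
    const html = await page.content(); // serialized DOM after React rendered

    const missing = missingMarkers(html, expected);
    if (missing.length > 0) {
      throw new Error(`Snapshot is missing: ${missing.join(", ")}`);
    }

    // Overwrite the entry page and the GitHub Pages SPA fallback.
    await writeFile(path.join(distDir, "index.html"), html);
    await writeFile(path.join(distDir, "404.html"), html);
  } finally {
    await browser.close();
    await new Promise((resolve) => server.httpServer.close(resolve));
  }
}

// Example invocation (not run here):
// prerender({ expected: ["Pedraszewska", "Replenit"] });
```

Failing the build when an expected marker is missing turns the manual grep check from the test list into an automatic guard.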