SEO/GEO improvements: structured data, llms.txt, sitemap, expanded robots by ampedraszewska · Pull Request #1 · freezer3/vastpoint-test

ampedraszewska · 2026-04-29T11:59:19Z

Why

When founders ask LLMs (ChatGPT, Claude, Perplexity, Gemini) "which VC funds invest at seed in Poland / CEE?", Vastpoint doesn't appear in the answers — even though we have strong launch press (MamStartup, MyCompany Polska, Sifted, EU-Startups, Vestbee, AIN.ua).

The reason is simple: vastpoint.vc is a client-side React SPA, so when an LLM crawler (GPTBot, ClaudeBot, PerplexityBot) hits the page, it sees a nearly empty HTML document — only <title>vastpoint ventures</title> and <div id="root"></div>. All the actual content (team, thesis, portfolio) lives behind a JS bundle that crawlers don't execute reliably.

This PR doesn't fix the SPA-rendering issue (that comes in a follow-up — likely react-snap or vite-react-ssg to pre-render static HTML). What it does is give crawlers maximum information from the initial HTML payload right now, with zero functional or visual changes to the live site.

What changed

index.html — expanded <head>:

- Rich <title> and <meta name="description"> covering fund stage, sectors, team
- Canonical URL
- Expanded Open Graph and Twitter Card meta
- Two JSON-LD blocks (Organization + WebSite) following schema.org spec, including:
  - Founders (Aleksandra, Karolina, Zuzanna) with LinkedIn sameAs links
  - Addresses (Warsaw, New York)
  - sameAs links to LinkedIn, X, Dealroom, Substack, and press coverage on Vestbee, EU-Startups, Sifted
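A minimal sketch of how the two JSON-LD blocks could be generated, assuming a build-time Node helper. The PR describes the blocks as written directly into index.html, so `buildJsonLd` and its parameter shape are hypothetical. It uses the schema.org properties listed above: `founder` as `Person` objects with `sameAs`, `address` as `PostalAddress`, and organization-level `sameAs`.

```javascript
// Hypothetical helper: renders Organization + WebSite JSON-LD as <script> tags for <head>.
function buildJsonLd({ name, url, founders = [], localities = [], sameAs = [] }) {
  const organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    name,
    url,
    founder: founders.map((f) => ({
      "@type": "Person",
      name: f.name,
      // LinkedIn profile URL for each founder, if known.
      ...(f.linkedin ? { sameAs: [f.linkedin] } : {}),
    })),
    // One PostalAddress per office city (e.g. Warsaw, New York).
    address: localities.map((city) => ({ "@type": "PostalAddress", addressLocality: city })),
    // Social profiles, Dealroom, Substack, press coverage.
    sameAs,
  };
  const website = { "@context": "https://schema.org", "@type": "WebSite", name, url };

  return [organization, website]
    .map((obj) => {
      // Escape "<" so a value containing "</script>" cannot close the tag early;
      // "\u003c" is still valid JSON and parses back to "<".
      const json = JSON.stringify(obj, null, 2).replace(/</g, "\\u003c");
      return `<script type="application/ld+json">\n${json}\n</script>`;
    })
    .join("\n");
}

// Example with values from this PR; founder LinkedIn URLs are omitted rather than guessed.
console.log(
  buildJsonLd({
    name: "vastpoint ventures",
    url: "https://vastpoint.vc/",
    founders: [{ name: "Aleksandra Pedraszewska" }],
    localities: ["Warsaw", "New York"],
  })
);
```

Generating the markup rather than hand-editing it keeps the two blocks consistent with each other, but hand-written JSON in index.html works equally well for crawlers.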

public/llms.txt (new) — a Markdown summary of the fund aimed at LLMs, following the proposed llms.txt convention (llmstxt.org):

- Fund details (size, stage, ticket, geography, sectors, LPs)
- Full team bios
- Portfolio: Replenit, Howie, Polyvia (with rounds, valuations, customers)
- Press links
- Contact info
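The llms.txt proposal specifies a simple Markdown layout: an H1 with the site name (the only required element), an optional blockquote summary, optional free-form Markdown, then H2 sections whose lists link to further detail as `- [name](url): notes`. A minimal generator sketch, assuming the file is built from structured data; `buildLlmsTxt` and the content below are illustrative, not the PR's actual file.

```javascript
// Hypothetical generator for llms.txt in the layout proposed at llmstxt.org.
function buildLlmsTxt({ title, summary, details = [], sections = [] }) {
  const lines = [`# ${title}`]; // H1 with the site name: the only required part
  if (summary) lines.push("", `> ${summary}`); // short blockquote summary
  for (const paragraph of details) lines.push("", paragraph); // free Markdown, no headings
  for (const { heading, links } of sections) {
    // H2 sections hold "file lists": "- [name](url)" plus an optional ": note".
    lines.push("", `## ${heading}`, "");
    for (const { name, url, note } of links) {
      lines.push(note ? `- [${name}](${url}): ${note}` : `- [${name}](${url})`);
    }
  }
  return lines.join("\n") + "\n";
}

// Illustrative content only; the summary is a placeholder.
console.log(
  buildLlmsTxt({
    title: "vastpoint ventures",
    summary: "One-line description of the fund (placeholder).",
    details: ["Portfolio: Replenit, Howie, Polyvia."],
    sections: [
      {
        heading: "Links",
        links: [
          { name: "Website", url: "https://vastpoint.vc/" },
          { name: "Sitemap", url: "https://vastpoint.vc/sitemap.xml" },
        ],
      },
    ],
  })
);
```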

public/sitemap.xml (new) — discovery for crawlers.
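For reference, a sitemap under the sitemaps.org protocol is a `<urlset>` of `<url>` entries with a required `<loc>` and an optional `<lastmod>`, and URLs must be entity-escaped. A sketch of a generator, assuming a single-page site; `buildSitemap` is a hypothetical helper and the `lastmod` date is illustrative.

```javascript
// Hypothetical generator for public/sitemap.xml (sitemaps.org protocol 0.9).
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(entries) {
  const urls = entries.map(({ loc, lastmod }) => {
    const fields = [`    <loc>${escapeXml(loc)}</loc>`];
    // lastmod uses W3C Datetime format, e.g. "2026-04-29".
    if (lastmod) fields.push(`    <lastmod>${lastmod}</lastmod>`);
    return `  <url>\n${fields.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
    "",
  ].join("\n");
}

console.log(buildSitemap([{ loc: "https://vastpoint.vc/", lastmod: "2026-04-29" }]));
```

Crawlers can also discover the file through a `Sitemap: https://vastpoint.vc/sitemap.xml` line in robots.txt, which the sitemaps.org protocol supports; whether the updated robots.txt includes one is not stated here.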

public/robots.txt (updated) — explicit Allow: / for the bots that matter:

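- LLM training: GPTBot, ClaudeBot, anthropic-ai, Claude-Web, Google-Extended, CCBot, cohere-ai, Applebot-Extended, meta-externalagent (Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers: they govern whether content fetched by Googlebot/Applebot may be used for Google's and Apple's AI models)
- LLM search/answering: PerplexityBot (Perplexity's search indexer, which Perplexity states is not used for model training), OAI-SearchBot, ChatGPT-User, Perplexity-User
- Existing bots (Googlebot, Bingbot, Twitterbot, facebookexternalhit) preserved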
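Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the group whose `User-agent` line names it and falls back to `User-agent: *` only when no group matches, so an explicit group per bot keeps `Allow: /` in force even if a catch-all group is stricter. A sketch of generating those groups; `buildRobotsTxt` is hypothetical, and the optional `Sitemap` line is an assumption, since the PR does not say whether robots.txt references the sitemap.

```javascript
// Hypothetical generator: one explicit "Allow: /" group per crawler token.
function buildRobotsTxt({ allow = [], sitemap } = {}) {
  const groups = allow.map((agent) => `User-agent: ${agent}\nAllow: /`);
  let text = groups.join("\n\n") + "\n";
  // Sitemap lines are not tied to any group and may appear anywhere in the file.
  if (sitemap) text += `\nSitemap: ${sitemap}\n`;
  return text;
}

// The tokens listed in this PR, plus the preserved existing bots.
const agents = [
  "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web", "PerplexityBot", "Google-Extended",
  "CCBot", "cohere-ai", "Applebot-Extended", "meta-externalagent",
  "ChatGPT-User", "OAI-SearchBot", "Perplexity-User",
  "Googlebot", "Bingbot", "Twitterbot", "facebookexternalhit",
];
console.log(buildRobotsTxt({ allow: agents, sitemap: "https://vastpoint.vc/sitemap.xml" }));
```

With one group per bot, the `grep -c GPTBot` check after merge returns exactly 1.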

Tests

- npm run build passes locally (Vite 5.4, all 1665 modules transformed)
- dist/ contains all new files (llms.txt, sitemap.xml, updated robots.txt, expanded index.html)
- 404.html SPA fallback still generated by existing copy404Plugin
- JSON-LD validates against schema.org Organization spec
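Part of the JSON-LD check can run locally against the build output. A sketch, assuming Node reads the built `dist/index.html`: the hypothetical `jsonLdTypes` pulls every `application/ld+json` block out of the HTML and parses it, which catches malformed JSON or a missing `@type`. It does not validate against schema.org; that still needs an external validator.

```javascript
// Hypothetical build check: list the @type of each JSON-LD block in an HTML string.
function jsonLdTypes(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const types = [];
  for (const [, body] of html.matchAll(pattern)) {
    // JSON.parse throws on malformed JSON, failing the check loudly.
    const data = JSON.parse(body);
    // A block may hold a single object or an array of objects.
    for (const item of Array.isArray(data) ? data : [data]) types.push(item["@type"]);
  }
  return types;
}

// Usage from the repo root after `npm run build`:
//   const html = require("node:fs").readFileSync("dist/index.html", "utf8");
//   console.log(jsonLdTypes(html)); // this PR expects Organization and WebSite
```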

Test after merge

After GitHub Actions deploys, verify crawler view:

curl -s -A "GPTBot/1.0" https://vastpoint.vc/ | head -n 100   # rich head, JSON-LD
curl -s https://vastpoint.vc/llms.txt                         # full fund profile
curl -s https://vastpoint.vc/sitemap.xml                      # sitemap
curl -s https://vastpoint.vc/robots.txt | grep -c GPTBot      # ≥1

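Then submit https://vastpoint.vc/sitemap.xml to Google Search Console and run a Rich Results Test on https://search.google.com/test/rich-results. That test only reports structured data types Google supports for rich results, so also check https://validator.schema.org/ (Schema Markup Validator), where the JSON-LD should validate as Organization + WebSite.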

What's NOT in this PR

- No React component changes
- No visual changes
- No build pipeline changes (Vite + GitHub Actions deploy unchanged)
- No prerender plugin (intentionally separate: bigger change, separate PR)

…bots

Goal: make the site discoverable by LLM crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) and improve search engine visibility. The site is a client-side React SPA, so the static HTML that LLM crawlers see was nearly empty; these changes give them rich metadata without requiring the React bundle to run.

Changes:
- index.html: rich title/description, canonical URL, expanded OG/Twitter meta, two JSON-LD blocks (Organization with founders, addresses, sameAs links to LinkedIn/Twitter/Dealroom/Substack/press; WebSite)
- public/llms.txt: proposed llms.txt convention for LLM crawlers, full fund details (size, stage, ticket, sectors), team bios, portfolio (Replenit, Howie, Polyvia), press links, contact
- public/sitemap.xml: discovery for crawlers
- public/robots.txt: explicit allow for GPTBot, ClaudeBot, anthropic-ai, Claude-Web, PerplexityBot, Perplexity-User, Google-Extended, CCBot, cohere-ai, Applebot-Extended, meta-externalagent, ChatGPT-User, OAI-SearchBot

Tests:
- npm run build: passes, dist/ contains all new files
- 404.html SPA fallback still generated
- JSON-LD validates against schema.org Organization spec

No code changes, no visual changes, no behavior changes.

Builds on PR #1 (meta + JSON-LD + llms.txt). This finishes the SEO/GEO fix: after `vite build`, a Puppeteer headless browser loads the built SPA, captures the fully-rendered DOM, and writes it back to dist/index.html (and dist/404.html for the GitHub Pages SPA fallback).

The result: LLM crawlers (GPTBot, ClaudeBot, PerplexityBot) and search engines that do not reliably execute JS now get the complete Hero component as static HTML: team, thesis, news ticker with the Replenit announcement, contact info, partner logos. The React bundle still hydrates and runs client-side for interactive features (mouse-tracking effect, mobile menu).

Local build measurement:
- dist/index.html before: 4.96 kB (essentially empty body)
- dist/index.html after: 25.7 kB (full rendered Hero)

Changes:
- scripts/prerender.mjs (new): post-build script using Vite preview server plus Puppeteer to snapshot rendered HTML
- package.json: build = "vite build && node scripts/prerender.mjs", added build:nossg as escape hatch, puppeteer as devDependency

No changes to React components or routing.

Tests:
- npm install completes (puppeteer pulls Chromium ~150 MB)
- npm run build completes successfully
- dist/index.html grep confirms team names plus Replenit present in markup
- dist/404.html mirrors index.html (SPA fallback preserved)

After merge, GitHub Actions runs the same build pipeline. Verify with:

curl -A "GPTBot/1.0" https://vastpoint.vc/ | grep "Pedraszewska"

Co-authored-by: Aleksandra Pedraszewska <aleksandra@vastpoint.vc>
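The post-build step described above can be sketched as follows. This is not the PR's actual scripts/prerender.mjs: it assumes Vite 5's `preview()` JavaScript API, Puppeteer's `launch`/`goto`/`content`, a single route at `/`, Vite's default preview port 4173, and a `distDir` matching Vite's `build.outDir`; the helper names are hypothetical. Modules are loaded with dynamic `import()` so the file-writing step can be exercised without launching a browser.

```javascript
// Sketch of a post-build prerender step: serve dist/, render "/" in headless Chrome,
// and write the rendered DOM back over dist/index.html and dist/404.html.

async function renderPage(url) {
  const { default: puppeteer } = await import("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // "networkidle0" waits until there are no network connections for 500 ms,
    // by which point the client-side React render has normally finished.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content(); // serialized DOM, including the doctype if present
  } finally {
    await browser.close();
  }
}

async function writeSnapshot(distDir, html) {
  const { writeFile } = await import("node:fs/promises");
  const { join } = await import("node:path");
  const doc = /^<!doctype/i.test(html) ? html : `<!DOCTYPE html>\n${html}`;
  await writeFile(join(distDir, "index.html"), doc);
  // GitHub Pages serves 404.html for unknown paths, so it mirrors index.html.
  await writeFile(join(distDir, "404.html"), doc);
}

async function prerender(distDir = "dist") {
  const { preview } = await import("vite");
  // Serves the already-built output folder, like `vite preview`.
  const server = await preview({ preview: { port: 4173, strictPort: true } });
  try {
    const html = await renderPage(server.resolvedUrls.local[0]);
    await writeSnapshot(distDir, html);
  } finally {
    server.httpServer.close();
  }
}

// Run only when executed directly: `node scripts/prerender.mjs`.
if (process.argv[1]?.endsWith("prerender.mjs")) {
  prerender().catch((err) => {
    console.error(err);
    process.exit(1);
  });
}
```

One design caveat: if the app's entry point mounts with `createRoot` rather than `hydrateRoot`, React re-renders over the snapshot instead of hydrating it. Crawlers still receive the static markup either way.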

ampedraszewska merged commit 08bfc3a into main Apr 29, 2026
2 checks passed

ampedraszewska deleted the geo-improvements branch April 29, 2026 18:24

ampedraszewska mentioned this pull request Apr 29, 2026 in #2, "Prerender SPA at build time so LLM/search crawlers see the full page" (merged)

ampedraszewska merged 1 commit into main from geo-improvements
