Day 7: MCP tool extract_text — 10x cheaper than screenshot for LLM context

✍️ What shipped

GET /extract and a new extract_text MCP tool. Both use the existing Puppeteer browser pool but skip the screenshot step entirely — services/extract.ts:extractContent() renders the page and returns content in the requested format. Zod-validated params, SSRF protection, and quota charge all in place.
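For readers wondering what the SSRF protection might involve, here is a minimal sketch. It is not the code in services/extract.ts: the helper names `isPrivateIPv4` and `isSafeTarget` are hypothetical, and the check only looks at the literal hostname. A production guard would presumably also resolve DNS and re-check the resolved address before handing the URL to the browser pool.

```typescript
// Hypothetical helpers for illustration; not the actual services/extract.ts code.

// Returns true when the host is a dotted-quad IPv4 literal in a loopback,
// link-local, or private range.
function isPrivateIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4) return false;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (octets.some((o) => Number.isNaN(o) || o > 255)) return false;
  const [a, b] = octets;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local, includes cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Accepts only http(s) URLs whose literal hostname is not obviously internal.
function isSafeTarget(rawUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;
  const host = parsed.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // this sketch rejects all IPv6 literals
  return !isPrivateIPv4(host);
}

console.log(isSafeTarget("https://stripe.com")); // true
console.log(isSafeTarget("http://169.254.169.254/latest/meta-data")); // false
console.log(isSafeTarget("file:///etc/passwd")); // false
```

Rejecting every IPv6 literal is a deliberately blunt choice to keep the sketch short; a real implementation would likely allow public IPv6 addresses.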

# Get page content as markdown
curl "https://api.snapshotflow.com/extract?url=https://stripe.com&format=markdown&max_chars=4000" \
-H "X-Api-Key: $KEY"
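The same request through MCP would be a JSON-RPC `tools/call` message. The sketch below builds one. The envelope follows the MCP specification, but the argument names (`url`, `format`, `max_chars`) are an assumption that the tool mirrors the query parameters of GET /extract, and `buildExtractTextCall` is a hypothetical helper.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a tools/call request for extract_text.
function buildExtractTextCall(id: number, url: string, maxChars: number): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "extract_text",
      // Assumed to mirror the /extract query parameters; not confirmed by the post.
      arguments: { url, format: "markdown", max_chars: maxChars },
    },
  };
}

const request = buildExtractTextCall(1, "https://stripe.com", 4000);
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client such as Claude constructs this message itself; the sketch only shows what goes over the wire.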

📊 Numbers

11 tests added (__tests__/extract.test.ts): 3 formats, SSRF, quota, edge cases — all passing
First user (me): asked Claude “what does stripe.com say” — got clean markdown in <1 s, no PNG in context

🤯 What went wrong / lesson

Should’ve shipped this with the very first MCP server. Every agent call to capture_screenshot that only needed text was paying roughly 10x the token cost and waiting 5–10x longer. The lesson: always ask “does the agent actually need pixels, or just the text?”

⏭️ Next

Day 8: extract_metadata MCP tool — title, Open Graph tags, and favicon in ~200 ms, no full page render needed.
