Keyword: visual regression testing api · Updated May 28, 2026
Visual Regression Testing with a Screenshot API in 2026
Stop maintaining a fleet of headless browsers, a snapshot baseline store, and a homemade pixel-diff script. The SnapshotFlow /diff endpoint captures two URLs in parallel, runs pixelmatch on them, and returns the changed-pixel count, the diff percentage, and an annotated PNG — in one HTTP call. This guide walks through the call, the GitHub Actions workflow, threshold tuning, dynamic-content handling, the self-host path, and how an AI agent can call the same operation through MCP.
TL;DR — one call
curl "https://api.snapshotflow.com/diff?before=https://myapp.com&after=https://staging.myapp.com&response_type=json" \
-H "X-Api-Key: $SNAPSHOTFLOW_KEY"
# { "changed_pixels": 14320, "diff_percent": 1.40, "has_changes": true, ... }
pixelmatch is built in, and a base64 response option returns the diff image for build artifacts. Pillar reading: What Is a Screenshot API?
Why visual regression matters in 2026
Unit tests catch broken functions. Integration tests catch broken endpoints. Neither catches the dropdown menu that now overlaps the checkout button, the CSS variable rename that flattened your hero gradient, or the design-system upgrade that nudged every input by 2 px and quietly cut form-submission rate by 6 %.
Those failures are visual. They survive every assertion you can write in code, and they live or die on the diff between two rendered images. That is what visual regression testing checks — and in 2026 it is no longer a luxury; with Tailwind v4, Astro 5, Next.js 16, and AI-generated UI patches landing daily, the rate of unintended visual change has gone up, not down.
The traditional answer is a dedicated snapshot suite: Percy, Chromatic, BackstopJS, the built-in toHaveScreenshot in Playwright, or a Cypress visual plugin. Each one is powerful, and each one drags along a stack: a baseline image store, a runner with headless Chrome installed, a flaky-test review queue, and, for the hosted products, a license cost that scales with the number of snapshots. For most teams the diff itself is a five-second pixel-by-pixel comparison; the surrounding plumbing is where the time and money go.
A screenshot-API-backed approach swaps that plumbing for one HTTP endpoint. The browser fleet, the headless-Chrome warm pool, the diff library, and the artifact storage are operated by the API provider. Your CI just makes a call and reads two numbers back.
How /diff works under the hood
/diff is a single GET request. Under the hood:
- Both URLs (before and after) are pinned to the same viewport (width × height, default 1280 × 800) and rendered in parallel by the warm Puppeteer pool.
- The two PNGs are passed to pixelmatch with the supplied threshold (0 to 1, default 0.1).
- Differing pixels are tinted red on a third PNG, the diff image.
- The endpoint returns one of three response shapes: a raw diff PNG (default), a JSON stats object (response_type=json), or both combined in base64 (response_type=base64).
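The library choice is deliberate: pixelmatch is the engine behind Playwright's toHaveScreenshot and behind jest-image-snapshot, which several Cypress visual plugins wrap. If you are migrating from one of those, threshold and changed_pixels keep the same meaning, so little retuning should be needed. Tools built on a different comparison engine (BackstopJS, for example, uses Resemble.js) will need their sensitivity recalibrated.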
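The comparison step can be sketched in a few lines of Python. This is a simplified illustration, not SnapshotFlow's or pixelmatch's actual code: it uses a YIQ colour-distance metric modeled on the one pixelmatch uses, leaves out pixelmatch's anti-aliasing detection, and assumes both images arrive as equal-sized grids of RGB tuples. The names yiq_delta and diff_stats are hypothetical, and rounding diff_percent to two decimals is an assumption.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels

# Upper bound of the YIQ delta metric below (the constant pixelmatch uses).
MAX_YIQ_DELTA = 35215.0


def yiq_delta(a: Pixel, b: Pixel) -> float:
    """Perceptual colour distance between two RGB pixels, computed in YIQ space."""
    dr, dg, db = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    y = dr * 0.29889531 + dg * 0.58662247 + db * 0.11448223
    i = dr * 0.59597799 - dg * 0.27417610 - db * 0.32180189
    q = dr * 0.21147017 - dg * 0.52261711 + db * 0.31080537
    return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q


def diff_stats(before: Image, after: Image, threshold: float = 0.1) -> dict:
    """Count pixels whose colour distance exceeds the threshold-derived limit."""
    same_shape = len(before) == len(after) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(before, after)
    )
    if not same_shape:
        raise ValueError("images must share the same dimensions")

    # A lower threshold gives a lower limit, so more pixels count as changed.
    limit = MAX_YIQ_DELTA * threshold * threshold
    total = sum(len(row) for row in before)
    changed = sum(
        1
        for row_a, row_b in zip(before, after)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if yiq_delta(pixel_a, pixel_b) > limit
    )
    # Two-decimal rounding is an assumption made for this sketch.
    percent = round(100.0 * changed / total, 2) if total else 0.0
    return {
        "changed_pixels": changed,
        "total_pixels": total,
        "diff_percent": percent,
        "has_changes": changed > 0,
    }
```

Because the limit grows with the square of threshold, moving from 0.1 to 0.2 quadruples the colour distance a pixel must exceed before it is counted, which is why a small bump can absorb a lot of font-rendering noise.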
Your first diff in 30 seconds
Grab a free API key at dashboard.snapshotflow.com (200 diffs/month, no credit card), export it, and run:
export SNAPSHOTFLOW_KEY=sk_live_xxxxxxxxxxxx
# 1. Raw diff image, saved straight to disk
curl "https://api.snapshotflow.com/diff?before=https://stripe.com&after=https://stripe.com/payments" \
-H "X-Api-Key: $SNAPSHOTFLOW_KEY" --output diff.png
# 2. Stats only — what CI usually wants
curl "https://api.snapshotflow.com/diff?before=https://stripe.com&after=https://stripe.com/payments&response_type=json" \
-H "X-Api-Key: $SNAPSHOTFLOW_KEY"
# {
# "before": "https://stripe.com",
# "after": "https://stripe.com/payments",
# "width": 1280, "height": 800,
# "changed_pixels": 134912,
# "total_pixels": 1024000,
# "diff_percent": 13.18,
# "has_changes": true
# }
Two real pages were rendered (no fakes), the diff was computed in <3 s, and you got back a number you can compare against a threshold in any language — no SDK, no test harness, no headless Chrome.
Parameters that matter
| Parameter | Default | When to change it |
|---|---|---|
| before / after | none | Required. Production URL vs deploy-preview URL is the most common pairing. |
| width / height | 1280 × 800 | Match the viewport(s) your real users see. Run the workflow twice, once at 1280 and once at 390, for desktop + mobile coverage. |
| threshold | 0.1 | Lower = more sensitive. Bump to 0.2 for anti-aliased text and font-rendering noise; drop to 0.05 for marketing pages where pixel-perfect matters. |
| response_type | image | Use json in CI for fast gating, base64 when you also want to archive the diff PNG in the same call. |
The full reference (including headers, cookies, and selectors_to_hide shared with /screenshot) lives in the API docs.
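As a rough sketch of how these parameters map onto a request, the Python below builds one /diff URL per viewport and fetches the JSON stats using only the standard library. The endpoint, the X-Api-Key header, and the parameter names come from this guide; the helper names, the 60-second timeout, and the 844 px mobile height are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.snapshotflow.com/diff"

# Desktop size is the documented default; the mobile height is an assumption.
VIEWPORTS = {"desktop": (1280, 800), "mobile": (390, 844)}


def build_diff_url(before: str, after: str, width: int, height: int,
                   threshold: float = 0.1, response_type: str = "json") -> str:
    """Return a /diff URL with every query parameter percent-encoded."""
    query = urllib.parse.urlencode({
        "before": before,
        "after": after,
        "width": width,
        "height": height,
        "threshold": threshold,
        "response_type": response_type,
    })
    return f"{API_BASE}?{query}"


def viewport_urls(before: str, after: str) -> dict:
    """Build one request URL per viewport, keyed by viewport name."""
    return {
        name: build_diff_url(before, after, width, height)
        for name, (width, height) in VIEWPORTS.items()
    }


def fetch_diff_stats(url: str, api_key: str, timeout: float = 60.0) -> dict:
    """GET the URL and parse the JSON stats body (performs a network call)."""
    request = urllib.request.Request(url, headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Pinning width and height in the URL, rather than relying on the server default, keeps the desktop and mobile runs reproducible even if that default changes later.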
Killing false positives
The number-one reason visual regression suites get muted is a flood of false positives. Three sources cover ~90 % of the noise:
- Sub-pixel anti-aliasing. Font rendering differs by a hair between captures even with identical content. Raise threshold to 0.2.
- Volatile UI chrome. Cookie banners, live ad slots, A/B variants, timestamps, "users online" counters. Pass selectors_to_hide with the CSS selectors that match them; the renderer applies display:none before the capture.
- Async loaders. Skeletons and spinners that haven't yet resolved produce wildly different frames. Pass wait_for with a CSS selector that's only present once the page is genuinely ready (e.g. main[data-ready]).
curl -G "https://api.snapshotflow.com/diff" \
  -H "X-Api-Key: $SNAPSHOTFLOW_KEY" \
  --data-urlencode "before=https://myapp.com" \
  --data-urlencode "after=https://staging.myapp.com" \
  --data-urlencode "threshold=0.2" \
  --data-urlencode "selectors_to_hide=#cookie-banner,.ads,[data-experiment]" \
  --data-urlencode "wait_for=main[data-ready]" \
  --data-urlencode "response_type=json"
GitHub Actions workflow (drop-in)
The workflow below runs on every pull request, compares the production URL to the deploy preview, fails the build if diff_percent exceeds 1 %, and uploads the diff PNG as a downloadable artifact.
name: visual-regression
on:
  pull_request:
    branches: [main]
jobs:
  diff:
    runs-on: ubuntu-latest
    steps:
      - name: Run /diff against preview
        env:
          KEY: ${{ secrets.SNAPSHOTFLOW_KEY }}
          BEFORE: https://myapp.com
          AFTER: ${{ github.event.pull_request.head.ref == 'main'
            && 'https://myapp.com'
            || format('https://preview-{0}.myapp.com', github.event.number) }}
        run: |
          curl -sS -G "https://api.snapshotflow.com/diff" \
            -H "X-Api-Key: $KEY" \
            --data-urlencode "before=$BEFORE" \
            --data-urlencode "after=$AFTER" \
            --data-urlencode "threshold=0.2" \
            --data-urlencode "selectors_to_hide=#cookie-banner,.ads" \
            --data-urlencode "response_type=base64" -o diff.json
          jq -r '.image_base64' diff.json | base64 -d > diff.png
          jq '{diff_percent, changed_pixels, has_changes}' diff.json | tee diff-stats.json
          DIFF=$(jq -r .diff_percent diff.json)
          echo "diff_percent=$DIFF" >> "$GITHUB_OUTPUT"
          awk "BEGIN { exit !($DIFF > 1.0) }" \
            && { echo "::error::Visual regression: $DIFF% > 1%"; exit 1; } \
            || echo "Visual diff $DIFF% is within budget"
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: visual-diff
          path: |
            diff.png
            diff-stats.json
          retention-days: 30
Total wall time on a warm pool: ~5 s. The reviewer opens the failed job's "Artifacts" panel, downloads diff.png, and sees the regressing area highlighted in red.
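If shell arithmetic in CI feels brittle, the same gate could be expressed in Python. The sketch below assumes the response_type=base64 body is a JSON object carrying the stats fields plus image_base64 (the field name the workflow's jq call reads); split_base64_response and within_budget are hypothetical helper names, not part of any SDK.

```python
import base64
import json
from typing import Tuple

STAT_KEYS = ("diff_percent", "changed_pixels", "has_changes")


def split_base64_response(body: str) -> Tuple[dict, bytes]:
    """Separate the gating numbers from the diff PNG bytes.

    Assumes the JSON body contains the keys in STAT_KEYS and `image_base64`.
    """
    payload = json.loads(body)
    png_bytes = base64.b64decode(payload["image_base64"])
    stats = {key: payload[key] for key in STAT_KEYS}
    return stats, png_bytes


def within_budget(stats: dict, budget_percent: float = 1.0) -> bool:
    """True when diff_percent does not exceed the budget.

    Mirrors the workflow's rule: the build fails only when the diff is
    strictly greater than the budget.
    """
    return float(stats["diff_percent"]) <= budget_percent
```

A CI script would write the PNG bytes to diff.png for the artifact upload and exit non-zero when within_budget returns False.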
Self-host inside your VPC
For internal admin panels, staging environments behind a VPN, or workloads under data-residency rules, the hosted API either cannot reach the URLs you want to compare or is not permitted to. Drop the Docker Compose stack on a CI runner: same /diff endpoint, no traffic leaves the network.
git clone https://github.com/snapshotflow/snapshotflow.git
cd snapshotflow
cp .env.example .env   # set INTERNAL_KEY=...
docker compose up -d
# Same call, localhost
curl "http://localhost:8080/diff?before=https://internal-app/v1&after=https://internal-app/v2&response_type=json" \
  -H "X-Api-Key: $INTERNAL_KEY"
A dedicated cost comparison piece (Self-hosted Screenshot API vs SaaS) is on the roadmap — once published it will be linked here. In the meantime, the pricing explained guide covers per-shot vs per-second models and self-host break-even math.
From an AI agent (MCP)
The same operation is registered as an MCP tool named visual_diff. Any MCP-aware client (Claude Desktop, Cursor, Goose) sees it as a structured function call:
{
  "mcpServers": {
    "snapshotflow": {
      "url": "https://api.snapshotflow.com/mcp",
      "headers": { "X-Api-Key": "sk_live_xxxxxxxx" }
    }
  }
}
Once registered, an agent can be asked in natural language — "Compare today's homepage to last week's archived snapshot and open a Linear ticket if the diff is over 2 %" — and the model will choose to call visual_diff, read diff_percent, and chain into your ticket tool. None of that requires bespoke HTTP plumbing on your side.
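For a sense of what the client sends on the agent's behalf, an MCP tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for visual_diff. The argument names (before, after, threshold) are assumptions that mirror the HTTP parameters, since the tool's actual input schema is not shown here, and a real session would first complete MCP's initialize handshake.

```python
import json


def visual_diff_call(before: str, after: str, threshold: float = 0.1,
                     request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the visual_diff tool.

    The `arguments` keys are assumed to match the /diff query parameters.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "visual_diff",
            "arguments": {
                "before": before,
                "after": after,
                "threshold": threshold,
            },
        },
    }


def to_wire(request: dict) -> str:
    """Serialise the request the way it would travel over the transport."""
    return json.dumps(request, separators=(",", ":"))
```

MCP clients normally generate this message themselves; seeing it spelled out is mostly useful when debugging why an agent's call was rejected.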
vs Percy / BackstopJS / Playwright snapshots
| Tool | What you operate | Where baselines live | AI-agent (MCP) ready | Self-host |
|---|---|---|---|---|
| Percy / Chromatic | SDK in your tests | Vendor cloud | No | No |
| BackstopJS | Local Chromium fleet + JSON config | Repo | No | Yes (DIY) |
| Playwright toHaveScreenshot | Playwright runtime + browsers | Repo | No | Yes (DIY) |
| SnapshotFlow /diff | One HTTP endpoint | Whichever URL you treat as "before" (prod, S3, git tag) | Yes (visual_diff MCP tool) | Yes (official Docker) |
None of those tools is wrong — they are just different shapes. /diff is the right shape when (a) you don't want a baseline store to maintain, (b) "before" is already a real URL you control, and (c) you want the same diff usable by both CI and an AI agent through one call.
Production checklist
- Pin viewport. Always pass width and height explicitly; never let a default drift on the server side change your baseline.
- Use response_type=base64 in CI. One round-trip gives you both the gating numbers and the artifact PNG.
- Set a diff budget. A flat 1 % diff_percent budget is a reasonable default; for marketing pages drop to 0.3 %, for dashboard interiors raise to 3 %. This budget is separate from the per-pixel threshold parameter.
- Mask volatile DOM with selectors_to_hide: cookie banners, ads, A/B variants, timestamps.
- Cache the diff PNG. Upload to S3/R2 keyed by PR number; reviewers can compare the last 5 PRs side-by-side without re-running.
- Handle 504. A page that never reaches wait_for times out. Either widen wait_for to load or fix the target page.
- Log X-Request-Id. Every /diff response carries one; quote it when escalating to support and the render is traced in seconds.
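The 504 and X-Request-Id items can be folded into one small helper that turns a failed response into a log line. This is a minimal sketch: explain_diff_failure is a hypothetical name, and the only behaviours it relies on are the ones stated in the checklist (a 504 on wait_for timeouts and an X-Request-Id header on every response).

```python
def explain_diff_failure(status: int, headers: dict) -> str:
    """Turn a failed /diff response into a message worth logging."""
    # Header names are case-insensitive on the wire; normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("x-request-id", "unknown")

    if status == 504:
        return (
            f"/diff timed out (HTTP 504, request {request_id}): "
            "the page never satisfied wait_for; relax wait_for or fix the target page"
        )
    return f"/diff failed (HTTP {status}, request {request_id})"
```

With the standard library's urllib, the inputs would come from a caught urllib.error.HTTPError: pass err.code and dict(err.headers.items()).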
FAQ
How is /diff different from Percy or Chromatic?
Percy / Chromatic compare snapshots produced by your own test runner against a baseline image stored in their service. /diff is a self-contained HTTP endpoint that captures both URLs in parallel and returns the comparison synchronously. No SDK, no baseline store, no plan tied to test-suite counts.
Which diff library is used under the hood?
Both URLs are captured with the same warm Puppeteer pool used by /screenshot. The two PNGs are then compared with pixelmatch, the library that also powers Playwright's toHaveScreenshot and the jest-image-snapshot-based Cypress plugins.
How do I handle anti-aliasing and A/B tests?
Raise threshold from 0.1 to 0.2–0.3 for anti-aliasing; mask cookie banners, ads and experimental UI with selectors_to_hide; pin width and height so both renders share a viewport.
Can I run this against URLs behind a VPN?
Yes. Self-host the SnapshotFlow Docker stack inside your VPC or on a GitHub Actions runner. The same /diff endpoint is available on localhost and never sends traffic outside your network.
How do AI agents call this?
Through the visual_diff MCP tool exposed at https://api.snapshotflow.com/mcp. Claude, Cursor, Goose and any MCP-aware client can call it by name with structured arguments and act on diff_percent in their reasoning loop.
Does the free tier cover CI usage?
The 200/month free tier comfortably covers ~6 PRs/day on a single viewport. Beyond that, paid plans bill per call — or self-host for unlimited diffs at flat cost. See pricing explained for worked examples.
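The arithmetic behind that estimate is easy to rerun for your own setup. The sketch below assumes a 30-day month and one /diff call per viewport per CI run; prs_per_day is a hypothetical helper, not part of any SDK.

```python
def prs_per_day(monthly_quota: int, viewports: int = 1,
                runs_per_pr: int = 1, days: int = 30) -> int:
    """Whole PRs per day that fit in the quota, assuming even daily usage."""
    calls_per_pr = viewports * runs_per_pr
    return monthly_quota // (calls_per_pr * days)
```

With a quota of 200 and the defaults this gives 6 PRs a day; covering desktop and mobile (viewports=2) halves it to 3, and every extra push to an open PR counts as another run.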
Try SnapshotFlow free
200 diffs per month, MCP-ready, Docker self-host available. No credit card.