Use Case · May 28, 2026

Visual Regression Testing with a Screenshot API in 2026

Stop maintaining a fleet of headless browsers, a snapshot baseline store, and a homemade pixel-diff script. The SnapshotFlow /diff endpoint captures two URLs in parallel, runs pixelmatch on them, and returns the changed-pixel count, the diff percentage, and an annotated PNG — in one HTTP call. This guide walks through the call, the GitHub Actions workflow, threshold tuning, dynamic-content handling, the self-host path, and how an AI agent can call the same operation through MCP.

TL;DR — one call

curl "https://api.snapshotflow.com/diff?before=https://myapp.com&after=https://staging.myapp.com&response_type=json" \
  -H "X-Api-Key: $SNAPSHOTFLOW_KEY"

# { "changed_pixels": 14320, "diff_percent": 1.40, "has_changes": true, ... }

What you skip: installing Puppeteer/Playwright, running a baseline store, building a diff renderer, paying per snapshot as with Percy, and keeping a CI image with Chromium fonts. What you get: a stable HTTP endpoint with pixelmatch built in and a base64 diff image option for build artifacts. Pillar reading: What Is a Screenshot API?

Why visual regression matters in 2026

Unit tests catch broken functions. Integration tests catch broken endpoints. Neither catches the dropdown menu that now overlaps the checkout button, the CSS variable rename that flattened your hero gradient, or the design-system upgrade that nudged every input by 2 px and quietly cut form-submission rate by 6 %.

Those failures are visual. They survive every assertion you can write in code, and they live or die on the diff between two rendered images. That is what visual regression testing checks — and in 2026 it is no longer a luxury; with Tailwind v4, Astro 5, Next.js 16, and AI-generated UI patches landing daily, the rate of unintended visual change has gone up, not down.

The traditional answer is a dedicated snapshot suite: Percy, Chromatic, BackstopJS, the built-in toHaveScreenshot in Playwright, or a Cypress visual plugin. Each one is powerful, and each one drags along a stack: some mix of a baseline image store, a runner with headless Chrome installed, a flaky-test review queue, and, for the hosted services, a license cost that scales with the number of snapshots. For most teams the diff itself is a five-second pixel-by-pixel comparison; the surrounding plumbing is where the time and money go.

A screenshot-API-backed approach swaps that plumbing for one HTTP endpoint. The browser fleet, the headless-Chrome warm pool, the diff library, and the artifact storage are operated by the API provider. Your CI just makes a call and reads two numbers back.

How `/diff` works under the hood

/diff is a single GET request. Under the hood:

Both URLs (before and after) are pinned to the same viewport (width × height, default 1280 × 800) and rendered in parallel by the warm Puppeteer pool.
The two PNGs are passed to pixelmatch with the supplied threshold (0–1, default 0.1).
Differing pixels are tinted red on a third PNG — the diff image.
The endpoint returns one of three response shapes: a raw diff PNG (default), a JSON stats object (response_type=json), or both combined in base64 (response_type=base64).

The library choice is deliberate: pixelmatch is also the comparison engine behind Playwright's toHaveScreenshot and behind jest-image-snapshot, which Cypress plugins such as cypress-image-snapshot build on. The threshold parameter therefore means the same thing it does in those tools, so teams migrating from them can usually start from their existing sensitivity setting.

Your first diff in 30 seconds

Grab a free API key at dashboard.snapshotflow.com (free tier: 300 screenshots lifetime, /diff costs 2 units = 150 diff calls; no credit card), export it, and run:

export SNAPSHOTFLOW_KEY=sk_live_xxxxxxxxxxxx

# 1. Raw diff image, saved straight to disk
curl "https://api.snapshotflow.com/diff?before=https://stripe.com&after=https://stripe.com/payments" \
  -H "X-Api-Key: $SNAPSHOTFLOW_KEY" --output diff.png

# 2. Stats only — what CI usually wants
curl "https://api.snapshotflow.com/diff?before=https://stripe.com&after=https://stripe.com/payments&response_type=json" \
  -H "X-Api-Key: $SNAPSHOTFLOW_KEY"
# {
#   "before": "https://stripe.com",
#   "after":  "https://stripe.com/payments",
#   "width": 1280, "height": 800,
#   "changed_pixels": 134912,
#   "total_pixels":  1024000,
#   "diff_percent":  13.18,
#   "has_changes":   true
# }

Two real pages were rendered (no fakes), the diff was computed in <3 s, and you got back a number you can compare against a threshold in any language — no SDK, no test harness, no headless Chrome.
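The numbers in the JSON response are consistent with diff_percent being changed_pixels expressed as a percentage of width × height. That relationship is inferred from the sample above rather than taken from the API docs, so treat this minimal sketch (the diff_percent helper name is ours) as a sanity check only:

```shell
# Hypothetical helper: recompute diff_percent from the other response fields.
# Assumes diff_percent = 100 * changed_pixels / (width * height), to 2 decimals.
# $1 = changed_pixels, $2 = width, $3 = height
diff_percent() {
  awk -v c="$1" -v w="$2" -v h="$3" 'BEGIN { printf "%.2f\n", 100 * c / (w * h) }'
}

# Values from the sample response above: 134912 changed pixels at 1280 x 800.
diff_percent 134912 1280 800   # prints 13.18
```

If a recomputed value ever disagrees with the one the API returns, that is a hint that the viewport you assumed is not the one that was rendered.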

Parameters that matter

Parameter	Default	When to change it
`before` / `after`	—	Required. Production URL vs deploy-preview URL is the most common pairing.
`width` / `height`	1280 × 800	Match the viewport(s) your real users see. Run the workflow twice — once at 1280 and once at 390 — for desktop + mobile coverage.
`threshold`	0.1	Lower = more sensitive. Bump to `0.2` for anti-aliased text and font-rendering noise; drop to `0.05` for marketing pages where pixel-perfect matters.
`response_type`	`image`	Use `json` in CI for fast gating, `base64` when you also want to archive the diff PNG in the same call.

/diff accepts exactly six parameters: before, after, width, height, threshold, and response_type. Full reference in the API docs.
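For the desktop-plus-mobile coverage mentioned in the table, one option is to loop over a short list of viewports. The sketch below only splits "WIDTHxHEIGHT" strings and leaves the real call commented out; the 844 px mobile height and the split_viewport helper are assumptions, not values from the API docs.

```shell
# Viewports to cover. 1280x800 is the documented default;
# 390x844 is an assumed mobile size.
VIEWPORTS="1280x800 390x844"

# Hypothetical helper: turn "WIDTHxHEIGHT" into "WIDTH HEIGHT".
split_viewport() {
  echo "${1%x*} ${1#*x}"
}

for vp in $VIEWPORTS; do
  # Load the two numbers into $1 (width) and $2 (height).
  set -- $(split_viewport "$vp")
  echo "would call /diff with width=$1 height=$2"
  # Real call, left commented out because it spends API units:
  # curl -sS -G "https://api.snapshotflow.com/diff" \
  #   -H "X-Api-Key: $SNAPSHOTFLOW_KEY" \
  #   --data-urlencode "before=https://myapp.com" \
  #   --data-urlencode "after=https://staging.myapp.com" \
  #   --data-urlencode "width=$1" --data-urlencode "height=$2" \
  #   --data-urlencode "response_type=json"
done
```

Each viewport is a separate /diff call, so two viewports cost 4 units per comparison.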

Killing false positives

The number-one reason visual regression suites get muted is a flood of false positives. Three sources cover ~90 % of the noise:

Sub-pixel anti-aliasing. Font rendering differs by a hair between captures even with identical content. Raise threshold to 0.2.
Volatile UI chrome. Cookie banners, live ad slots, A/B variants, timestamps, "users online" counters. The cleanest fix is to suppress these server-side: add a ?test=1 query param your app reads to hide them, or have the deploy-preview environment disable them by default. A preview-mode cookie is not an option on the hosted endpoint, because /diff takes no cookie or header parameters.
Async loaders. Skeletons and spinners produce wildly different frames. Ensure the preview URL is fully resolved before handing it to /diff: wait for the deploy to finish, or poll a readiness check first (e.g. a smoke-test endpoint that returns 200 only when main[data-ready] is present).

# Threshold is the one /diff knob for noise reduction:
curl "https://api.snapshotflow.com/diff" \
  -H "X-Api-Key: $SNAPSHOTFLOW_KEY" \
  --data-urlencode "before=https://myapp.com" \
  --data-urlencode "after=https://staging.myapp.com?test=1" \
  --data-urlencode "threshold=0.2" \
  --data-urlencode "response_type=json" -G
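To avoid spending a /diff call on a half-loaded preview, CI can poll a readiness URL first. A minimal sketch, assuming your app exposes some URL that returns 200 only once the page is ready; the function names, the /healthz path, and the POLL_SECONDS variable are hypothetical:

```shell
# Print the HTTP status code for a URL (kept separate so it is easy to stub).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Poll until the URL returns 200. $1 = URL, $2 = max attempts (default 30).
# Returns 0 when ready, 1 when the attempts run out.
wait_until_ready() {
  url=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if [ "$(http_status "$url")" = "200" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "${POLL_SECONDS:-5}"
  done
  return 1
}

# Usage in CI, before the /diff call:
# wait_until_ready "https://staging.myapp.com/healthz" || exit 1
```

With the defaults this waits up to roughly 150 seconds (30 attempts, 5 seconds apart) before giving up.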

GitHub Actions workflow (drop-in)

The workflow below runs on every pull request, compares the production URL to the deploy preview, fails the build if diff_percent exceeds 1 %, and uploads the diff PNG as a downloadable artifact.

name: visual-regression
on:
  pull_request:
    branches: [main]

jobs:
  diff:
    runs-on: ubuntu-latest
    steps:
      - name: Run /diff against preview
        env:
          KEY: ${{ secrets.SNAPSHOTFLOW_KEY }}
          BEFORE: https://myapp.com
          AFTER:  ${{ github.event.pull_request.head.ref == 'main'
                      && 'https://myapp.com'
                      || format('https://preview-{0}.myapp.com', github.event.number) }}
        run: |
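          # --fail makes curl exit non-zero on HTTP errors, so an API failure cannot slip past the gate.
          curl -sS --fail -G "https://api.snapshotflow.com/diff" \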
            -H "X-Api-Key: $KEY" \
            --data-urlencode "before=$BEFORE" \
            --data-urlencode "after=$AFTER" \
            --data-urlencode "threshold=0.2" \
            --data-urlencode "response_type=base64" -o diff.json

          # .image is a data URL: "data:image/png;base64,"
          jq -r '.image' diff.json | sed 's/^data:image\/png;base64,//' | base64 -d > diff.png
          jq '{diff_percent, changed_pixels, has_changes}' diff.json | tee diff-stats.json

          DIFF=$(jq -r .diff_percent diff.json)
          echo "diff_percent=$DIFF" >> "$GITHUB_OUTPUT"
          awk "BEGIN { exit !($DIFF > 1.0) }" \
            && { echo "::error::Visual regression: $DIFF% > 1%"; exit 1; } \
            || echo "Visual diff $DIFF% is within budget"

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: visual-diff
          path: |
            diff.png
            diff-stats.json
          retention-days: 30

Total wall time on a warm pool: ~5 s. The reviewer opens the failed job's "Artifacts" panel, downloads diff.png, and sees the regressing area highlighted in red.
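The gate in the workflow compares diff_percent against a fixed number with awk. If the response is an error body and jq yields null, awk treats that as 0 and the comparison quietly passes, so it may be worth validating the value first. A sketch of one way to do that; check_budget is a hypothetical helper, not part of SnapshotFlow:

```shell
# Exit codes: 0 = within budget, 1 = over budget, 2 = value is not a plain number.
# $1 = diff_percent from the response, $2 = budget in percent.
check_budget() {
  case $1 in
    '' | *[!0-9.]*)
      echo "diff_percent is not numeric: '$1'" >&2
      return 2
      ;;
  esac
  # awk exits 1 when the value is over budget, 0 otherwise.
  awk -v d="$1" -v b="$2" 'BEGIN { exit (d > b) }'
}

# Usage inside the workflow step:
# DIFF=$(jq -r .diff_percent diff.json)
# check_budget "$DIFF" 1.0 || { echo "::error::Visual diff gate failed ($DIFF%)"; exit 1; }
```

Treating a non-numeric value as a failure means a broken API call fails the job instead of looking like a clean diff.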

Self-host inside your VPC

For internal admin panels, staging environments behind a VPN, or data-residency rules, the hosted API can't reach the URLs you want to compare. Drop the Docker Compose stack on a CI runner — same /diff endpoint, no traffic leaves the network.

git clone https://github.com/snapshotflow/snapshotflow.git
cd snapshotflow
cp .env.example .env  # set INTERNAL_KEY=...
docker compose up -d

# Same call, localhost
curl "http://localhost:8080/diff?before=https://internal-app/v1&after=https://internal-app/v2&response_type=json" \
  -H "X-Api-Key: $INTERNAL_KEY"

A dedicated cost comparison piece (Self-hosted Screenshot API vs SaaS) is on the roadmap — once published it will be linked here. In the meantime, the pricing explained guide covers per-shot vs per-second models and self-host break-even math.

From an AI agent (MCP)

The same operation is registered as an MCP tool named visual_diff. Any MCP-aware client (Claude Desktop, Cursor, Goose) sees it as a structured function call:

{
  "mcpServers": {
    "snapshotflow": {
      "url": "https://api.snapshotflow.com/mcp",
      "headers": { "X-Api-Key": "sk_live_xxxxxxxx" }
    }
  }
}

Once registered, an agent can be asked in natural language — "Compare today's homepage to last week's archived snapshot and open a Linear ticket if the diff is over 2 %" — and the model will choose to call visual_diff, read diff_percent, and chain into your ticket tool. None of that requires bespoke HTTP plumbing on your side.
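MCP clients normally build the request for you, but it can help to see roughly what goes over the wire. In MCP, a tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below prints such a request; the argument names (before, after, threshold) are assumed to mirror the HTTP parameters and are not confirmed by the source.

```shell
# Hypothetical helper: print a JSON-RPC "tools/call" request for visual_diff.
# $1 = before URL, $2 = after URL, $3 = threshold (defaults to 0.1).
# No JSON escaping is done, so inputs must not contain quotes or backslashes.
visual_diff_request() {
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"visual_diff","arguments":{"before":"%s","after":"%s","threshold":%s}}}\n' \
    "$1" "$2" "${3:-0.1}"
}

visual_diff_request "https://myapp.com" "https://staging.myapp.com" 0.2
```

The tool's published input schema, which an MCP client fetches with tools/list, is the authority on the real argument names.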

Why this matters: triaging visual diffs is a QA task that suits AI agents well. The model is patient, doesn't blink past tiny shifts, and is happy to triage a hundred diffs at 3 a.m. The endpoint is the same; the surface (HTTP vs MCP) is up to you.

vs Percy / BackstopJS / Playwright snapshots

Tool	What you operate	Where baselines live	AI-agent (MCP) ready	Self-host
Percy / Chromatic	SDK in your tests	Vendor cloud	No	No
BackstopJS	Local Chromium fleet + JSON config	Repo	No	Yes (DIY)
Playwright `toHaveScreenshot`	Playwright runtime + browsers	Repo	No	Yes (DIY)
SnapshotFlow `/diff`	One HTTP endpoint	Whichever URL you treat as "before" (prod, S3, git tag)	Yes (`visual_diff` MCP tool)	Yes (official Docker)

None of those tools is wrong — they are just different shapes. /diff is the right shape when (a) you don't want a baseline store to maintain, (b) "before" is already a real URL you control, and (c) you want the same diff usable by both CI and an AI agent through one call.

Production checklist

Pin viewport. Always pass width and height explicitly — never let a default drift on the server side change your baseline.
Use response_type=base64 in CI. One round-trip gives you both the gating numbers and the artifact PNG.
Set a diff budget. A flat 1 % limit on diff_percent is a reasonable default; for marketing pages drop to 0.3 %, for dashboard interiors raise to 3 %. This budget is separate from the per-pixel threshold parameter.
Suppress volatile DOM server-side. Cookie banners, ads, A/B variants, timestamps: hide them via a ?test=1 query param or a preview-environment setting on the target page before calling /diff.
Cache the diff PNG. Upload to S3/R2 keyed by PR number; reviewers can compare the last 5 PRs side-by-side without re-running.
Handle 504. A page that never fully loads will time out. /diff has no wait_until-style parameter, so make sure the URL is genuinely ready (deploy finished, page pre-warmed) before calling it.
Log the raw response on failure. When contacting support, include the full JSON response body and the exact URLs passed as before and after.
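The per-page budgets from the checklist can live in one small lookup, so the CI gate does not hard-code 1 % everywhere. A sketch, assuming budgets are chosen by URL path; the path patterns are placeholders for your own routes:

```shell
# Hypothetical lookup: pick a diff_percent budget from the URL being tested.
# 0.3 for marketing pages, 3 for dashboard interiors, 1 as the default.
budget_for() {
  case $1 in
    */dashboard/*) echo 3 ;;
    */pricing | */features/*) echo 0.3 ;;
    *) echo 1 ;;
  esac
}

budget_for "https://myapp.com/dashboard/settings"   # prints 3
budget_for "https://myapp.com/pricing"              # prints 0.3
budget_for "https://myapp.com/account"              # prints 1
```

Keeping the numbers in one function also gives reviewers a single place to argue about them.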

FAQ

How is `/diff` different from Percy or Chromatic?

Percy / Chromatic compare snapshots produced by your own test runner against a baseline image stored in their service. /diff is a self-contained HTTP endpoint that captures both URLs in parallel and returns the comparison synchronously. No SDK, no baseline store, no plan tied to test-suite counts.

Which diff library is used under the hood?

Both URLs are captured with the same warm Puppeteer pool used by /screenshot. The two PNGs are then compared with pixelmatch, the same library behind Playwright's toHaveScreenshot and jest-image-snapshot.

How do I handle anti-aliasing and A/B tests?

Raise threshold from 0.1 to 0.2–0.3 for anti-aliasing. For volatile DOM (cookie banners, ads, A/B variants), suppress them in the target page's test/preview mode before calling /diff — the hosted endpoint accepts only before, after, width, height, threshold, and response_type. Always pin width and height so both renders share the same viewport.

Can I run this against URLs behind a VPN?

Yes. Self-host the SnapshotFlow Docker stack inside your VPC or on a self-hosted GitHub Actions runner that can reach the target URLs. The same /diff endpoint is available on localhost and never sends traffic outside your network.

How do AI agents call this?

Through the visual_diff MCP tool exposed at https://api.snapshotflow.com/mcp. Claude, Cursor, Goose and any MCP-aware client can call it by name with structured arguments and act on diff_percent in their reasoning loop.

Does the free tier cover CI usage?

The free tier is 300 screenshots for the lifetime of the account (no monthly reset). Since /diff charges 2 units, one per render, that is 150 diff calls total: enough for a small project or an evaluation. Paid plans bill per call with no lifetime cap; self-hosting gives unlimited diffs at flat infrastructure cost. See pricing explained for worked examples.
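The arithmetic behind that estimate, as a quick sketch. The two-viewports-per-pull-request figure is an assumed usage pattern, not a number from the pricing page:

```shell
FREE_UNITS=300        # lifetime free-tier allowance
UNITS_PER_DIFF=2      # one unit per render, two renders per /diff call

# Hypothetical helper: how many pull requests fit in a unit allowance.
# $1 = available units, $2 = /diff calls made per pull request.
prs_covered() {
  echo $(( $1 / (UNITS_PER_DIFF * $2) ))
}

echo "diff calls: $(( FREE_UNITS / UNITS_PER_DIFF ))"            # 150
echo "PRs at 2 viewports each: $(prs_covered "$FREE_UNITS" 2)"   # 75
```

Re-running a failed job spends units again, so the practical number of pull requests covered will be somewhat lower.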

Try SnapshotFlow free

300 screenshots free (lifetime), MCP-ready, Docker self-host available. No credit card.

