A closed-loop design enforcement system that catches and prevents design drift during every agent coding session — verification, QA, lint, and drift detection running continuously
Daily Agentic Development Sessions
You're building features daily with Claude Code or Cursor. Each session needs to start with design context, end with visual verification, and have enforcement running throughout. This blueprint creates that loop.
Codex / Autonomous Agent Dispatches
You dispatch work to autonomous agents (Codex, Devin, etc.) that build without human supervision. Visual regression and lint gates become mandatory CI checks that catch drift before merge.
Post-Drift Recovery
Your site already has design drift from prior sessions. This blueprint shows how to inventory inconsistencies, define the canonical version, sweep the codebase, and verify the fix — the standard retrospective audit process.
Set Up the Screenshot-Driven QA Loop
Install Frontend Review MCP — the visual verification layer that lets agents self-check their work. The workflow: agent makes UI changes → captures before/after screenshots → Frontend Review MCP compares them → agent gets 'yes' (approved) or 'no' (with explanation) → agent refines. This creates a closed verification loop that catches design drift at the moment it happens, not after merge. For more advanced iterative refinement, use Playwright MCP + Pixelmatch to generate visual diff images that feed directly back to the agent as context.
// 1. Install Frontend Review MCP
// In your Claude Code config or Cursor settings:
{
  "mcpServers": {
    "frontend-review": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/frontend-review-mcp"],
      "env": {
        "REVIEW_MODEL": "Qwen/Qwen2-VL-72B-Instruct"
      }
    },
    "browser-tools": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/browser-tools-mcp"]
    }
  }
}
// 2. Playwright + Pixelmatch iterative loop
// scripts/visual-diff.ts
import { chromium } from "playwright";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";
import { readFileSync, writeFileSync } from "fs";

async function captureAndDiff(url: string, baselinePath: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1280, height: 720 } });
  await page.goto(url);
  await page.waitForLoadState("networkidle");
  const currentBuffer = await page.screenshot({ fullPage: true });
  writeFileSync("current.png", currentBuffer);
  await browser.close();

  const baseline = PNG.sync.read(readFileSync(baselinePath));
  const current = PNG.sync.read(currentBuffer);
  // pixelmatch throws if the two images have different dimensions
  if (baseline.width !== current.width || baseline.height !== current.height) {
    throw new Error("Baseline and current screenshot dimensions differ; recapture the baseline.");
  }
  const diff = new PNG({ width: baseline.width, height: baseline.height });
  const numDiffPixels = pixelmatch(
    baseline.data, current.data, diff.data,
    baseline.width, baseline.height,
    { threshold: 0.1 }
  );
  writeFileSync("diff.png", PNG.sync.write(diff));

  const totalPixels = baseline.width * baseline.height;
  const diffPercent = ((numDiffPixels / totalPixels) * 100).toFixed(2);
  console.log(`Diff: ${diffPercent}% (${numDiffPixels} pixels)`);
  return parseFloat(diffPercent);
}
// Usage: agent runs this, sees diff%, refines code, runs again
// Loop until diff < 1%

Add Design System MCP for Live Component Context
Set up Storybook MCP so your agent has live access to your component catalog during coding. This is how monday.com achieved code that 'looks like someone who deeply understands the system wrote it.' The MCP exposes component lists, prop types with defaults, example code from stories, and documentation — all as structured JSON the agent queries in real-time.
// .storybook/main.ts — enable Component Manifest + MCP
const config = {
  addons: ["@storybook/addon-mcp"],
  experimentalComponentsManifest: true,
};
export default config;

// Claude Code MCP config:
{
  "mcpServers": {
    "storybook": {
      "command": "npx",
      "args": ["storybook", "mcp", "--config-dir", ".storybook"]
    }
  }
}

// Now your agent can query:
// "What props does the Card component accept?"
// "Show me the Button component variants"
// "What's the correct import path for PageLayout?"

// For Figma-connected workflows, add Figma MCP:
{
  "mcpServers": {
    "figma": {
      "command": "npx",
      "args": ["-y", "@anthropic-ai/figma-mcp"],
      "env": { "FIGMA_ACCESS_TOKEN": "your-token" }
    }
  }
}

Implement CI/CD Design Gates
Add automated enforcement that blocks merges when design drift is detected. Three gates: (1) ESLint custom rules ban raw values, (2) Stylelint detects non-token CSS values, (3) Playwright visual regression catches pixel-level drift. Builds fail on violations — this is the 'wash your hands' of design consistency.
# .github/workflows/design-gate.yml
name: Design Consistency Gate
on: [pull_request]
jobs:
  lint-design:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20" }
      - run: npm ci
      # Gate 1: No hardcoded design values
      - name: Lint design tokens
        run: npx eslint --rule 'no-hardcoded-design-values: error' 'src/**/*.{ts,tsx}'
      # Gate 2: No non-token CSS values
      - name: Lint CSS
        run: npx stylelint 'src/**/*.css'
      # Gate 3: Visual regression
      - name: Install Playwright
        run: npx playwright install --with-deps chromium
      - name: Start dev server
        run: npm run dev &
        env: { PORT: "3000" }
      - name: Wait for server
        run: npx wait-on http://localhost:3000 --timeout 30000
      - name: Run visual regression
        run: npx playwright test tests/visual-regression.spec.ts
      - name: Upload diff artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: visual-diffs
          path: tests/visual-regression.spec.ts-snapshots/
      # For Percy (premium):
      # - name: Percy snapshot
      #   run: npx percy exec -- npx playwright test
      #   env: { PERCY_TOKEN: ${{ secrets.PERCY_TOKEN }} }

Create a Session-Start Design Checklist
Every agent session should start by loading design context. Create a hook or checklist that runs before any UI work. This is the 'context engineering' layer — making sure the agent has the right information before it starts coding.
// scripts/design-preflight.ts
// Run at the start of every agent session that touches UI
import { readFileSync, existsSync } from "fs";

function preflight() {
  const checks: { name: string; pass: boolean; detail: string }[] = [];

  // Check 1: Design tokens exist
  const tokensExist = existsSync("src/styles/tokens.css");
  checks.push({
    name: "Design tokens file",
    pass: tokensExist,
    detail: tokensExist ? "src/styles/tokens.css found" : "MISSING — create tokens first",
  });

  // Check 2: design-system.ts exists
  const dsExist = existsSync("src/lib/design-system.ts");
  checks.push({
    name: "Design system constants",
    pass: dsExist,
    detail: dsExist ? "src/lib/design-system.ts found" : "MISSING — create DS constants",
  });

  // Check 3: CLAUDE.md has design section
  if (existsSync("CLAUDE.md")) {
    const claude = readFileSync("CLAUDE.md", "utf-8");
    const hasDesign = claude.includes("Design System") || claude.includes("design-system");
    checks.push({
      name: "CLAUDE.md design section",
      pass: hasDesign,
      detail: hasDesign ? "Design rules found in CLAUDE.md" : "MISSING — add design rules to CLAUDE.md",
    });
  } else {
    checks.push({
      name: "CLAUDE.md design section",
      pass: false,
      detail: "MISSING — create CLAUDE.md with design rules",
    });
  }

  // Check 4: Visual regression baselines exist
  const baselinesExist = existsSync("tests/visual-regression.spec.ts-snapshots");
  checks.push({
    name: "VRT baselines",
    pass: baselinesExist,
    detail: baselinesExist ? "Baselines captured" : "Run: npx playwright test --update-snapshots",
  });

  // Report
  console.log("\n=== Design Preflight Check ===\n");
  for (const c of checks) {
    console.log(` ${c.pass ? "✓" : "✗"} ${c.name}: ${c.detail}`);
  }
  const allPass = checks.every((c) => c.pass);
  console.log(`\n${allPass ? "All checks passed." : "⚠ Fix failing checks before UI work."}\n`);
  return allPass;
}

preflight();

Build the Design Audit Script (Retrospective Sweep)
Create a script that audits the entire codebase for design drift — hardcoded hex values, non-standard fonts, incorrect max-widths, missing breadcrumbs, and other violations. Run this after each major build session or as part of your weekly maintenance. This is the retrospective layer that catches anything the CI gates missed.
// scripts/lint-design.ts
import { readFileSync, readdirSync, statSync, existsSync } from "fs";
import { join } from "path";

interface Violation {
  file: string;
  line: number;
  rule: string;
  value: string;
}

const violations: Violation[] = [];
const HEX_REGEX = /#[0-9a-fA-F]{3,8}/g;
const BANNED_FONTS = /["']?(Inter|Georgia|DM Serif|Times|serif)["']?/gi;
const BANNED_WIDTHS = /max-width:\s*(\d+)px/g;
const VALID_WIDTHS = [1200, 860];

function scanFile(filePath: string) {
  if (!filePath.match(/\.(tsx?|css)$/)) return;
  if (filePath.includes("node_modules")) return;
  const content = readFileSync(filePath, "utf-8");
  const lines = content.split("\n");
  lines.forEach((line, i) => {
    // Rule 1: No hardcoded hex
    const hexMatches = line.match(HEX_REGEX);
    if (hexMatches) {
      for (const hex of hexMatches) {
        // Skip CSS variable declarations in globals.css
        if (filePath.endsWith("globals.css") && line.includes("--")) continue;
        violations.push({ file: filePath, line: i + 1, rule: "no-hardcoded-hex", value: hex });
      }
    }
    // Rule 2: No banned fonts
    const fontMatches = line.match(BANNED_FONTS);
    if (fontMatches) {
      for (const font of fontMatches) {
        violations.push({ file: filePath, line: i + 1, rule: "no-banned-font", value: font });
      }
    }
    // Rule 3: Only valid max-widths
    let widthMatch;
    while ((widthMatch = BANNED_WIDTHS.exec(line)) !== null) {
      const width = parseInt(widthMatch[1], 10);
      if (!VALID_WIDTHS.includes(width)) {
        violations.push({ file: filePath, line: i + 1, rule: "invalid-max-width", value: `${width}px` });
      }
    }
  });
}

function scanDir(dir: string) {
  if (!existsSync(dir)) return; // skip roots that don't exist in this repo
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) {
      if (!["node_modules", ".next", ".git", "dist"].includes(entry)) {
        scanDir(full);
      }
    } else {
      scanFile(full);
    }
  }
}

// Run scan
scanDir("src");
scanDir("app");

// Report
if (violations.length === 0) {
  console.log("\n✓ No design violations found.\n");
} else {
  console.log(`\n✗ ${violations.length} design violations found:\n`);
  for (const v of violations) {
    console.log(` ${v.file}:${v.line} — [${v.rule}] ${v.value}`);
  }
  process.exit(1);
}

Establish the Drift Detection Workflow
Create a lightweight daily/weekly process that catches drift before it compounds. The key insight: budget 20-30% of capacity for design debt remediation (Pixelmojo framework). Treat design drift like bugs — track it, triage it, fix it. The workflow: (1) Run lint-design.ts at session end, (2) Run visual regression after every PR, (3) Review visual diffs weekly, (4) Update baselines only when design changes are intentional.
// Add to your CLAUDE.md or session protocol:
//
// ## Session-End Design Checklist
// 1. Run: npx tsx scripts/lint-design.ts
// 2. Run: npx playwright test tests/visual-regression.spec.ts
// 3. If violations found → fix before committing
// 4. If visual diffs found → verify they are intentional
// 5. Update baselines ONLY for intentional changes:
// npx playwright test --update-snapshots
# Claude Code Hook (runs automatically on UI file changes):
# .claude/hooks/post-edit.sh
#!/bin/bash
CHANGED_FILES=$(git diff --name-only HEAD)
if echo "$CHANGED_FILES" | grep -qE '\.(tsx|css)$'; then
echo "UI files changed — running design lint..."
npx tsx scripts/lint-design.ts
fi
// Weekly design health metric:
// scripts/design-health.ts
import { execSync } from "child_process";

const lintOutput = execSync("npx tsx scripts/lint-design.ts 2>&1 || true", { encoding: "utf-8" });
const violationCount = parseInt(lintOutput.match(/(\d+) design violations/)?.[1] ?? "0", 10);

const vrtResult = execSync("npx playwright test tests/visual-regression.spec.ts 2>&1 || true", { encoding: "utf-8" });
const vrtFailed = vrtResult.includes("failed");

console.log("=== Design Health Report ===");
console.log(` Lint violations: ${violationCount}`);
console.log(` Visual regression: ${vrtFailed ? "DRIFT DETECTED" : "Clean"}`);
console.log(` Health: ${violationCount === 0 && !vrtFailed ? "HEALTHY" : "NEEDS ATTENTION"}`);

Agent ignores CLAUDE.md design rules and generates inconsistent UI
CLAUDE.md is context, not enforcement. Add the CI gate (Step 3) — builds fail on violations regardless of agent behavior. The linter catches what the context misses. monday.com's key insight: 'The difference is whether the code already conforms to the design system or whether that work is left to the developer and the review process.'
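A minimal sketch of what such a custom lint rule could look like, using ESLint's rule shape (`meta` plus a `create` function returning AST visitors). The rule name, message, and hex-only check are illustrative; a real setup would package this in an ESLint plugin and extend it to spacing and font values:

```typescript
// Sketch of a custom ESLint-style rule that flags raw hex colors in string
// literals. Illustrative only — plugin packaging and config wiring omitted.
type ReportDescriptor = { node: unknown; messageId: string; data: { value: string } };

const noHardcodedDesignValues = {
  meta: {
    type: "problem" as const,
    messages: { hardcoded: "Use a design token instead of raw value '{{value}}'." },
    schema: [],
  },
  create(context: { report: (descriptor: ReportDescriptor) => void }) {
    const HEX = /#[0-9a-fA-F]{3,8}/;
    return {
      // Visit every literal node; report string literals containing a hex color
      Literal(node: { value: unknown }) {
        if (typeof node.value === "string" && HEX.test(node.value)) {
          context.report({ node, messageId: "hardcoded", data: { value: node.value } });
        }
      },
    };
  },
};
```

Because the gate runs in CI, the rule fires whether the violation came from a human, Claude Code, or an unsupervised Codex dispatch.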
Visual regression has too many false positives (dynamic content, animations, dates)
Set Playwright's maxDiffPixelRatio to 0.01–0.02 (i.e. tolerate 1–2% of pixels differing). Mask dynamic regions with Playwright's mask option. For heavy dynamic content, switch to Applitools Eyes, which uses semantic AI to recognize dynamic elements. Percy's AI agent automatically classifies 40% of diffs as false positives.
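Both options live on Playwright's `toHaveScreenshot` assertion. A sketch of the spec file the CI gate runs (the URL, snapshot name, and masked selectors are assumptions for your app):

```typescript
// tests/visual-regression.spec.ts — illustrative spec; selectors are assumptions
import { test, expect } from "@playwright/test";

test("homepage matches baseline", async ({ page }) => {
  await page.goto("http://localhost:3000");
  await page.waitForLoadState("networkidle");
  await expect(page).toHaveScreenshot("homepage.png", {
    fullPage: true,
    // Tolerate up to 2% of pixels differing (antialiasing, minor reflow)
    maxDiffPixelRatio: 0.02,
    // Black out regions whose content legitimately changes every run
    mask: [page.locator("[data-testid='timestamp']"), page.locator(".live-feed")],
  });
});
```

On first run this captures the baseline; later runs compare against it and write a diff image on failure, which the workflow uploads as an artifact.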
Multiple agents working in parallel create conflicting design changes
Visual regression baselines are stored in Git — merge conflicts surface design conflicts. Each agent branch gets compared against main's baselines. Use feature flags to gate large visual changes. The CI design gate (Step 3) runs on every PR, catching conflicts before merge.
Existing codebase has hundreds of violations — lint-design.ts output is overwhelming
Use 'progressive adoption': add violations to an allowlist initially, then resolve them in batches. Each sprint, reduce the allowlist by 20%. Track the metric weekly (Step 6). Never add new violations — only reduce the existing count.
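That allowlist can bolt onto lint-design.ts as a post-scan filter: known debt is recorded once, and only violations absent from the list fail the build. A sketch (the key format and example file names are assumptions):

```typescript
// Progressive-adoption filter for lint-design.ts (sketch).
interface Violation {
  file: string;
  line: number;
  rule: string;
  value: string;
}

// Keyed by file:rule:value — line numbers shift too often to make stable keys
function violationKey(v: Violation): string {
  return `${v.file}:${v.rule}:${v.value}`;
}

// Violations on the allowlist are tracked debt, not build failures;
// shrink the list each sprint and never let new keys in
function filterNewViolations(found: Violation[], allowlist: string[]): Violation[] {
  const known = new Set(allowlist);
  return found.filter((v) => !known.has(violationKey(v)));
}

const allowlist = ["src/legacy/Header.tsx:no-hardcoded-hex:#1a1a2e"];
const found: Violation[] = [
  { file: "src/legacy/Header.tsx", line: 12, rule: "no-hardcoded-hex", value: "#1a1a2e" },
  { file: "src/NewCard.tsx", line: 4, rule: "no-hardcoded-hex", value: "#ff6b6b" },
];
console.log(filterNewViolations(found, allowlist)); // only the NewCard violation remains
```

Fail the build only on the filtered list, and report the allowlist's size as the weekly health metric so the burn-down is visible.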