I let Playwright agents loose on a site with 300+ broken themes

I used Claude’s Playwright agents to audit and fix readability issues across 300+ color themes that were imported directly from shell color files.

The hard part was not the work itself, but figuring out how to ask the agent the right question so it would approach the problem correctly. Once the prompting was tuned, the planner and generator agents produced an initial set of tests that validated text readability across every theme. From there, I treated the playwright-test-healer agent like a worker process: point it at failures, let it iterate, watch what it patched.
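For a sense of what "validating text readability" can boil down to, here is a minimal sketch of one plausible check. It assumes the WCAG 2.x contrast ratio as the readability metric and a 4.5:1 threshold (WCAG AA for normal text); the post does not say which metric the generated tests used. The `Theme` shape and the `unreadableThemes` helper are hypothetical names for illustration, not what the agents produced.

```typescript
// Hypothetical theme shape; the real data model is not shown in the post.
interface Theme {
  name: string;
  background: string; // 6-digit hex, e.g. "#1d1f21"
  foreground: string; // 6-digit hex, e.g. "#c5c8c6"
}

// Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color, from 0 (black) to 1 (white).
function relativeLuminance(hex: string): number {
  const h = hex.replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(h)) {
    throw new Error(`Expected a 6-digit hex color, got: ${hex}`);
  }
  const r = linearize(parseInt(h.slice(0, 2), 16));
  const g = linearize(parseInt(h.slice(2, 4), 16));
  const b = linearize(parseInt(h.slice(4, 6), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Names of themes whose text/background contrast falls below the threshold.
// The 4.5 default is an assumption (WCAG AA for normal-size text).
function unreadableThemes(themes: Theme[], minRatio: number = 4.5): string[] {
  return themes
    .filter((t) => contrastRatio(t.foreground, t.background) < minRatio)
    .map((t) => t.name);
}
```

In a real Playwright test, the colors would more likely be read from the rendered page per theme (for example via computed styles) and then passed through something like `contrastRatio`. Keeping the math in a pure function makes failures easy to reproduce outside the browser.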

Once it was aimed correctly, the healer ran for about an hour and surfaced and fixed every contrast and readability issue. The fixes were not perfect, but they were workable.

All of this ran on this site (itsthatguy.com), which I originally built with Claude Code as a stress test of how far I could push it with frontend prompting. The codebase itself is a mess. Letting an agent generate an entire frontend without guardrails produces an impressive amount of chaos. It took two 12-hour days to get the site into a usable state, mostly spent correcting logic bugs and cleaning up design output.

I attached a screenshot of the test directories to show what the agents produced on their own. In hindsight, I would absolutely include instructions about directory structure before letting the agents loose, because they will happily dump files in the root of your project like feral raccoons.