Playwright CLI v0.1.10 Brings Spec-Driven Testing Skills for AI Agents
Playwright CLI v0.1.10 introduces a spec-driven testing skill that guides AI agents through plan/generate/heal workflows for maintaining test suites from written specifications. Network inspection now uses stable request indexing, and raw output is default for all data-fetching commands—eliminating preprocessing steps in CI pipelines.
The release
Playwright CLI v0.1.10 shipped on April 30th with two headline features that reshape how AI agents interact with browser automation. The network inspection subsystem got a complete overhaul—network is gone, replaced by requests plus granular subcommands that output indexed, pipe-friendly data. But the feature I'm actually excited about is the spec-driven testing skill: a references/spec-driven-testing.md reference that teaches AI agents to drive Playwright tests from written specifications.
This matters because I spend my days building test infrastructure that scales. At CooperVision, I oversaw a 300% increase in test count while hitting 50% faster deployments. That didn't happen by writing more tests manually—it happened by building workflows that let the system do the repetitive work. The spec-driven testing skill is exactly that kind of workflow accelerator.
Why this matters for engineers and QA architects
If you're running AI-augmented QA workflows, you know the problem: agents can generate test code, but without a structured pattern for keeping that code alive, regressions pile up and the suite rots. The spec-driven testing skill provides that structure.
The references/spec-driven-testing.md file outlines a plan/generate/heal cycle. An agent reads a written spec, plans which Playwright assertions map to the spec's behavior, generates the corresponding test code, then heals regressions when the spec changes or the app drifts. This isn't hypothetical—I've seen this pattern work at scale when migrating from Selenium to Playwright, where we achieved 40% faster test execution.
For CI engineers, the network inspection overhaul matters more immediately. The old network command inlined bodies and required brittle string parsing. The new numbered commands (requests, request <num>, request-headers <num>, request-body <num>) output stable indexes and pipe-friendly data. You can pipe directly to jq without stripping wrapper text.
How to use it
Here's the spec-driven testing workflow in practice. The skill lives at references/spec-driven-testing.md and gets loaded automatically when you're running Playwright CLI with agent-mode enabled:
# Start Playwright CLI with agent mode
npx playwright-cli --agent
# The spec-driven testing skill is loaded from references/
# Agent can now read a spec and generate tests following the pattern:
# 1. PLAN: Map spec requirements to Playwright selectors/assertions
# 2. GENERATE: Write test code
# 3. HEAL: Detect and fix regressions when specs/app change
For the network inspection overhaul, here's a working example:
# List all requests with stable numbered indexes
playwright-cli requests
# Get full details for request #5
playwright-cli request 5
# Extract headers and pipe to jq for CI processing
playwright-cli request-headers 5 | jq '.["set-cookie"]'
# Save response body directly to file
playwright-cli response-body 5 --filename ./debug-response.json
# Run a test that uses the extracted data
playwright-cli test --config ./playwright.config.ts
The raw output change is significant: data-fetching commands like cookie-list, localstorage-list, and route-list now emit unwrapped output by default. Your CI scripts drop 1-2 preprocessing steps per invocation.
The gotcha nobody is talking about
The spec-driven testing skill is read-only in this release. You get the reference file and the pattern, but there's no built-in mechanism for the agent to automatically detect spec changes and trigger heals. You have to wire that yourself. In my experience, the first implementation is always manual—you define the trigger conditions, the diff logic, the heal policy. The skill gives you the pattern; you build the automation.
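To make "trigger conditions" concrete, here is a minimal sketch of one possible trigger: compare a spec file's checksum against the value recorded on the previous run, and only ask for a heal when it differs. Nothing in it comes from Playwright CLI. The function names, the state file, and the heal/skip vocabulary are all hypothetical, and a real pipeline would likely also want to react to app drift, not just spec edits.

```shell
#!/bin/sh
# Hypothetical heal trigger. Not part of Playwright CLI: every name here
# (spec_checksum, heal_decision, the state file) is an illustration.

# spec_checksum FILE -> prints a checksum of FILE's contents
spec_checksum() {
  cksum < "$1" | awk '{ print $1 }'
}

# heal_decision SPEC STATE_FILE
# Prints "heal" if SPEC changed since the last recorded run (or was never
# recorded), otherwise "skip". Records the new checksum in STATE_FILE.
heal_decision() {
  spec=$1
  state=$2
  current=$(spec_checksum "$spec")
  previous=""
  if [ -f "$state" ]; then
    previous=$(cat "$state")
  fi
  printf '%s\n' "$current" > "$state"
  if [ "$current" = "$previous" ]; then
    echo skip
  else
    echo heal
  fi
}
```

In a CI step this could gate the agent run, for example `[ "$(heal_decision specs/login.md .spec-state)" = heal ]` before invoking whatever heal command you wire up. Both paths in that line are placeholders.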
To be clear: don't expect autonomous test maintenance out of the box. Plan for 2-4 weeks of integration work to wire the heal step into your CI pipeline, depending on your test suite size and the stability of your application contract.
What this changes in your CI pipeline
Three concrete changes:
Network data is now scriptable. You can pull request headers, bodies, and responses directly into CI steps without parsing wrapper text. I estimate this saves 15-30 minutes of scripting work per team per quarter.
Spec-driven test maintenance becomes possible. If you're running AI agents for QA, the pattern exists now. You still need to implement the trigger/heal logic, but the framework is there.
MCP server stability improves. Unhandled promise rejections from user callbacks no longer crash the transport. If you're running Playwright CLI as a long-running MCP server (common in AI-augmented workflows), this is meaningful reliability.
Migration notes
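The network command is replaced by requests and its numbered subcommands (request <num>, request-headers <num>, request-body <num>). If you have scripts that parse network output, they need updating.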
Data-fetching commands (cookie-list, route-list, etc.) now emit raw output. If you were stripping the ### Result wrapper, remove that logic—it's gone.
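If some of your runners stay pinned to an older version for a while, a tolerant filter can bridge the gap until you delete the stripping logic for good. The sketch below assumes the legacy wrapper was a single leading line starting with "### Result". That shape is my assumption for illustration, so check it against your actual old output. The function name strip_result_wrapper is hypothetical.

```shell
#!/bin/sh
# Transitional shim, assuming the old wrapper was one leading line that
# begins with "### Result". Raw output from newer versions passes through
# unchanged. The function name is illustrative, not a Playwright CLI feature.
strip_result_wrapper() {
  awk 'NR == 1 && /^### Result/ { next } { print }'
}
```

Piping a data-fetching command through this filter yields the same payload on both old and new versions under that assumption. Once every runner is upgraded, the filter is a no-op and can be removed.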
Config-relative path resolution for initPage and initScript now resolves against the config directory (matching Vite/Vitest/ESLint behavior). If you were working around silent load failures, those workarounds are no longer needed.
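As a rough illustration of what "resolves against the config directory" means, the sketch below reproduces the rule in plain shell: a relative path is joined to the directory containing the config file, not to the current working directory. This is not the CLI's implementation, only the same idea. The function name and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# resolve_from_config CONFIG_FILE PATH
# Prints PATH resolved against the directory that contains CONFIG_FILE.
# Absolute paths are returned unchanged. Illustrative only: this mirrors
# the resolution rule described above, not Playwright CLI's actual code.
resolve_from_config() {
  case $2 in
    /*) printf '%s\n' "$2" ;;
    *)  printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$2" ;;
  esac
}
```

For example, `resolve_from_config e2e/cli.config.json init.js` prints the absolute path of e2e/init.js no matter which directory the command is run from. The sketch does not normalize segments such as `./` or `../`.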
Verdict
Playwright CLI v0.1.10 is a meaningful release for AI QA architects and CI engineers alike. The spec-driven testing skill is the headline for teams adopting AI-augmented workflows—it provides the pattern, even if the automation is still DIY. The network inspection overhaul is the practical win for everyone else: stable indexing, pipe-friendly output, raw by default.
My recommendation: upgrade to pick up the MCP server stability fixes first (low risk, immediate benefit, nothing to wire), then evaluate the network inspection changes for your CI scripts. The spec-driven skill is a longer-term investment: budget the integration time honestly, and you'll get the payoff.
If you're already running Playwright in your CI pipeline, this release makes it easier to script and maintain. If you're building AI-augmented QA workflows, the skill gives you a structure to teach your agents. Both audiences win.
Anton Gulin is an AI QA Architect, the first person to claim this title on LinkedIn. He builds AI-powered test automation systems where AI agents and human engineers collaborate on quality. Former Apple SDET, current Lead Software Engineer in Test at CooperVision. Find him at anton.qa or on LinkedIn.