
·5 min read
How to Score Your AI Test Agents: Offline Evaluation with Trajectories (2026)
Learn how to score the tests an AI agent writes. Record the run as a trajectory, replay it offline, and grade it without live API calls.
Playwright tutorials, Selenium migration guides, and QA engineering patterns from 10+ years in the field. No fluff. Only what works.

·3 min read
Learn how to generate clean test scripts using Playwright Codegen, and how to scale those drafts into a production-ready test architecture.

·7 min read
An AI QA Architect designs the test systems AI agents run on. Learn the role, architecture, skills, and MCP testing boundary.

·4 min read
Compare Playwright, Cypress, and Selenium in 2026. Pick the right browser test tool for AI-agent workflows.

·6 min read
Playwright v1.60 adds scoped HAR recording, locator.drop(), ARIA boxes, and test.abort() so CI failures carry better proof.

·4 min read
AI Test Automation Architecture: The 3-Layer System
AI test automation architecture is the system that tells AI what to test. It also defines how to run tests and prove the result. I split it into three layers: orchestration, execution, and evidence. Without all three, AI testing becomes prompt output with no production gate.

·2 min read
Native drag-and-drop in Playwright MCP cuts brittle mouse steps. Learn what changed, when to use it, and how QA teams should validate complex flows.

·7 min read
Most teams install MCP servers and hope they work. Here is how to test, evaluate, and gate MCP servers before they break your CI pipeline.

·4 min read
Playwright MCP v0.0.73 fixes a critical gap where extension channels and executable paths couldn't be resolved from CI/CD environment variables. This release enables true containerized test pipelines—configure browser installations once in your Dockerfile, override per job via environment variables, no hardcoded paths required.

·5 min read
Playwright MCP v0.0.72 renamed browser_run_code to browser_run_code_unsafe, making sandbox escape risk explicit. The new browser_network_request tool enables indexed request inspection. Teams using this MCP server in CI pipelines need to update tool references and security documentation now.

·5 min read
Playwright CLI v0.1.10 introduces a spec-driven testing skill that guides AI agents through plan/generate/heal workflows for maintaining test suites from written specifications. Network inspection now uses stable request indexing, and raw output is default for all data-fetching commands—eliminating preprocessing steps in CI pipelines.

·3 min read
AI agents made it worse. They still get stuck at login screens. This month, that changed. MCP (how AI talks to tools) is changing how we test. Two teams at Microsoft shipped a fix. They released two ways to share your browser with AI. Playwright core added browser.bind(). The CLI team added an MCP Bridge. Both land in the same month. This is not a test. It is the new way.

·3 min read
Playwright MCP v0.0.71 ships browser_drop. It gives you native drag-and-drop from any MCP client. No more evaluate scripts. No more mouse.move chains. Grid reordering, file drop zones, text editor drags — all work the same way a real user does.

·10 min read
Playwright's new stabilization work echoes a flaky-test fix from enterprise QA. See the pattern, why it matters, and how teams can reduce noisy failures.

·4 min read
Why skills need testing, not just writing — and how to do it systematically.

·6 min read
Playwright v1.59.0 ships the Screencast API, letting AI agents produce verifiable video evidence of their work. Engineers can replay agent actions with chapter markers and action annotations—no manual test replay required. Setup is three lines: start the screencast, run your agent logic, stop and save. This is the observability layer agentic workflows have been missing.

·6 min read
Technical decisions and lessons learned from rewriting a Python CLI tool as an OpenCode plugin.

·7 min read
I Ate My Own Dog Food: How I Benchmarked AI Skills and Proved Eval-Driven Development Works
I built a tool to test AI skills. Then I used it on my own project. The benchmarks shocked even me.
Anton Gulin is an AI QA Architect, the first person to claim this title on LinkedIn. He builds AI-powered test automation systems where AI agents and human engineers collaborate on quality. Former Apple SDET, current Lead Software Engineer in Test at CooperVision. Find him at anton.qa or on LinkedIn.

·6 min read
A practical walkthrough for creating, testing, and installing production-grade OpenCode skills.

·2 min read
Master Page Object Model in Playwright & TypeScript (2026). Structure scalable tests, copy real-world architecture patterns, and speed up testing.

·4 min read
A practical, step-by-step guide to migrating your Selenium test suite to Playwright. Includes code comparison, common pitfalls, and a migration strategy that won't disrupt your team.

·5 min read
Flaky tests destroy team confidence and slow deployment. Here are 10 proven strategies to eliminate flaky Playwright tests — from someone who's fixed thousands of them.

·4 min read
Learn Playwright from scratch. This step-by-step tutorial will have you writing your first automated test in under 10 minutes — no prior automation experience required.

·4 min read
Which test automation framework should you choose in 2026? A side-by-side comparison of Playwright, Cypress, and Selenium based on real-world experience testing for Apple, Fortune 500 companies, and enterprise teams.
