AI Test Automation Architecture: The 3-Layer System

Published: · 4 min read

AI test automation architecture is the system that tells AI what to test.

It also defines how to run tests and prove the result.

I split it into three layers: orchestration, execution, and evidence.

Without all three, AI testing becomes prompt output with no production gate.

Why tool lists fail

Most AI testing content starts with tools.

That is backwards.

AI means software that predicts.

Predictions can help QA teams move faster.

But predictions do not prove quality.

A tool can generate a test.

It cannot decide release risk alone.

It cannot prove the browser state was clean.

It cannot explain why a failure matters.

That work belongs to architecture.

The 3-layer model

I use three layers for AI test automation architecture.

Layer | Plain meaning | Main question
Orchestration | test control plan | What risk should this cover?
Execution | actual test run | Did it run in the real pipeline?
Evidence | proof from runs | Can a human review it?

If any layer is missing, the system is weak.

If evidence is missing, the team gets false confidence.
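To make that rule concrete, here is a minimal Python sketch that treats each layer as a yes/no check. The class and layer names are hypothetical, not from any tool. It reports "false confidence" whenever evidence is missing, even if everything else is in place.

```python
from dataclasses import dataclass, field

# Hypothetical layer names taken from the model above; not part of any tool.
LAYERS = ("orchestration", "execution", "evidence")


@dataclass
class QualitySystem:
    # Maps each layer name to whether that layer is actually in place.
    layers: dict[str, bool] = field(default_factory=dict)

    def missing_layers(self) -> list[str]:
        """Return absent layers in model order."""
        return [name for name in LAYERS if not self.layers.get(name, False)]

    def verdict(self) -> str:
        missing = self.missing_layers()
        if not missing:
            return "complete"
        if "evidence" in missing:
            # Tests may run and even pass, but nobody can review what happened.
            return "false confidence"
        return "weak: missing " + ", ".join(missing)


print(QualitySystem({"orchestration": True, "execution": True}).verdict())  # false confidence
```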

Layer 1: Orchestration

Orchestration means test control plan.

This layer defines the work before AI writes anything.

It answers five questions:

  1. What user flow matters?
  2. What risk does this test cover?
  3. What data must exist first?
  4. What browser state is allowed?
  5. What failure should block release?

AI can help draft the first version.

But a human still owns the risk call.

That is the difference between generation and architecture.
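The five questions can live as a plain record that must be complete before generation starts. This is a minimal sketch, assuming a hypothetical OrchestrationPlan type and made-up example values. AI can fill every field. The plan still fails until a named human owns the risk call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrchestrationPlan:
    """One plan entry that answers the five questions before any test code exists."""
    user_flow: str                      # 1. What user flow matters?
    risk: str                           # 2. What risk does this test cover?
    required_data: Optional[list[str]]  # 3. What data must exist first? ([] = none, None = unanswered)
    allowed_state: str                  # 4. What browser state is allowed?
    blocking_failure: str               # 5. What failure should block release?
    risk_owner: Optional[str] = None    # human who signed off on the risk call

    def problems(self) -> list[str]:
        issues = []
        for name in ("user_flow", "risk", "allowed_state", "blocking_failure"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.required_data is None:
            issues.append("required_data is not answered")
        if not self.risk_owner:
            # AI may draft every field, but only a named human closes the risk call.
            issues.append("no human risk owner")
        return issues


# Example values are made up for illustration.
draft = OrchestrationPlan(
    user_flow="checkout with saved card",
    risk="order is charged but not created",
    required_data=["user with saved card"],
    allowed_state="fresh context, logged in via saved state",
    blocking_failure="no order confirmation",
)
print(draft.problems())  # ['no human risk owner']
```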

Layer 2: Execution

Execution means actual test run.

This layer proves the test can survive the real path.

That path is usually CI.

CI means automated build server.

A local demo is useful.

It is not enough.

Run the test where code ships.

Check browser state, cleanup, retries, test data, and worker isolation.
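Cleanup and worker isolation are easy to say and easy to skip. For browser state, Playwright's usual answer is a fresh browser context per test, which the pytest-playwright page fixture gives you by default. Test data needs its own discipline. Here is a minimal sketch of one way to do it, assuming a hypothetical backend API (FakeBackend stands in for it) and a worker index supplied by your test runner.

```python
import contextlib
import uuid


class FakeBackend:
    """In-memory stand-in for a hypothetical test-data API."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def create_user(self, key: str) -> dict:
        self.records[key] = {"email": f"{key}@example.test"}
        return self.records[key]

    def delete_user(self, key: str) -> None:
        self.records.pop(key, None)


@contextlib.contextmanager
def isolated_user(backend: FakeBackend, worker_index: int):
    """Create a user owned by one worker and always delete it afterwards."""
    # Worker index plus a random suffix keeps parallel workers and retries from sharing data.
    key = f"w{worker_index}-{uuid.uuid4().hex[:8]}"
    user = backend.create_user(key)
    try:
        yield user
    finally:
        # Runs on pass, failure, or retry, so nothing leaks into the next test.
        backend.delete_user(key)


backend = FakeBackend()
with isolated_user(backend, worker_index=0) as user:
    assert user["email"].startswith("w0-")
print(backend.records)  # {}
```

The try/finally is the design choice that matters: cleanup does not depend on the test passing.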

This is where Playwright and MCP matter.

Playwright is a browser test tool.

MCP means Model Context Protocol, a standard for connecting AI agents to tools.

Together, they let AI agents use a live browser.

But the run still needs stable launch control.

That is why playwright-mcp v0.0.75 matters.

It serialized shared browser launch in isolated mode.

That means parallel runs start the shared browser one at a time instead of racing each other at startup.
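I have not looked at how playwright-mcp implements this internally. The sketch below only illustrates what serialized launch means, using a plain lock in Python. BrowserLauncher and run_workers are hypothetical, and time.sleep stands in for a real browser starting up.

```python
import contextlib
import threading
import time


class BrowserLauncher:
    """Counts how many browser launches overlap. The launch itself is simulated."""

    def __init__(self, serialize: bool):
        # With serialize=True every launch takes one shared lock, so startups run in order.
        self._launch_gate = threading.Lock() if serialize else contextlib.nullcontext()
        self._counter_lock = threading.Lock()  # protects the bookkeeping below
        self._active = 0
        self.max_concurrent = 0
        self.launched: list[str] = []

    def launch(self, worker_id: int) -> None:
        with self._launch_gate:
            with self._counter_lock:
                self._active += 1
                self.max_concurrent = max(self.max_concurrent, self._active)
            time.sleep(0.02)  # stand-in for slow browser startup
            with self._counter_lock:
                self._active -= 1
                self.launched.append(f"browser-{worker_id}")


def run_workers(serialize: bool, workers: int = 4) -> BrowserLauncher:
    """Start several simulated workers at once and wait for all launches."""
    launcher = BrowserLauncher(serialize)
    threads = [threading.Thread(target=launcher.launch, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return launcher


print(run_workers(serialize=True).max_concurrent)  # 1
```

With serialize=False the same run will usually report more than one overlapping launch. That overlap is the race that ordered startup removes.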

Small release note.

Real architecture impact.

Layer 3: Evidence

Evidence means proof from runs.

This is the layer most teams skip.

Every AI-created test should leave receipts.

Useful receipts include:

  • trace
  • screenshot
  • log
  • video when timing matters
  • saved browser state when auth matters

The point is simple.

A reviewer should inspect the run without rerunning it.

If that is impossible, the test is not ready.
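Here is a minimal sketch of a receipt check for one run's artifact folder. The function and the file names are assumptions. In Playwright's Python API, a trace like this typically comes from context.tracing.start(screenshots=True, snapshots=True) followed by context.tracing.stop(path="trace.zip").

```python
from pathlib import Path


def missing_receipts(artifact_dir: Path, timing_matters: bool = False, auth_matters: bool = False) -> list[str]:
    """Return receipts a reviewer still lacks for one test run.

    The file names are assumptions for this sketch; match them to what your runner writes.
    """
    required = {
        "trace": "trace.zip",
        "screenshot": "screenshot.png",
        "log": "run.log",
    }
    if timing_matters:
        required["video"] = "video.webm"
    if auth_matters:
        required["browser state"] = "storage_state.json"
    # A receipt counts only if the file exists and is not empty.
    return [
        name
        for name, filename in required.items()
        if not (artifact_dir / filename).is_file() or (artifact_dir / filename).stat().st_size == 0
    ]
```

A CI step could run this against each AI-generated test's output folder and fail the job when the list is not empty.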

AI can write code quickly.

Review still needs proof.

A practical gate

Here is the gate I use before AI-generated tests ship.

Gate | Pass condition
Scope | The test maps to one named risk
Data | Test data setup is explicit
State | Browser state is controlled
Run | The test passes in CI
Evidence | Trace or equivalent proof exists
Review | A human can explain the failure mode

This is not heavy process.

It is a small guardrail.

It stops weak tests from becoming permanent debt.
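The gate table fits in a few lines of code. This is a minimal sketch, assuming a hypothetical GateFacts record that your pipeline fills in. Gathering those facts honestly is the real work. The check itself is trivial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateFacts:
    """What is known about one AI-generated test before it ships. Field names are hypothetical."""
    named_risk: Optional[str]
    data_setup_explicit: bool
    browser_state_controlled: bool
    passed_in_ci: bool
    has_trace_or_equivalent: bool
    failure_explained_by: Optional[str]  # the human who can explain the failure mode


def failed_gates(facts: GateFacts) -> list[str]:
    """Return the names of gates from the table that this test does not pass yet."""
    checks = [
        ("Scope", bool(facts.named_risk)),
        ("Data", facts.data_setup_explicit),
        ("State", facts.browser_state_controlled),
        ("Run", facts.passed_in_ci),
        ("Evidence", facts.has_trace_or_equivalent),
        ("Review", bool(facts.failure_explained_by)),
    ]
    return [gate for gate, passed in checks if not passed]


# Made-up example: generated code that ran green in CI but left no trace and no reviewer.
facts = GateFacts("order charged but not created", True, True, True, False, None)
print(failed_gates(facts))  # ['Evidence', 'Review']
```

A CI wrapper could call failed_gates for each new test and exit non-zero when the list is not empty.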

What this changes for QA teams

The goal is not to slow AI down.

The goal is to make AI work reviewable.

When the architecture is clear, AI becomes useful in three places:

  1. It drafts coverage ideas.
  2. It writes first-pass test code.
  3. It explains failures from evidence.

But humans still own the system.

Humans define risk.

Humans review evidence.

Humans decide what blocks release.

The rule

Never ask AI to expand test coverage first.

Build the proof system before that.

Generation is cheap.

Evidence is the architecture.

That is the line between AI testing demos and production QA.


Anton Gulin is the AI QA Architect — the first person to claim this title on LinkedIn. He builds AI-powered test automation systems where AI agents and human engineers collaborate on quality. Former Apple SDET (Apple.com / Apple Card pre-release testing). Find him at anton.qa or on LinkedIn.

ai-testing · playwright · mcp · test-automation · qa-architecture
