
Published: · 3 min read
An honest method for benchmarking LLM speed: tokens per second vs time to first token, across 42 Ollama Cloud models, measured every 10 minutes.
I test software for a living. So when a vendor calls an AI model "fast," I don't trust the word. I measure it.
Most leaderboards rank how smart a model is. Almost none rank how fast it answers. You pick a model because it scored well, ship it, and then your users sit and wait.
Speed is two different numbers. People mix them up constantly.
Time to first token (TTFT). The wait before the first word appears. You feel this every time a chatbot "thinks" before replying.
Tokens per second (TPS). How fast the model writes once it starts. A token is a word or a piece of a word.
A model can be great at one and terrible at the other. You need both.
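Here is a rough way to see why you need both. The time until a reply is complete is approximately TTFT plus the number of output tokens divided by TPS. This sketch assumes that simple additive model, ignoring network jitter and retries. The two models and their numbers are hypothetical, not measurements from the tracker.

```python
def total_seconds(ttft_s: float, tps: float, output_tokens: int) -> float:
    """Estimated seconds until the last token arrives.
    Assumes a simple additive model: total = TTFT + tokens / TPS."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_s + output_tokens / tps


# Hypothetical models, not real measurements from the tracker:
# "quick_start" replies almost at once but writes slowly,
# "fast_writer" makes you wait but then writes quickly.
quick_start = (0.5, 20.0)   # (TTFT in seconds, tokens per second)
fast_writer = (5.0, 100.0)

for n in (50, 300):
    a = total_seconds(*quick_start, n)
    b = total_seconds(*fast_writer, n)
    print(f"{n} tokens: quick_start={a:.1f}s, fast_writer={b:.1f}s")
```

With these made-up numbers, the quick starter wins a 50-token reply (3.0 s vs 5.5 s). The fast writer wins at 300 tokens (15.5 s vs 8.0 s). Which number matters more depends on how long your answers are.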
I run an independent tracker called ollamatps.com. It benchmarks 42 Ollama Cloud models. Here is the exact method, because a benchmark you cannot inspect is just a claim.
max_tokens is capped at 300. It never changes between runs.
TPS = eval_count / (total_duration - time_to_first_token). The startup wait is removed, so TPS measures pure writing speed.
Same prompt, same cap, same schedule. That is what makes two models comparable.
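Here is a minimal Python sketch of those fixed settings and the TPS formula. The `BenchmarkConfig` and `tokens_per_second` names are hypothetical, not the tracker's actual code. It also assumes both durations are in nanoseconds, the unit Ollama uses for `total_duration`, with time to first token measured in the same unit.

```python
from dataclasses import dataclass

NS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class BenchmarkConfig:
    """Settings held fixed across every run so models stay comparable.
    Field names are illustrative, not the tracker's actual code."""
    prompt: str                  # same prompt for every model
    max_tokens: int = 300        # same cap; never changes between runs
    interval_minutes: int = 10   # same schedule


def tokens_per_second(eval_count: int, total_duration_ns: int, ttft_ns: int) -> float:
    """TPS = eval_count / (total_duration - time_to_first_token).
    Subtracting TTFT removes the startup wait, leaving pure writing time.
    Assumes both durations are in nanoseconds."""
    writing_ns = total_duration_ns - ttft_ns
    if writing_ns <= 0:
        raise ValueError("total_duration must exceed time_to_first_token")
    return eval_count / (writing_ns / NS_PER_SECOND)
```

Consider a hypothetical run that produces 300 tokens in 3.5 s total with a 0.5 s TTFT. `tokens_per_second(300, 3_500_000_000, 500_000_000)` gives 100.0 TPS. Dividing by the full 3.5 s instead would report about 85.7, which blames the model's writing speed for its startup wait. Making the config frozen means a run cannot quietly change the cap.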
Building this was a testing job, not a coding job. Retries on failure. A reliability score per model. A circuit breaker for models that keep failing. If you cannot trust the measurement, the number is noise. That part is the same work I do on any test system.
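The reliability work in that paragraph can be sketched in a few lines. This is a minimal illustration under assumptions, not the tracker's code. `ModelProbe`, the retry count, and the three-strike trip threshold are all hypothetical. `call` stands in for whatever function actually benchmarks one model and returns its TPS.

```python
from typing import Callable, Optional


class ModelProbe:
    """Hypothetical per-model wrapper adding retries, a reliability score,
    and a circuit breaker. Names and thresholds are illustrative assumptions."""

    def __init__(self, call: Callable[[], float], retries: int = 2, trip_after: int = 3):
        self.call = call              # benchmarks one model, returns TPS or raises
        self.retries = retries        # extra attempts per run before it counts as failed
        self.trip_after = trip_after  # consecutive failed runs before the breaker opens
        self.runs = 0
        self.successes = 0
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        """True once the breaker has tripped and the model is being skipped."""
        return self.consecutive_failures >= self.trip_after

    @property
    def reliability(self) -> float:
        """Share of attempted runs that produced a measurement."""
        return self.successes / self.runs if self.runs else 0.0

    def run(self) -> Optional[float]:
        if self.is_open:
            return None  # skip a model that keeps failing instead of wasting a slot
        self.runs += 1
        for _ in range(1 + self.retries):
            try:
                result = self.call()
            except Exception:
                continue  # transient failure: try again
            self.successes += 1
            self.consecutive_failures = 0
            return result
        self.consecutive_failures += 1  # every attempt in this run failed
        return None
```

A production circuit breaker usually also has a "half-open" state that lets a recovered model back in after a cooldown. This sketch leaves that out, so once tripped, a model stays skipped until the probe is reset.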
Bigger is not faster.
The fastest model on the board is one of the smallest: a 30B model at over 200 tokens per second. The model literally named "ultra" is dead last, under 8 tokens per second.
And the wait varies wildly. TTFT ranges from about 0.3 seconds to 23 seconds across the 42 models. Same cloud. Roughly 80x difference in how long you wait for the first word.
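The "roughly 80x" is simply the ratio of the two ends of that TTFT range:

```latex
\frac{\text{TTFT}_{\max}}{\text{TTFT}_{\min}} \approx \frac{23\ \text{s}}{0.3\ \text{s}} \approx 77 \approx 80\times
```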
If you picked your model on a benchmark score alone, you have no idea which of these you are getting.
The first version tracked fewer models and was less robust. I rebuilt the engine this month (v2) to be multi-provider and to test continuously. The live board updates every 10 minutes.
Watch it run: ollamatps.com
Anton Gulin is the AI QA Architect, the first person to claim this title on LinkedIn. He builds AI-powered test automation systems where AI agents and human engineers collaborate on quality. Former Apple SDET (Apple.com / Apple Card pre-release testing). Find him at anton.qa or on LinkedIn.