Toolchain Performance Matters More Than Ever in the AI Era

AI coding is increasingly an agentic loop: write code, run validation, read failures, and fix again. In that loop, test, lint, and build performance directly shape engineering throughput.

At D2, while I was listening to Evan You’s talk in person, one slide stayed with me.

It was not a flashy conference slide. It was more like a clean way to name something many of us had already started to feel: once AI accelerates the act of writing code, the slowest part of the development loop moves somewhere else.

It shows three stages.

Before AI, most of the time was spent writing code. Waiting for tools was only a small block near the bottom. Humans were already slow enough that slow tooling was annoying, but not always the most visible constraint.

After AI, the writing-code block gets compressed. Code appears much faster, and waiting for tools suddenly takes a much larger share of the loop.

Then comes AI + better tools: faster basic error feedback, faster behavior validation, and faster iteration.

A D2 talk photo showing how the proportion between writing code and waiting for tooling changes before and after AI

That photo is not only about the old annoyance of developers waiting for commands. It points at a newer engineering fact: in the AI era, toolchain performance directly affects the speed of AI-assisted development loops.


The Loop Is Waiting

When we used to say tooling was slow, we usually meant that humans had to wait.

Run tests, get water. Run a build, check messages. Wait for lint, switch to another window. It was unpleasant, but it was still just one empty patch inside a human workflow.

Tools like Codex, Claude Code, and Cursor have changed the shape of development.

They do not only generate a piece of code and stop. A more typical process is: read context, edit code, run tests or lint, read the failure, edit again, run again. The workflow is becoming an agentic loop.
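The loop can be sketched in a few lines. This is a minimal illustration, not how Codex, Claude Code, or Cursor are actually implemented. `runValidation` and `proposeFix` are hypothetical stand-ins for "run tests or lint" and "the model edits code", and the toy `Workspace` type is an assumption made only to keep the sketch self-contained.

```typescript
// Minimal sketch of an agentic edit/validate loop.
// All names here (Workspace, runValidation, proposeFix) are hypothetical.

type ValidationResult = { ok: boolean; failures: string[] };

interface Workspace {
  // Hypothetical: the known defects still present in the code.
  defects: Set<string>;
}

// Stand-in for "run tests or lint": report every remaining defect as a failure.
function runValidation(ws: Workspace): ValidationResult {
  const failures = Array.from(ws.defects);
  return { ok: failures.length === 0, failures };
}

// Stand-in for the model reading a failure and editing code:
// here it simply fixes the first reported failure.
function proposeFix(ws: Workspace, failures: string[]): void {
  if (failures.length > 0) ws.defects.delete(failures[0]);
}

// Validate -> read failures -> edit -> validate again, until green or out of budget.
function agentLoop(ws: Workspace, maxCycles: number): { cycles: number; ok: boolean } {
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    const result = runValidation(ws);
    if (result.ok) return { cycles: cycle, ok: true };
    proposeFix(ws, result.failures);
  }
  // Budget exhausted: report whether the last edit happened to fix everything.
  return { cycles: maxCycles, ok: runValidation(ws).ok };
}
```

Every iteration calls `runValidation` once, so the loop's wall-clock time is at least the number of cycles multiplied by the validation latency, whatever the model itself costs.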

In that loop, the toolchain is not a side command sitting next to a human developer. It is the input stream that lets the agent decide what to do next. Tests, lint, and builds are becoming its sensory system.

Slow test feedback makes the agent slower.

Slow lint feedback makes the agent slower.

Slow build feedback makes the agent slower.

So the question is no longer whether a human has the patience to wait ten minutes. The question is: what is the feedback latency of one automated development loop? How many validation and correction cycles can an agent complete in the same amount of time?

Before, slow tooling meant humans were waiting.

Now, slow tooling means the whole development loop is waiting.

After AI-assisted coding, validation becomes the visible feedback bottleneck

Why It Was Less Painful Before

Human coding used to be slow by default.

A feature could take half an hour, two hours, or a full day. During that time, the developer was reading context, checking interfaces, thinking through edge cases, making careful edits, and occasionally running tests. Slow tooling was still irritating, but it was surrounded by a lot of human work.

The slowness of the toolchain was hidden by the slowness of humans.

That does not mean toolchain performance was unimportant before. It means it was less likely to become the primary constraint.

AI changes that ratio.

When a change can be generated quickly, the key question is no longer only how fast code appears. It becomes how fast the code can be validated. If AI writes a patch in 30 seconds and the test suite takes ten minutes, the loop is not governed by AI. It is governed by tests.
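The arithmetic behind the 30-second / ten-minute example is worth spelling out. The helper below is illustrative only, and it assumes a simplified cycle made of just two steps: generate a patch, then run the checks.

```typescript
// Fraction of one edit/validate cycle spent waiting for validation,
// assuming a cycle is simply: generate a patch, then run the checks.
function waitingShare(generateSeconds: number, validateSeconds: number): number {
  return validateSeconds / (generateSeconds + validateSeconds);
}

// The example from the text: a 30 s patch and a 10-minute test suite.
const share = waitingShare(30, 10 * 60); // 600 / 630, roughly 0.952

// Making generation twice as fast (30 s -> 15 s) barely moves the cycle time.
const cycleBefore = 30 + 600; // 630 s
const cycleAfter = 15 + 600; // 615 s
```

About 95% of the cycle is the test run, so halving generation time shortens the cycle by less than 3%. Only a faster suite moves the number meaningfully.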

It is like exposing the slowest belt in a factory line.

When the worker at the front was slow, a slow conveyor belt later in the line was less noticeable. Replace the worker with a fast robotic arm, and the belt speed becomes the factory speed.

Why Unified Toolchains Matter More Now

When Evan You talks about Oxc, Rolldown, Vite, and Vitest, I do not read it as merely “new tools are faster.”

The deeper point is that JavaScript tooling has been highly fragmented. That fragmentation was tolerable at human coding speed. Inside an agentic loop, it turns into system cost.

Parsers, transformers, test runners, linters, formatters, and bundlers often have their own implementations. They parse the same code separately, transform it separately, configure behavior separately, and sometimes disagree in subtle ways. A lot of time is spent re-processing the same program rather than validating product behavior.
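A toy cost model makes the repeated work visible. This is not how Oxc or any real toolchain is built. The "parser" is a hypothetical stand-in that only splits text on whitespace, and the consumers are placeholders for a linter, formatter, transformer, and bundler. The sketch only counts how often the same source gets parsed when every tool parses it independently, compared with when they share one result.

```typescript
// Toy model of repeated parsing across tools. The "parser" is a
// hypothetical stand-in (it only splits on whitespace) used to count work.
type Ast = { tokens: string[] };

let parseCount = 0;
function parse(source: string): Ast {
  parseCount++;
  return { tokens: source.split(/\s+/).filter((t) => t.length > 0) };
}

type Tool = (ast: Ast) => void;

// Fragmented: every tool parses the same source itself.
function runFragmented(source: string, tools: Tool[]): void {
  for (const tool of tools) tool(parse(source));
}

// Shared: parse once per distinct source, then hand the same AST to every tool.
const astCache = new Map<string, Ast>();
function runShared(source: string, tools: Tool[]): void {
  let ast = astCache.get(source);
  if (!ast) {
    ast = parse(source);
    astCache.set(source, ast);
  }
  for (const tool of tools) tool(ast);
}

// Placeholder consumers standing in for a linter, formatter, transformer, and bundler.
const consumer: Tool = (_ast: Ast): void => {
  /* a real tool would inspect or transform the AST here */
};
const tools: Tool[] = [consumer, consumer, consumer, consumer];
```

With four tools, the fragmented path parses the file four times per validation pass. An agent that validates after every edit multiplies that duplicated work by the number of cycles.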

In the AI era, that waste gets amplified.

An agent does not treat repeated validation as a nuisance in the same way humans do. While fixing a problem, it may naturally run tests multiple times, run lint multiple times, and perform focused checks repeatedly. That is exactly what we want if the feedback is fast.

If the toolchain is fast, frequent validation improves code quality.

If the toolchain is slow, frequent validation stalls the loop.

That is why tools such as Oxc, Rolldown, Vite, and Vitest matter beyond benchmark screenshots. They shorten the feedback loop that AI-assisted programming depends on.

This is not only a frontend issue.

My concrete migration happened in a NestJS backend project. The framework and runtime are different, but the loop has the same shape: change code, validate behavior, read feedback, and continue. A slow validation chain limits the whole system.

One Backend Data Point

This argument needs at least one concrete number.

One medium-sized NestJS backend project had accumulated around 200 test files.

Before the migration, unit tests ran on Jest. Jest is mature, stable, and well documented. Many NestJS projects started there for good reasons. But as the suite grew, the feedback time became painful.

One full unit-test run went beyond 580 seconds without finishing.

That distinction matters: it did not finish in 580 seconds. It was still running after more than 580 seconds, so that number is only a lower bound for the old flow.

After moving to Vitest + SWC, the same batch completed:

Test Files  200 passed (200)
Tests       1299 passed (1299)
Duration    34.73s

Even if the old run is treated only as a lower bound, that is at least a 16.7x difference. The real gap was probably larger because the old run never reached completion.

The project later gained a few more tests. A current rerun looked like this:

Test Files  206 passed (206)
Tests       1314 passed (1314)
Duration    37.27s
shell total 38.345s

This matters a lot for an agentic loop. It is not the scorecard of a tool migration. It is what a shortened feedback loop looks like.

When a full unit-test suite takes around half a minute, an agent can treat it as a normal validation step. When it has already run for nearly ten minutes and is still going, every correction cycle is forced to stop in place.
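Here are the reported numbers turned into "how many full validation runs fit in a fixed budget". The one-hour budget is an arbitrary choice for illustration, and the time an agent spends editing between runs is ignored. Because the Jest run never finished, its run count is an upper bound, in the same way that 16.7x is a lower bound on the speedup.

```typescript
// Measured durations from the NestJS project described above.
const jestLowerBoundSeconds = 580; // still running at this point, so only a lower bound
const vitestSeconds = 34.73; // completed run: 200 files, 1,299 tests

// The speedup is at least old/new because the old time is a lower bound.
const minSpeedup = jestLowerBoundSeconds / vitestSeconds;

// Full-suite runs that fit in a one-hour budget (an arbitrary choice),
// ignoring the agent's editing time between runs.
const budgetSeconds = 60 * 60;
const maxJestRuns = Math.floor(budgetSeconds / jestLowerBoundSeconds); // an upper bound
const vitestRuns = Math.floor(budgetSeconds / vitestSeconds);
```

That works out to at most 6 full runs per hour on the old flow, compared with 103 on the new one.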

The unit test suite moved from more than 580 seconds and still unfinished to a completed 34.73-second run

Why I Would Start With Tests

If only one part can be optimized first, I would look at tests before lint.

Not because lint is unimportant, but because tests carry denser feedback inside an agentic loop.

Lint tells you whether formatting, rules, and some static constraints are fine. Tests get closer to behavior: whether a service branch broke, whether a controller response changed, whether a guard boundary still holds, whether provider contracts still work.

AI coding needs more than “this looks syntactically acceptable.” It needs to know whether the change broke existing behavior.

When tests are slow, the AI repair loop becomes blunt. It may still be able to fix the problem, but each judgment takes too long. Fewer cycles means less room for self-correction.

That is where Vitest + SWC helped.

The point was not changing a command name from jest to vitest. The point was turning full validation from something you had to think twice about into an action that could happen frequently inside the loop.
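For reference, one common way to wire SWC into Vitest for a NestJS project is the `unplugin-swc` plugin, which is the approach the NestJS documentation's SWC recipe describes. SWC matters here because NestJS dependency injection relies on `emitDecoratorMetadata`, and Vitest's default esbuild transform does not emit it. Treat this as a typical sketch, not the exact configuration used in the project above. Options such as `globals` depend on how the existing tests were written.

```typescript
// vitest.config.ts: a typical sketch, assuming vitest, unplugin-swc,
// and @swc/core are installed as dev dependencies.
import swc from 'unplugin-swc';
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Expose describe/it/expect globally, as Jest does, so existing tests need fewer edits.
    globals: true,
    root: './',
  },
  plugins: [
    // Compile test files with SWC so decorator metadata is emitted for Nest's DI.
    swc.vite({
      // Set the module type explicitly instead of inheriting it from a .swcrc file.
      module: { type: 'es6' },
    }),
  ],
});
```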

Oxlint follows the same direction.

After moving lint to Oxlint, daily pnpm lint runs are generally in the seconds range, and sometimes below a second. I do not want to attach a precise multiplier because I did not keep a rigorous ESLint baseline. The reliable statement is simpler: lint moved from “something that requires a wait” to “something you can run casually.”

In AI-assisted development, that difference is large.

A privacy-safe NestJS backend feedback loop: test, lint, build, and CI validation

Fast Means More Cycles

Toolchain performance is often described as developer experience.

That is true. Faster commands feel better. No one argues with that.

But in the AI era, experience is not the whole story.

Faster tooling changes the development strategy.

If tests take 30 seconds, you are more likely to let the agent run the full suite after a meaningful change. If tests take ten minutes, you are more likely to run only related files, or postpone validation. If lint finishes in less than a second, it becomes a natural step after edits. If lint takes a long time, it gets delayed.
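This shift in strategy can be written down as a small policy. It is a hypothetical sketch. The 60-second threshold is an arbitrary assumption, and the command strings are only examples built on Vitest's `vitest run` and `vitest related` commands, not a recommended setup.

```typescript
// Hypothetical policy: how much validation to request after an edit,
// given the measured duration of the last full test run.
type Plan = { command: string; scope: "full" | "related" };

function planValidation(
  lastFullRunSeconds: number,
  changedFiles: string[],
  fullRunBudgetSeconds = 60, // arbitrary threshold, for illustration only
): Plan {
  // A fast suite (or no known changed files) means: just run everything.
  if (lastFullRunSeconds <= fullRunBudgetSeconds || changedFiles.length === 0) {
    return { command: "vitest run", scope: "full" };
  }
  // Too slow to run everything after each edit: narrow to tests related to the change.
  return { command: `vitest related --run ${changedFiles.join(" ")}`, scope: "related" };
}
```

With a suite that finishes in about 35 seconds, the first branch is taken every time, so the narrower and riskier strategy never has to be chosen.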

Slow tooling makes correct behavior expensive.

Fast tooling makes correct behavior cheap again.

Cheap matters to agentic loops. Agents are not afraid of repetition. They are constrained by feedback latency. The faster the feedback, the more they can try, fix, and converge. The slower the feedback, the more they get stuck in single-shot edits.

That is how I now think about toolchain performance.

It is not polish. It is a throughput limit for AI programming.

This Is Not Tool-Chasing

Moving to Vitest, Oxlint, or eventually looking at Oxc and Rolldown should not be about chasing whatever is new.

That kind of tool-chasing easily becomes churn.

The real reason to migrate is whether the tool shortens the feedback loop, reduces repeated work, and makes the agentic loop more stable.

This is also why a clean cut can sometimes be better.

Keeping Jest and Vitest side by side may look safer in the short term, but it can create long-term cognitive overhead. Mock APIs, globals, configuration, coverage behavior, watch mode, and type declarations may all differ slightly. AI-generated test changes are also more likely to mix styles when both systems remain present.

Half-migrated infrastructure is expensive.

It adds a judgment cost to every future change. Humans have to decide. AI has to decide. More decisions mean more latency and more mistakes.

A clean cut does not mean replacing things blindly. Mocks, fake timers, coverage, e2e, and integration boundaries still need to be checked. But once those checks are done, a unified toolchain is itself a benefit.

The Practical Conclusion

Toolchain performance matters more than ever in the AI era.

Not because everyone suddenly became obsessed with performance, and not because new tool names sound exciting. The reason is simpler: once AI dramatically accelerates code generation, validation becomes the new primary constraint.

More precisely, it becomes the primary constraint of the agentic loop.

AI writes code, runs validation, reads failures, and edits again. The speed of that loop depends on the speed of feedback. Tests, lint, and build are no longer just command-line chores.

The Jest-to-Vitest + SWC migration is only one data point: a unit-test flow that had gone beyond 580 seconds without finishing became a 34.73-second completed run across 200 test files and 1,299 test cases. What changed was the loop speed.

The toolchain should not be the red light in AI programming.

It should be a short, clear feedback loop: change a little, verify, fix immediately if needed, and continue.

Writing code is only the beginning.

Being able to quickly prove that the code still works is the foundation of engineering efficiency in the AI era.