Most Teams Are at Level 2. They Think They're at Level 5.

Ninety percent of developers now use AI tools at work. Rework rates are climbing. Those two facts belong in the same sentence.

The 2025 DORA report found that increased AI usage correlates with greater software delivery instability — more rework, more failed deployments. Teams also report productivity gains. Both things are true. The explanation is not that AI is bad. It is that most teams are running it badly, at a level of adoption that produces output without producing understanding.

The spectrum nobody maps honestly

Steve Yegge published an eight-level framework for AI-assisted development. Worth reading in full, but the short version: Levels 1–4 are IDE-centric, reactive, human-supervised. The engineer writes a prompt, reviews the output, applies the diff. Level 5 is the first real break — the engineer moves to the CLI, drops the IDE as their primary workspace, and starts running agents without babysitting them. Levels 6–8 are parallel agent coordination, building toward custom orchestration.

Most engineering teams think they are at Level 4 or 5. They have Copilot or Cursor installed. They use Claude in chat. They feel faster. They are probably at Level 2 or 3.

The gap is not about which tools are installed. It is about trust, workflow, and what you do when the agent is wrong. Level 2 means copying output carefully, reviewing every line, staying cautious. Level 5 means you have spent enough time with agents to know what to delegate, how to verify results without reading every diff, and how to structure tasks so the agent can actually finish them. That takes months of deliberate practice. Not a week of Cursor.

What the DORA data actually shows

The 2025 DORA report dropped its old Low-Medium-High-Elite performance tiers and replaced them with seven team archetypes: Harmonious High-Achiever, Stable and Methodical, High Impact Low Cadence, Pragmatic Performer, Constrained by Process, Legacy Bottleneck, and Foundational Challenges. The change matters because the old classification hid bad combinations. A team could be Elite on throughput and a disaster on sustainability, and the label still said Elite.

The metric making this visible is rework rate — the share of deployments triggered by production defects rather than planned work. Lee Campbell’s analysis of the 2025 data lands on the uncomfortable version: a team can have a low change failure rate while spending 40% of its time fixing things that were almost right. High throughput with a climbing rework rate means shipping technical debt faster, not delivering value faster.
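A minimal sketch of how the two metrics can diverge, assuming a hypothetical deployment log where each record carries two flags. The `Deployment` shape, its field names, and the sample numbers are illustrative assumptions, not DORA's schema or data. The 40% here is a share of deployments, used as a stand-in for the share of time in Campbell's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    # Hypothetical record shape; field names are illustrative only.
    unplanned_fix: bool   # shipped to repair a production defect
    caused_failure: bool  # broke production and needed remediation


def rework_rate(deployments):
    """Share of deployments triggered by production defects."""
    if not deployments:
        return 0.0
    return sum(d.unplanned_fix for d in deployments) / len(deployments)


def change_failure_rate(deployments):
    """Share of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.caused_failure for d in deployments) / len(deployments)


# Made-up history: 20 deployments, 1 outright failure, 8 defect fixes.
history = (
    [Deployment(unplanned_fix=False, caused_failure=False)] * 11
    + [Deployment(unplanned_fix=False, caused_failure=True)] * 1
    + [Deployment(unplanned_fix=True, caused_failure=False)] * 8
)

print(f"change failure rate: {change_failure_rate(history):.0%}")
print(f"rework rate: {rework_rate(history):.0%}")
```

On this made-up history the change failure rate looks healthy while two in five deployments exist only to fix something that already shipped.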

That is where the trust numbers get strange. Ninety percent of developers are using AI tools, yet roughly thirty percent report little or no trust in the code those tools generate. Teams at Level 2 or 3 (reviewing output without deeply understanding it, feeding vague specs to the agent, running no systematic verification) get more output and less confidence in that output. The rework rate follows.

What Mitchell Hashimoto figured out the hard way

Hashimoto’s account of his own AI adoption is worth reading because he is technically elite and still describes a multi-month journey through distinct stages. He started by doing his own work twice — once manually, once with the agent — specifically to learn where the agent went wrong. He built verification steps. He ran end-of-day agents on triage work and absorbed the results the next morning instead of context-switching mid-day. He documented failure modes before delegating whole classes of tasks.

Where he ended up: an agent running consistently covers roughly 10–20% of a normal working day. Not 80%. Not full automation. Ten to twenty percent of continuous, well-scoped, well-verified delegation, built up through deliberate experimentation over months.

Most teams skip every step between “install Cursor” and “have an agent running.” They get the output volume of a higher level without the judgment that makes it safe. The rework finds them later.

The team-level problem is different from the individual one

Individual productivity is the wrong unit. DORA measures teams, not developers, because that is where software actually ships.

The adoption gap inside a single team is often more damaging than the gap between teams. When one engineer operates at Level 5 and another at Level 2, they are not on the same feedback loop. The Level 5 engineer generates output faster than the Level 2 engineer can review it. PR review time in high-AI-adoption teams increased 91% by one measure. Bugs per developer went up 9% in the same period. Incidents per PR in some cohorts climbed sharply. The bottleneck moved from code generation to review, and teams did not move their review capacity with it.
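The bottleneck shift is simple arithmetic, sketched here as a toy queue. The function name and the weekly numbers are hypothetical, and real review queues are far less regular than a fixed weekly rate.

```python
def review_backlog(opened_per_week, reviewed_per_week, weeks):
    """Return the number of unreviewed PRs at the end of each week.

    Assumes constant weekly rates, which is a deliberate simplification.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        # Backlog grows by the gap between PRs opened and PRs reviewed,
        # and can never drop below zero.
        backlog = max(0, backlog + opened_per_week - reviewed_per_week)
        history.append(backlog)
    return history


# Before: the team opens 8 PRs a week and can review 10.
print(review_backlog(opened_per_week=8, reviewed_per_week=10, weeks=4))

# After: generation speeds up to 12 PRs a week, review capacity stays at 10.
print(review_backlog(opened_per_week=12, reviewed_per_week=10, weeks=4))
```

In the second case the queue grows by two PRs every week and never recovers, even though nothing about the reviewers got worse.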

Campbell’s compass model makes the diagnostic concrete. The question is not “what is our score?” It is “what is our shape?” A High Impact Low Cadence team — shipping big, infrequent, high-stakes releases — needs a different fix than a Legacy Bottleneck team buried in maintenance. AI adoption advice that ignores archetype is just generic advice. It will not touch your actual constraint.

What to do if you are leading an engineering team

Find out where your team actually is. Not where they think they are. Ask how they verify agent output. Ask what happens when the agent produces something wrong. Ask whether anyone has documented the failure modes and shared them. The answers tell you the real level.

Do not measure AI adoption by tool access or self-reported satisfaction. Measure rework rate. Measure PR cycle time. Measure whether bugs per deploy are moving. These separate genuine capability shift from faster-output-plus-faster-mistakes.
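One way to track those three measures side by side is sketched below. The function names, the input fields, and the quarterly figures are all assumptions made for illustration. The all-signals-worse rule is a deliberately crude heuristic, not a DORA definition.

```python
from statistics import median


def period_metrics(deploys, fix_deploys, bugs, pr_cycle_hours):
    """Summarize one reporting period.

    deploys: total deployments
    fix_deploys: deployments made to repair production defects
    bugs: defects reported in the period
    pr_cycle_hours: open-to-merge time for each merged PR, in hours
    """
    return {
        "deploys": deploys,
        "rework_rate": fix_deploys / deploys if deploys else 0.0,
        "bugs_per_deploy": bugs / deploys if deploys else 0.0,
        "median_pr_cycle_hours": median(pr_cycle_hours) if pr_cycle_hours else 0.0,
    }


def faster_output_faster_mistakes(before, after):
    """True when output volume rose and every quality signal got worse."""
    return (
        after["deploys"] > before["deploys"]
        and after["rework_rate"] > before["rework_rate"]
        and after["bugs_per_deploy"] > before["bugs_per_deploy"]
        and after["median_pr_cycle_hours"] > before["median_pr_cycle_hours"]
    )


# Made-up quarters, before and after an AI rollout.
q1 = period_metrics(deploys=40, fix_deploys=6, bugs=10,
                    pr_cycle_hours=[5, 8, 12, 20])
q2 = period_metrics(deploys=60, fix_deploys=18, bugs=24,
                    pr_cycle_hours=[9, 14, 22, 30, 41])

for name, metrics in (("Q1", q1), ("Q2", q2)):
    print(name, metrics)
print("faster output, faster mistakes:", faster_output_faster_mistakes(q1, q2))
```

Deployments went up by half in this example, which is the number that usually gets reported. The other three columns are the ones that say whether the extra output was worth shipping.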

The path from Level 2 to Level 5 is not a training course. It is structured experimentation with real stakes, systematic documentation of what the agent gets wrong, and deliberate workflow redesign — not just bolting a chat window onto the existing process.

The teams pulling ahead are not the ones who adopted AI first. They are the ones who know which archetype they are, measure honestly, and redesign their delivery system around that shape.

If you want to know which archetype your team is and what the rework rate is actually telling you, talk to us.

Sources