Every time a new model drops, the same conversation starts: “Have you tried the new GPT? It’s way better than Claude for this.” Engineers spend a day benchmarking, maybe switch their default, and feel like they’ve done something. They haven’t.

The model stopped being your primary constraint a while ago. Most teams just haven’t noticed, because switching models feels like progress.

Why the model is the wrong lever

Coding is roughly 25-35% of the software development lifecycle. Review, requirements, debugging, coordination, and meetings consume the rest. Amdahl’s Law is unsparing: even if AI made coding infinitely fast, you’d cut total delivery time by at most 25-35%, an overall speedup of roughly 1.3x to 1.5x. The rest of the system stays exactly as slow.
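To make the Amdahl’s Law arithmetic concrete, here is a minimal sketch. It assumes the only thing AI speeds up is the coding share of the lifecycle, and that this share is the 25-35% quoted above. The function names are illustrative, not taken from any library.

```python
def amdahl_speedup(coding_share: float, coding_speedup: float) -> float:
    """Overall speedup when only the coding share of the work gets faster.

    coding_share   -- fraction of total cycle time spent coding (0..1)
    coding_speedup -- how many times faster coding becomes (may be infinite)
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Amdahl's Law: the untouched share keeps its full cost,
    # the accelerated share shrinks by the speedup factor.
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)


def time_saved(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total cycle time removed by the speedup."""
    return 1.0 - 1.0 / amdahl_speedup(coding_share, coding_speedup)


if __name__ == "__main__":
    # Assumed coding shares (from the 25-35% range) and assumed speedups.
    for share in (0.25, 0.35):
        for speedup in (2.0, 10.0, float("inf")):
            overall = amdahl_speedup(share, speedup)
            saved = time_saved(share, speedup)
            print(f"share={share:.0%} coding x{speedup}: "
                  f"overall {overall:.2f}x, time saved {saved:.1%}")
```

With infinitely fast coding the overall speedup tops out at about 1.33x for a 25% coding share and about 1.54x for a 35% share. A more realistic 2x coding speedup at a 25% share saves only 12.5% of total cycle time.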

That’s why, even with 92.6% AI tool adoption across 121,000 developers, organisational productivity gains are converging on roughly 10%. The ceiling is structural, not a model quality problem.

So when an engineer switches from one model to another and gets a marginal lift in code generation speed, they’ve sped up a stage that wasn’t the bottleneck in the first place.

Where the real constraints are

Start with direction: is the team building the right thing? A fast model producing a polished implementation of the wrong feature is pure waste. Agoda’s engineering team found in March 2026 that project-level velocity gains were “surprisingly modest” despite individual output going up, because AI was never touching the upstream work of figuring out what to build.

Then there’s specification. Even the right feature, built to a vague spec, gets built wrong. The model can only operate on what you give it. A two-paragraph brief produces a two-paragraph implementation. The engineer writing a precise specification and verifying results against intent — rather than inspecting implementation line by line — is doing the actual work that moves things forward.

Then review. Faros AI found that high-adoption teams merged 98% more pull requests. PR review time went up 91%. Bugs per developer went up 9%. Faster generation without faster review creates a traffic jam at the merge gate. That’s not a model problem. It’s a process problem.
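The traffic jam can be sketched as a simple queue. This is a toy model under stated assumptions, not a reconstruction of the Faros AI data: review capacity is fixed, pull requests arrive at a constant daily rate, and every number in the example is hypothetical.

```python
def review_backlog(prs_opened_per_day: int,
                   prs_reviewed_per_day: int,
                   days: int) -> list[int]:
    """Return the number of unreviewed PRs at the end of each day.

    Assumes constant arrival and review rates (a deliberate simplification).
    """
    backlog = 0
    history = []
    for _ in range(days):
        # New PRs join the queue; reviewers clear what they can.
        backlog = max(0, backlog + prs_opened_per_day - prs_reviewed_per_day)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Hypothetical team: reviewers can handle 10 PRs a day.
    before = review_backlog(prs_opened_per_day=10, prs_reviewed_per_day=10, days=5)
    # Generation roughly doubles, review capacity does not.
    after = review_backlog(prs_opened_per_day=20, prs_reviewed_per_day=10, days=5)
    print("backlog before:", before)
    print("backlog after: ", after)
```

In this toy model the queue stays empty while arrivals match review capacity and grows by ten PRs a day once generation doubles. The only ways to drain it are to review faster or to open fewer, better-specified PRs.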

The coordination layer nobody talks about

Model-benchmarking is a textbook displacement activity. It’s concrete, it has a clear outcome, and it feels like improving your engineering practice. It isn’t. You’re optimising a constraint that isn’t binding.

The more diagnostic question is whether your team is working from shared context — shared prompts, shared skills, shared review checklists — or whether a few engineers have quietly built powerful local workflows and everyone else is still flying blind. Uneven AI adoption inside a team creates its own coordination failures. The engineer with a polished setup and the one who hasn’t touched their defaults since last year are not operating on the same feedback loop. That gap matters more than the gap between models.

AI can help with requirements, specifications, and review just as much as it can help with code. Most teams just aren’t using it there yet. The teams winning right now aren’t the ones who found the best model — they’re the ones who applied AI to the constraints that were actually binding, and distributed that practice across the whole team.

What to fix instead

Pick a feature that slipped last quarter. Trace every day it was delayed. Was the delay in writing code? Almost certainly not. It was in figuring out what to build, getting alignment, waiting for review, or discovering mid-build that the spec was wrong.
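One way to run that trace is to log each delayed day against a stage and add up the totals. The sketch below assumes you can reconstruct a rough timeline from tickets and PR history. The stage names and day counts are hypothetical placeholders, not real data.

```python
from collections import Counter

# Hypothetical timeline for one slipped feature: (stage, days spent).
# Replace these entries with what your own tickets and PRs show.
TIMELINE = [
    ("deciding what to build", 6),
    ("waiting for alignment", 4),
    ("writing code", 3),
    ("waiting for review", 5),
    ("rework after spec change", 4),
]


def delay_by_stage(entries):
    """Sum the days recorded against each stage."""
    totals = Counter()
    for stage, days in entries:
        totals[stage] += days
    return totals


def share_by_stage(entries):
    """Return each stage's share of total elapsed days, largest first."""
    totals = delay_by_stage(entries)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {stage: days / grand_total for stage, days in totals.most_common()}


if __name__ == "__main__":
    for stage, share in share_by_stage(TIMELINE).items():
        print(f"{stage:<26} {share:.0%}")
```

With these made-up numbers, writing code accounts for 3 of 22 days, about 14% of the elapsed time. The stages at the top of your own list are where to apply effort first.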

Fix that. Distribute the practice. The model you’re already using is good enough. What it needs from you is a better brief, a faster merge loop, and a team that’s using it consistently — not a newer training run.

If your team is chasing model benchmarks while the real bottlenecks sit untouched, talk to us.


Sources