Every study showing AI makes developers faster is methodologically broken, and METR just admitted it. In February 2026, the organization running the most rigorous AI productivity research posted an explanation of why their experiment design no longer works: the developers who depend most on AI tools refuse to participate in studies that require working without them, even at $50/hour. The people with the biggest productivity gains have opted out of the data.
Your engineering org is making decisions based on numbers that were never valid. GitHub says developers accept 30% of Copilot suggestions. That is a usage metric, not a productivity metric. The oft-cited 55% faster claim from GitHub’s own study measured how quickly developers completed a single isolated function in an unfamiliar language. Not how your team works.
The measurement problem runs deeper than bad studies
METR’s original study found experienced developers were 19% slower with AI tools. Those same developers estimated they were 20% faster. The gap is not a measurement quirk. AI feels fast. It generates code instantly. The slowdown comes later, in review, in debugging, in figuring out what the generated code is actually doing.
Faros AI tracked telemetry from 10,000 developers across 1,255 teams. High-AI teams completed 21% more individual tasks and merged 98% more pull requests. PR review time went up 91%. Bug rate per developer went up 9%. At the company level — deployment frequency, lead time, DORA metrics — nothing moved. Individual velocity up. Organizational throughput flat.
Writing code got cheaper. Everything downstream got harder.
95% adoption, zero reliable signal
The Pragmatic Engineer’s March 2026 survey found 95% of respondents use AI tools weekly, and 75% use AI for at least half their engineering work. This is not niche behavior. It is the default.
Near-universal adoption means the control group is gone. You cannot run an A/B test when 95% of your sample is in the treatment condition. Productivity data from before AI tools belongs to a world with different habits, different codebases, different work patterns. It does not transfer cleanly.
So you are left with developer sentiment (overwhelmingly positive) and org-level metrics (flat). Both can be true at once. Coding with AI is genuinely more enjoyable; shipping software is not obviously faster. Sentiment captures the first, which is the one developers feel in the moment.
The question nobody is asking
Most engineering leaders ask: “Is AI making us faster?” That question is close to unanswerable right now. The studies are compromised, the metrics are noisy, and the developers with the biggest gains have already opted out of the sample.
The useful question is: “Where is our actual bottleneck?” If your code review queue sits at two weeks, buying everyone Claude Code does nothing for throughput. If the slowdown is getting alignment on what to build, an agentic coding tool solves the wrong problem. If the issue is deploy confidence — flaky tests, manual release gates, incident anxiety — more code generation just produces more code to be anxious about.
AI coding tools are a force multiplier on whatever process your team already runs. If the process has problems, you get more problems, faster. The Faros AI data shows this clearly: bigger PRs, more bugs, same shipping cadence.
What to fix instead
Stop trying to measure whether AI is making your developers faster. You will not get a clean answer, and chasing one distracts from the actual work.
Track where work stalls: time in review, time from merge to deploy, time from deploy to detecting a regression. If review is the bottleneck, tools that speed up code generation make it worse. If you want AI to help, look at AI-assisted review, not AI-assisted generation. The teams getting real leverage from these tools fixed their delivery pipeline first, then used AI to accelerate something that already worked.
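Tracking those stall points does not require fancy tooling. A minimal sketch of the idea, assuming you can export pull requests with opened / first-review / merged / deployed timestamps (the field names here are hypothetical, not any particular platform’s schema):

```python
from datetime import datetime
from statistics import median

# Hypothetical export: one dict per pull request, ISO-8601 timestamps.
prs = [
    {"opened": "2026-03-02T09:00", "first_review": "2026-03-05T14:00",
     "merged": "2026-03-06T10:00", "deployed": "2026-03-11T08:00"},
    {"opened": "2026-03-03T11:00", "first_review": "2026-03-10T09:00",
     "merged": "2026-03-10T16:00", "deployed": "2026-03-17T09:00"},
]

def hours_between(a: str, b: str) -> float:
    """Elapsed hours between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(b) - datetime.fromisoformat(a)
    return delta.total_seconds() / 3600

def median_hours(prs: list[dict], start: str, end: str) -> float:
    """Median elapsed hours between two lifecycle events across PRs."""
    return median(hours_between(p[start], p[end]) for p in prs)

# Where does work actually wait?
print(median_hours(prs, "opened", "first_review"))  # waiting for review
print(median_hours(prs, "merged", "deployed"))      # waiting to ship
```

Run this over a quarter of real PR data and the largest median tells you which stage to fix before buying anyone a faster code generator.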
Shipping speed is a systems problem, not a typing speed problem. Adding a faster input to a slow system just fills the queue faster.
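The queue-filling claim can be made concrete with a toy two-stage model: if generation produces more PRs per day than review can clear, shipped throughput is capped by review capacity and the surplus piles up as backlog. A minimal sketch with illustrative numbers (not Faros data):

```python
def simulate(days: int, gen_rate: int, review_rate: int) -> tuple[int, int]:
    """Toy pipeline: code generation feeds a fixed-capacity review queue.

    Returns (total shipped, review backlog remaining) after `days` days.
    """
    backlog = 0
    shipped = 0
    for _ in range(days):
        backlog += gen_rate                  # new PRs enter the review queue
        cleared = min(backlog, review_rate)  # review capacity is the cap
        backlog -= cleared
        shipped += cleared
    return shipped, backlog

# Doubling generation speed without touching review capacity:
print(simulate(20, gen_rate=5, review_rate=5))   # baseline → (100, 0)
print(simulate(20, gen_rate=10, review_rate=5))  # "faster" team → (100, 100)
```

Same shipped total, twenty times the backlog: the extra generation speed shows up as queue, not as delivery.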
If your team is trying to figure out where AI actually fits in your delivery system, talk to us.
Sources: METR — We are Changing our Developer Productivity Experiment Design (Feb 2026) · Faros AI — The AI Productivity Paradox Research Report · The Pragmatic Engineer — AI Tooling for Software Engineers in 2026 (Mar 2026) · GitHub — Research: quantifying GitHub Copilot’s impact on developer productivity and happiness