Every manual approval in your pipeline is a confession. You’re admitting that your process cannot be trusted to self-correct.
That’s fine for about five minutes. Then it becomes architecture debt.
The thing people get wrong
The default assumption is that human review equals safety. More eyes, fewer mistakes. It feels true. It isn’t.
When a step in your workflow requires human sign-off, you’ve introduced a bottleneck, a false sense of security, and an incentive to batch work. None of those are safety.
The data makes this uncomfortable. LinearB’s 2026 Software Engineering Benchmarks analyzed 8.1 million pull requests from 4,800 engineering teams. AI-generated PRs wait 4.6 times longer for review than human-written ones. Agentic AI PRs wait 5.3 times longer. Not because they’re worse quality. Because reviewers dread them and procrastinate. The review isn’t improving the code. It’s just slow.
Change Advisory Boards tell the same story at the deployment layer. Teams defer releases until the scheduled CAB meeting, then present a batch of completed work to people who weren’t involved in building any of it. Bryan Finster put it plainly: “You cannot ‘inspect in’ lower risk just as you cannot ‘inspect in’ quality.” The CAB approves or rejects based on paperwork and gut feel. That’s theater.
Three failure modes, all avoidable
The first failure mode is scale. A fraud model evaluates millions of transactions per hour. A recommendation engine influences billions of interactions per day. No human can track that. When automated systems break at machine speed, as flash crashes and runaway ad spend do, the damage is done before any human in any loop sees it. The loop closes after the fact, not during it.
The second: expensive review creates batching. If sign-off is hard to get, teams save up work. Bigger batches mean higher blast radius when something goes wrong. The approval process designed to reduce risk manufactures the conditions for larger failures.
The third is the one nobody talks about. If your reviewer says yes almost every time, they are not actually reviewing. The n8n production AI playbook flags this directly: a consistent 99% approval rate is a signal to remove the step, not celebrate it. High rejection rates are also a signal, but a different one. Fix the process generating the output, not the review step.
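That heuristic is easy to operationalize. Here's a rough sketch in Python, with a made-up data shape and made-up thresholds; the point is the signal, not the tooling:

```python
# Flag review steps whose outcomes say the step itself should change.
# (step, approved) tuples are a stand-in for whatever your review tool exports.
from collections import Counter

def review_step_signals(decisions, approve_above=0.99, reject_above=0.30):
    totals, approvals = Counter(), Counter()
    for step, approved in decisions:
        totals[step] += 1
        approvals[step] += approved
    signals = {}
    for step, n in totals.items():
        rate = approvals[step] / n
        if rate >= approve_above:
            signals[step] = "rubber stamp: automate or remove this gate"
        elif (1 - rate) >= reject_above:
            signals[step] = "upstream problem: fix the process producing the work"
    return signals

# A gate that approves 997 of 1,000 requests is not reviewing anything.
print(review_step_signals([("deploy-gate", True)] * 997 + [("deploy-gate", False)] * 3))
```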
What fixing this actually looks like
The shift is from inspecting outputs to designing the system that makes inspection mostly unnecessary.
Martin Fowler’s team calls this the difference between “humans in the loop” and “humans on the loop”. In the loop means you’re a gatekeeper on every decision. On the loop means you built the harness: specifications, quality gates, test coverage, rollback triggers. Once the harness exists, gating every decision is something you no longer need to do.
Most review cycles aren’t catching irreversible errors. They’re catching things your CI should catch. Automate those. Save manual review for decisions that are genuinely irreversible or genuinely outside what your pipeline can validate.
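What that split can look like, as a minimal sketch. None of this is a real CI API, and the file-path markers for “irreversible” are placeholders for whatever your pipeline genuinely can’t validate:

```python
from dataclasses import dataclass

@dataclass
class Change:
    files: list[str]
    tests_passed: bool
    coverage: float

# Placeholder heuristic: paths whose changes the pipeline cannot safely undo.
IRREVERSIBLE_MARKERS = ("migrations/", "infra/dns/", "billing/")

def gate(change: Change) -> str:
    if not change.tests_passed:
        return "reject: failing tests are CI's job, not a reviewer's"
    if change.coverage < 0.80:
        return "reject: add tests before asking for human time"
    if any(f.startswith(IRREVERSIBLE_MARKERS) for f in change.files):
        return "escalate: genuinely irreversible, manual review earns its cost here"
    return "merge: nothing here needs a gatekeeper"

print(gate(Change(files=["api/handlers.py"], tests_passed=True, coverage=0.91)))
```

The thresholds don't matter. What matters is that the default path has no human in it.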
CABs exist because teams lack certified delivery pipelines, not because human judgment is essential for each release. The answer isn’t a better CAB; it’s a pipeline that self-certifies. DORA’s 2025 research found only 16.2% of organizations deploy on demand. Most of the other 83.8% are held back by approval gates that solve a problem the team stopped having years ago.
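Self-certifying, concretely, means the pipeline watches its own release and reverses it when the numbers go bad. A sketch, assuming you have a live error-rate signal and a rollback hook; every function here is a stand-in, not a specific CD tool:

```python
import time

def deploy_with_rollback(promote, error_rate, rollback, budget=0.01, window_s=300):
    """promote() ships the release, error_rate() samples live errors, rollback() reverts."""
    promote()
    deadline = time.time() + window_s
    while time.time() < deadline:
        if error_rate() > budget:
            rollback()
            return "rolled back: regression caught by the pipeline, not a meeting"
        time.sleep(10)
    return "certified: release stayed inside its error budget"
```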
The review cycles you haven’t mapped yet
PR approvals and deploy gates are obvious. A few others tend to stay invisible until someone draws the value stream.
Agent run babysitting is the new one. If someone is watching an AI agent session to catch drift, that’s a monitoring problem. Agents should run in bounded loops with automated checkpoints: git commits as rollback points, test suites as acceptance gates. The human reviews what came out, not what’s happening inside.
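Bounded, in practice, looks something like this. The agent object and its step method are stand-ins for whatever framework you use; the commits and the test suite are the real point:

```python
import subprocess

def run_agent_bounded(agent, task, max_iterations=5):
    for i in range(max_iterations):
        agent.step(task)  # agent edits the working tree; nobody watches it do so
        if subprocess.run(["pytest", "-q"]).returncode == 0:
            subprocess.run(["git", "add", "-A"])
            subprocess.run(["git", "commit", "-m", f"agent checkpoint {i}"])
            return "done: a human reviews the final diff, not the session"
        subprocess.run(["git", "checkout", "--", "."])  # discard the failed attempt
    return "stopped: iteration budget spent, escalate with the last failing diff"
```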
On-call triaging is an older version of the same problem. If part of your on-call rotation is manually routing alerts that should be routing themselves, you’ve built process debt and called it operational maturity.
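The fix is usually boring: write down the routing table the on-call engineer is applying by hand. The rules and channel names below are invented:

```python
ROUTES = [
    (lambda a: a["service"] == "payments" and a["severity"] == "critical", "page:payments-oncall"),
    (lambda a: a["severity"] == "critical", "page:platform-oncall"),
    (lambda a: a["severity"] == "warning", "slack:#alerts-triage"),
]

def route(alert: dict) -> str:
    for matches, destination in ROUTES:
        if matches(alert):
            return destination
    return "slack:#alerts-unrouted"  # anything landing here is a gap in the rules, not a human's job

print(route({"service": "payments", "severity": "critical"}))  # page:payments-oncall
```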
Design and copy approval chains are worth a look too. The three-person sign-off on a landing page headline is almost always theater. The only approval that matters is whether it ships and whether users respond.
One question worth asking
For every recurring review step in your process: if this step disappeared tomorrow, what would actually break?
“Not much” is a safety theater problem. “We’d ship broken things because our tests don’t catch X” is a testing problem, which is solvable. Keeping the manual review step is not solving it.
The loop should close itself. If it can’t yet, that’s the real engineering work.
If your team is caught between velocity and oversight and not sure where the actual risk sits, talk to us.
Sources:
- LinearB 2026 Software Engineering Benchmarks
- Engineering Benchmarks 2026: 8.1M PRs Reveal Productivity | byteiota
- 5-Minute DevOps: Review Board Dramatic Theater | Bryan Finster
- Human-in-the-Loop Has Hit the Wall | SiliconAngle
- Production AI Playbook: Human Oversight | n8n Blog
- Humans and Agents in Software Engineering Loops | Martin Fowler
- DORA Metrics 2026: AI Expansion Meets Visibility Crisis | byteiota