59% of so-called agentic requests in production make exactly one service call. Not a chain of decisions. Not a multi-step plan. One call. That number is from Datadog’s State of AI Engineering 2026, which analysed millions of production LLM spans, and it should make every team that’s announced an “agentic AI strategy” pause and look at what they actually shipped.

Only 18% of agentic requests make three or more service calls. The rest are doing something with a simpler, cheaper, more honest name: calling a model, maybe calling a tool, and returning a result. That is not an agent. That is autocomplete with routing logic and a fancier deployment story.

Why the label matters

Calling something an agent changes what you invest in. It changes the frameworks you adopt, the infrastructure you provision, the complexity you absorb, and the failure modes you plan for. If what you built is a glorified API wrapper, those investments are wrong.

The agent framing also shapes what you blame when things break. Most teams assume production AI failures are reasoning errors — the model hallucinated, misunderstood the prompt, took a wrong path. The Datadog data says otherwise. In February 2026, 60% of production LLM failures were rate limit errors, not reasoning failures. By March 2026, rate limits still accounted for nearly a third of all LLM failures — roughly 8.4 million individual rate limit errors in a single month.

The model isn’t failing your agents. Your infrastructure is failing a job that doesn’t need agent abstractions to begin with.

The operational ceiling nobody talks about

The dominant narrative around AI reliability is capability-focused. The model isn’t smart enough yet, context windows aren’t long enough, multi-step reasoning is unreliable. All true. None of it is the primary failure mode in production today.

The actual failure mode is operational. You hit provider rate limits. You retry on failure. Retries create more load. Your agent framework, which wasn’t designed for backpressure, amplifies the problem. Then you file it as an agent reliability issue when it’s an infrastructure design issue that would have been obvious if you’d called the thing what it was.
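A back-of-the-envelope model shows how quickly retries add load. This is a sketch, not a measurement: it assumes each attempt fails independently with probability `p_fail` and that the client retries up to `max_retries` times. Both names are illustrative. Real rate-limit failures are correlated, which tends to make the amplification worse than this model suggests.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected number of calls sent to the provider per request.

    Attempt k (0-based) is only sent if the previous k attempts all failed,
    which under the independence assumption happens with probability p_fail ** k.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return sum(p_fail ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        calls = expected_attempts(p, max_retries=3)
        print(f"p_fail={p}: {calls:.2f} calls per request")
```

Under these assumptions, when 90% of attempts are being rejected, three retries send roughly 3.4 calls per request to a provider that is already saying no.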

The wrong abstraction compounds quietly

Agent frameworks doubled in adoption this past year, from 9% to 18% of organisations. That’s the metric Datadog leads with as evidence of the agentic shift. Set it next to the 59% figure and the picture changes: much of what those frameworks orchestrate is a single-step call. If that describes your workload, you took on the complexity of an orchestration layer to manage something that didn’t need orchestrating.

You build on a framework because it ships fast and the demo looks good. Then you’re debugging why your LangGraph pipeline is rate-limiting at scale, why retries burn tokens, why state management across the “agent” is harder than expected. The answer is usually that you didn’t need LangGraph. You needed a well-structured API call and a proper retry budget.
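For illustration, here is one way “a well-structured API call and a proper retry budget” could look without a framework. It is a minimal sketch under stated assumptions: `RateLimited` is a hypothetical exception standing in for a provider’s rate limit response, and `RetryBudget`, `call_with_budget`, and their parameters are invented names, not any library’s API. The budget caps retries as a fraction of first attempts across all callers, so a burst of failures cannot multiply traffic without limit.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception standing in for a provider's rate limit response."""


class RetryBudget:
    """Shared cap on retries: `min_retries` plus `ratio` retries per first attempt."""

    def __init__(self, ratio: float = 0.1, min_retries: int = 3) -> None:
        self.ratio = ratio
        self.min_retries = min_retries
        self.first_attempts = 0
        self.retries = 0

    def record_attempt(self) -> None:
        self.first_attempts += 1

    def try_spend(self) -> bool:
        """Consume one retry if the budget allows it; otherwise refuse."""
        allowed = self.min_retries + self.ratio * self.first_attempts
        if self.retries + 1 > allowed:
            return False
        self.retries += 1
        return True


def call_with_budget(
    fn: Callable[[], T],
    budget: RetryBudget,
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RateLimited while per-call and shared limits allow."""
    budget.record_attempt()
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimited:
            # Give up when this call has used its retries or the shared budget is spent.
            if attempt >= max_retries or not budget.try_spend():
                raise
        # Full jitter: wait a random time up to the capped exponential delay.
        sleep(random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt)))
        attempt += 1
```

The design choice worth noting is that the budget is shared. A per-call retry limit alone still lets every request retry at once. A shared budget makes the client back off as a whole when the provider is pushing back.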

The teams actually running genuine multi-step agents in production — the ones in that 18% making three or more service calls — have a different problem. Their failure modes are orchestration failures, state corruption, and cascading retries across services. That is a legitimately hard engineering problem. It is also not what most teams claiming agentic architectures are dealing with, because most of them are in the 59%.

Two things to check this week

Count your actual service calls per agentic request. Not what the architecture diagram says. Not what the framework is capable of. What production traces show. If the median is one, you don’t have an agent problem. You have a labelling problem.
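A rough way to run that count, assuming you can export production spans as records where each record is one service call and carries a `trace_id` identifying the request it belongs to. The field name and the `summarize` helper are assumptions for this sketch. Adapt them to whatever your tracing backend actually exports.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, Mapping


def calls_per_request(spans: Iterable[Mapping[str, str]]) -> Counter:
    """Count service-call spans per request, keyed by the assumed `trace_id` field."""
    return Counter(span["trace_id"] for span in spans)


def summarize(spans: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Median calls per request, plus the shares with one call and with three or more."""
    counts = list(calls_per_request(spans).values())
    if not counts:
        raise ValueError("no spans to summarize")
    total = len(counts)
    return {
        "requests": total,
        "median_calls": median(counts),
        "share_single_call": sum(c == 1 for c in counts) / total,
        "share_three_plus": sum(c >= 3 for c in counts) / total,
    }


if __name__ == "__main__":
    # Made-up sample: five requests, three of which make exactly one call.
    sample = [
        {"trace_id": "a"},
        {"trace_id": "b"},
        {"trace_id": "c"}, {"trace_id": "c"}, {"trace_id": "c"},
        {"trace_id": "d"}, {"trace_id": "d"},
        {"trace_id": "e"},
    ]
    print(summarize(sample))
```

If `median_calls` comes back as 1 on real traces, that is the labelling problem in a single number.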

Look at your failure breakdown before investing in reasoning improvements. If rate limits are driving the majority of your errors, reasoning improvements won’t help. Infrastructure fixes will: request budgeting, backpressure handling, provider-side rate limit strategies.
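The breakdown itself can be a few lines. This sketch assumes you have the HTTP status code of each failed LLM call, with `None` where no response arrived. It relies on 429 being the standard HTTP status for “Too Many Requests”. The category names and function names are illustrative. It only covers transport-level failures: a call that returns successfully with a bad answer will not show up here and needs separate evaluation.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def classify(status_code: Optional[int]) -> str:
    """Map a failed call's HTTP status to a coarse failure category."""
    if status_code is None:
        return "timeout_or_network"
    if status_code == 429:
        return "rate_limit"
    if 500 <= status_code <= 599:
        return "provider_error"
    if 400 <= status_code <= 499:
        return "client_error"
    return "other"


def failure_breakdown(status_codes: Iterable[Optional[int]]) -> Dict[str, float]:
    """Share of failures per category, most common first."""
    counts = Counter(classify(code) for code in status_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.most_common()}


if __name__ == "__main__":
    # Made-up sample of failed calls.
    print(failure_breakdown([429, 429, 429, 500, 400, None]))
```

If `rate_limit` dominates the output, the next piece of work is request budgeting and backpressure, not prompt changes.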

The teams doing well in production AI right now aren’t the ones with the most sophisticated agentic architectures. They’re the ones who named the actual problem, matched the solution to it, and didn’t build a distributed system when a well-designed function call would do.

If your team is calling things agents but the data looks more like expensive autocomplete, talk to us.


Sources