AI is starting to look a lot like the early days of cloud, and the real race is operational
Date:
Tue, 30 Jun 2026 08:56:02 +0000
Description:
AI's model race is fading. The real battle is running systems reliably at scale.
FULL STORY ======================================================================

Over the past two years, most
of the noise around AI has focused on the model race: whose model is bigger, faster or scores better on benchmarks.
Yadi Narayana, Field CTO, Asia-Pacific & Japan, Datadog

But as AI moves from pilots into the core of products and workflows, a familiar pattern from the early days of cloud is re-emerging: systems are more programmable than ever, but they are also much harder to run. That means we can now see where
the most important competition in AI is shifting: from who has the best model to who can operate AI reliably, efficiently and safely at scale.

AI is now hitting operational limits, not model limits

Real-world telemetry from thousands of production systems paints a clear picture. Nearly 1 in 20 AI requests fails
once applications reach scale, and a majority of those failures now stem from capacity limits such as rate limits, quotas and concurrency caps, rather than from model bugs or poor accuracy. That is a very different story from the benchmark charts most teams used to obsess over.
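To make figures like these concrete for your own estate, the same two numbers can be computed from request logs: the overall failure rate, and the share of failures caused by capacity limits rather than the model. The minimal Python sketch below assumes a simple, hypothetical record shape (an HTTP status plus an optional error label). Treating HTTP 429 (Too Many Requests) as a rate-limit failure is standard, but the string error codes are illustrative assumptions rather than any provider's real schema, and this is not the methodology behind the figures above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# HTTP 429 (Too Many Requests) is the standard rate-limit status. The string
# labels are hypothetical gateway/provider error codes used for illustration.
CAPACITY_STATUSES = {429}
CAPACITY_ERROR_CODES = {"rate_limited", "quota_exceeded", "concurrency_limit"}


@dataclass
class RequestRecord:
    status: int                       # HTTP status returned for the AI request
    error_code: Optional[str] = None  # optional error label (hypothetical schema)


def is_failure(r: RequestRecord) -> bool:
    return r.status >= 400


def is_capacity_failure(r: RequestRecord) -> bool:
    """A failure caused by rate limits, quotas or concurrency caps, not the model."""
    return is_failure(r) and (
        r.status in CAPACITY_STATUSES or r.error_code in CAPACITY_ERROR_CODES
    )


def failure_breakdown(records: Iterable[RequestRecord]) -> dict:
    records = list(records)
    failures = [r for r in records if is_failure(r)]
    capacity = [r for r in failures if is_capacity_failure(r)]
    return {
        "failure_rate": len(failures) / len(records) if records else 0.0,
        "capacity_share_of_failures": len(capacity) / len(failures) if failures else 0.0,
    }
```

For 100 requests of which five fail, three of them because of rate limits or quotas, `failure_breakdown` reports a failure rate of 0.05 and a capacity share of 0.6.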
The amount of data sent per request is also climbing. Across many production estates, median users have more than doubled their token usage, while heavy users have seen volumes grow severalfold. That growth is both a symptom of more ambitious AI use cases and a direct driver of cost and IT infrastructure stress.
You can see the impact most clearly in what many teams now describe as GPU sprawl: fragmented fleets spread across clouds and on-prem clusters. Some GPUs sit idle while others are consistently saturated, and there is very little correlation between where GPU hours are spent and where they create business value.
The result is familiar to anyone who lived through the early adoption of
cloud computing: runaway spend, unpredictable performance and capacity crises that appear out of nowhere.

How this is playing out in APAC

Across Asia-Pacific, and especially in ASEAN, we're currently
seeing structural pressures: AI adoption is accelerating, but operational maturity is uneven.
Singapore is further along on governance and observability, driven in part by regulatory expectations and a more mature cloud landscape. Meanwhile, markets such as Indonesia, Malaysia and Thailand are moving very fast on deployment, often pushing AI into customer-facing services while operational practices catch up.
As organizations across these markets roll out multi-model and agent-based architectures, they are running into reliability issues, limited visibility and inconsistent model performance. Token usage is increasing quickly, but optimization practices such as prompt caching and context engineering are underutilized.
That gap between readiness and deployment is already creating operational and cost debt that will be harder to unwind later.

The four operational disciplines AI teams need

With the evolution of AI resembling the early days of cloud, the good news is that we can predict, at least a little, where things are headed.
Now, the question AI leaders should be asking is this: which disciplines distinguish the teams that will cope best with this complexity?
In my view, there are four disciplines that teams working with AI need to adopt to see sustainable success:

1. Establish visibility and attribution

You cannot operate what you cannot see, and AI is no exception.
Teams need to see how GPU hours and tokens map to specific applications,
teams and use cases, so they can connect that usage to latency, error rates and user impact.
That makes it possible to separate business-critical workloads from background noise and to see clearly which services are driving cost or consuming capacity.
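A minimal sketch of that kind of attribution, assuming every AI request or GPU job is already tagged with the application and team that owns it. The `UsageEvent` fields and helper names are hypothetical, not a particular observability product's schema; the point is that usage is rolled up per owner and placed next to error rate and latency, so the biggest consumers can be ranked on one view.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Owner = Tuple[str, str]  # (application, team)


@dataclass
class UsageEvent:
    app: str           # owning application (hypothetical tag)
    team: str          # owning team (hypothetical tag)
    tokens: int        # tokens consumed by this request
    gpu_hours: float   # GPU time attributed to this request or job
    latency_ms: float
    error: bool


def attribute(events: Iterable[UsageEvent]) -> Dict[Owner, dict]:
    """Roll usage up per (app, team) so cost and reliability sit side by side."""
    totals: Dict[Owner, dict] = defaultdict(
        lambda: {"requests": 0, "errors": 0, "tokens": 0, "gpu_hours": 0.0, "latency_ms": 0.0}
    )
    for e in events:
        t = totals[(e.app, e.team)]
        t["requests"] += 1
        t["errors"] += int(e.error)
        t["tokens"] += e.tokens
        t["gpu_hours"] += e.gpu_hours
        t["latency_ms"] += e.latency_ms
    return {
        owner: {
            "tokens": t["tokens"],
            "gpu_hours": t["gpu_hours"],
            "error_rate": t["errors"] / t["requests"],
            "avg_latency_ms": t["latency_ms"] / t["requests"],
        }
        for owner, t in totals.items()
    }


def top_consumers(report: Dict[Owner, dict], by: str = "tokens", n: int = 3) -> List[Owner]:
    """Rank owners by one usage dimension, largest first."""
    return sorted(report, key=lambda owner: report[owner][by], reverse=True)[:n]
```

Calling `top_consumers(report, by="gpu_hours")` ranks the same owners by GPU time instead of tokens, which may surface a different set of workloads.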
When usage is visible and attributable in a single view, decisions about where to optimize, protect capacity or dial back become much less emotional and much more data-driven.

2. Enforce control and guardrails

Without guardrails, AI systems will consume as much capacity as you give them.
Practical controls include rate limits and budget caps, along with safeguards on agent behavior to stop unbounded retries, loops and poorly bounded workflows from exhausting shared resources.
These controls are about making consumption predictable and ensuring that one runaway experiment cannot impact core production services.
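The three controls named above (rate limits, budget caps and bounded retries) can be combined in one wrapper. This is a minimal, single-process sketch, assuming token usage can be estimated before each call. `TokenBucket`, `BudgetCap`, `CapacityError` and `call_with_guardrails` are hypothetical names, and `CapacityError` stands in for whatever rate-limit or quota error a given provider actually raises. A production system would more typically enforce these limits in a shared gateway than per process.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Stand-in for a provider's rate-limit or quota error (hypothetical)."""


class BudgetExceeded(Exception):
    """Raised instead of letting a workload spend past its cap."""


class TokenBucket:
    """Rate limiter: refills `rate` permits per second, allows bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.permits = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.permits = min(self.capacity, self.permits + (now - self.last) * self.rate)
        self.last = now
        if self.permits >= 1:
            self.permits -= 1
            return True
        return False


class BudgetCap:
    """Hard cap on the tokens one workload may consume."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(f"cap of {self.max_tokens} tokens would be exceeded")
        self.used += tokens


def call_with_guardrails(
    call: Callable[[], T],
    estimated_tokens: int,
    limiter: TokenBucket,
    budget: BudgetCap,
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call` under a rate limit, a budget cap and a bounded retry policy."""
    for attempt in range(max_attempts):
        if not limiter.try_acquire():
            # Shed load locally rather than queueing without bound.
            raise CapacityError("local rate limit reached")
        budget.charge(estimated_tokens)  # every attempt, including retries, counts
        try:
            return call()
        except CapacityError:
            if attempt == max_attempts - 1:
                raise  # retries are bounded: give up after the last attempt
            # Exponential backoff with jitter before trying again.
            sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

The design choice is that every path terminates: the limiter sheds load instead of queuing forever, the budget check fails before any spend is incurred, and retries stop after `max_attempts`, with exponential backoff in between.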
Without this discipline, AI programs tend to hit economic limits long before they hit technical ones. You end up with impressive prototypes but unsustainable unit economics.

3. Optimize GPU utilization before scaling supply

Most teams reach for more GPUs when what they really have is a utilization problem.
GPU instances already account for a significant share of compute costs, and that proportion only grows as organizations push deeper into training and inference at scale.
But idle or underutilized GPUs create the sense of a shortage even when there is headroom in the estate. At the same time, many teams can see their overall GPU bill climbing but cannot see which workloads are driving consumption, or pinpoint the steps needed to improve efficiency.
What we learned in the early days of cloud is that, in these situations, overprovisioning becomes the safest default, but then spend balloons even when there is stranded capacity in the fleet.
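One way to tell a genuine shortage from stranded capacity is to ask why a job that needs several GPUs on one node cannot be placed. The sketch below is an illustrative heuristic, not a standard scheduler algorithm: the `Gpu` record, the 10% idle threshold and the four labels are assumptions, and for simplicity the misallocation check ignores which node the idle-but-held GPUs sit on.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean
from typing import List

# Illustrative threshold (assumption): an allocated GPU averaging below 10%
# utilization is treated as held but effectively idle.
IDLE_BELOW = 0.10


@dataclass
class Gpu:
    node: str
    allocated: bool  # is a workload currently holding this GPU?
    utilization: List[float] = field(default_factory=list)  # recent samples in [0, 1]


def is_idle_but_held(g: Gpu) -> bool:
    return g.allocated and bool(g.utilization) and mean(g.utilization) < IDLE_BELOW


def diagnose(fleet: List[Gpu], gpus_needed: int) -> str:
    """Explain whether a job needing `gpus_needed` GPUs on a single node can be placed."""
    if gpus_needed <= 0:
        return "schedulable"
    free = [g for g in fleet if not g.allocated]
    free_per_node = Counter(g.node for g in free)
    if any(count >= gpus_needed for count in free_per_node.values()):
        return "schedulable"    # headroom exists where the job needs it
    if len(free) >= gpus_needed:
        return "fragmentation"  # enough free GPUs overall, but split across nodes
    reclaimable = sum(1 for g in fleet if is_idle_but_held(g))
    if len(free) + reclaimable >= gpus_needed:
        return "misallocation"  # capacity is held by workloads that barely use it
    return "shortage"           # only this case argues for adding supply
```

Only the last label argues for buying more GPUs; `fragmentation` and `misallocation` point instead to rescheduling or reclaiming capacity that already exists.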
Treating GPU infrastructure as a first-class system means tracking utilization so that teams can distinguish genuine capacity shortages from misallocation or fragmentation. Then they can decide whether to free up existing capacity or truly add more supply.

4. Design for efficiency at the application layer

High AI costs and failure rates often come from how applications are put together, not from the models themselves.
Inefficient patterns, poor routing across providers and unoptimized prompts all drive up token usage and increase the risk of timeouts, errors and inconsistent behavior.
But with proper visibility into prompts, agents and tools in production,
teams can see how requests actually flow through the system and tune for quality, latency and cost in a controlled way.
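Visibility into how a request flows usually means step-level tracing. The hand-rolled sketch below records each LLM call and tool call within one request as a span with its token count and duration, then reports which step costs the most. The `Trace` and `Span` names and the `llm:`/`tool:` naming scheme are hypothetical; in practice a team would more likely use an existing tracing library such as OpenTelemetry than a class like this.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str               # e.g. "llm:plan" or "tool:search" (illustrative naming)
    kind: str               # "llm" or "tool"
    tokens: int = 0         # tokens consumed by this step, set by the caller
    duration_s: float = 0.0


@dataclass
class Trace:
    request_id: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def step(self, name: str, kind: str) -> Iterator[Span]:
        """Time one step of the request and record it, even if the step raises."""
        span = Span(name, kind)
        start = time.perf_counter()
        try:
            yield span
        finally:
            span.duration_s = time.perf_counter() - start
            self.spans.append(span)

    def summary(self) -> dict:
        costliest: Optional[Span] = max(self.spans, key=lambda s: s.tokens, default=None)
        return {
            "request_id": self.request_id,
            "steps": len(self.spans),
            "llm_calls": sum(1 for s in self.spans if s.kind == "llm"),
            "total_tokens": sum(s.tokens for s in self.spans),
            "costliest_step": costliest.name if costliest else None,
            "total_duration_s": sum(s.duration_s for s in self.spans),
        }
```

Wrapping a planning call, a tool call and an answering call in `with trace.step(...) as s:` blocks and setting `s.tokens` inside each one yields a per-request summary that shows, for example, whether the final answer step or an extra agent loop dominates token usage.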
That turns the application layer from a black box into a place where efficient engineering choices are deliberate, measurable and aligned with business outcomes.

What leaders should do in the new AI race

The early days
of cloud taught us that programmability without operational discipline can be as much a liability as an advantage. AI is now at a similar inflection point: the winners will not just be those with access to the most powerful models, but those who treat AI as a long-term engineering and operations capability.
A useful test for any organization is whether it can explain where AI spend goes, how agents behave in production and which workloads it would protect first if capacity were suddenly cut.
If the honest answer is "I don't know yet," then the next phase of the AI journey is clear: stop chasing the next model release and focus on building the operational foundations that will help you scale AI safely and sustainably.

This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.
The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here:
https://www.techradar.com/pro/perspectives-how-to-submit
======================================================================
Link to news story:
https://www.techradar.com/pro/ai-is-starting-to-look-a-lot-like-the-early-days-of-cloud-and-the-real-race-is-operational
--- Mystic BBS v1.12 A49 (Linux/64)
* Origin: tqwNet Technology News (1337:1/100)