Security Archives - Modular Technology Group

When the Token Bill Comes Due: What Uber and Microsoft Just Taught the Rest of Us About Renting Intelligence

Cale Hollingsworth — Tue, 02 Jun 2026 16:51:30 +0000

In late May, Goldman Sachs put a number on something a lot of operators have been feeling in their gut for months. Agentic AI, the bank projects, could push token demand up by more than 24 times in the next few years. Read that again. Not 24 percent. Twenty-four times.

If your AI runs on someone else’s meter, that forecast is not a growth story. It’s an invoice you haven’t opened yet.

And the companies you’d expect to absorb that hit better than anyone are the ones flinching first. Uber reportedly burned through its entire 2026 AI budget in a matter of months, and its CTO went public with it. Its operations chief, Andrew Macdonald, told Business Insider something even more damning than the overspend: after talking to his senior engineers, he couldn’t find a clear line between how many tokens the company was burning and how many features customers actually got. More than 80 percent of Uber’s engineers were using agentic tools. Over 60 percent of the code was AI-generated. And leadership still couldn’t show that the output was worth what they were paying for it.

Microsoft, meanwhile, started pulling its own developers off a third-party coding assistant and moving them onto an in-house tool, with a deadline that landed conveniently at the close of its fiscal year. The official line was consolidation. The timing told a different story. Microsoft also flipped one of its developer products to token-based billing because the cost of running it had ballooned.

When the two companies that helped write the playbook for aggressive AI adoption are both quietly restructuring how they buy it, that’s not a blip. That’s the meter catching up with the marketing.

The math nobody put on the slide

For two years, the pitch for cloud AI has been simple: usage is cheap, it’ll only get cheaper, and you can scale infinitely. The first part was true for a chatbot answering one question at a time. It stops being true the moment you point an agent at a real workflow.

A single agentic task can consume more than a thousand times the tokens of a one-shot chatbot query. Agents don’t ask once. They plan, call tools, check their own work, retry, and chain steps together, and every one of those steps is metered. Multiply that by a whole department running agents all day, then layer on the Goldman Sachs 24x demand curve, and the “it’ll get cheaper” story collapses under its own arithmetic.
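To make that arithmetic concrete, here is a minimal Python sketch. Every input is a hypothetical assumption chosen for illustration: tokens per chatbot query, price per million tokens, workdays per month, team size, and tasks per day. Only the 1,000x agent multiplier and the 24x demand factor echo the figures cited above, and applying the industry-wide 24x projection to a single team is purely a way to show scale.

```python
# Rough model of metered agent costs. Every constant below is a hypothetical
# assumption for illustration, not a figure from Uber, Microsoft, Goldman Sachs,
# or any AI vendor's price list.

CHAT_TOKENS_PER_QUERY = 2_000      # assumed tokens for one chatbot question and answer
AGENT_MULTIPLIER = 1_000           # "more than a thousand times" a one-shot query
PRICE_PER_MILLION_TOKENS = 5.00    # assumed blended USD price per 1M tokens
WORKDAYS_PER_MONTH = 21            # assumed


def monthly_cost(users: int, tasks_per_user_per_day: int,
                 agentic: bool = True, demand_multiplier: float = 1.0) -> float:
    """Return one month's metered bill in USD for a team of `users`."""
    tokens_per_task = CHAT_TOKENS_PER_QUERY * (AGENT_MULTIPLIER if agentic else 1)
    total_tokens = (users * tasks_per_user_per_day * WORKDAYS_PER_MONTH
                    * tokens_per_task * demand_multiplier)
    # Multiply before dividing so round-number inputs stay exact in floating point.
    return total_tokens * PRICE_PER_MILLION_TOKENS / 1_000_000


if __name__ == "__main__":
    team, tasks = 20, 10  # hypothetical department: 20 people, 10 tasks each per day
    print(f"One-shot chatbot use: ${monthly_cost(team, tasks, agentic=False):,.0f}")
    print(f"Agentic workflows:    ${monthly_cost(team, tasks):,.0f}")
    print(f"Agentic at 24x usage: ${monthly_cost(team, tasks, demand_multiplier=24):,.0f}")
```

With these made-up inputs it prints $42, $42,000, and $1,008,000. The absolute numbers mean little on their own; the shape is the point: the agent multiplier and the demand multiplier stack multiplicatively, so each one scales the whole bill by its full factor.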

The numbers coming out of the industry have started to sound less like efficiency and more like a dare. Nvidia’s CEO said earlier this year that if one of his $500,000 engineers wasn’t burning at least $250,000 in tokens, he’d be worried. Airbnb’s CEO bragged that 60 percent of the company’s code is now AI-generated. Another company reported that 84 percent of its code was AI-written. A three-person team running an aggressive stack of agents managed to spend $1.3 million in tokens in a single month.

Somewhere in there, the conversation quietly stopped being about results and started being about consumption for its own sake. Token usage became the brag, as if the size of the bill proved the value of the work. Uber just demonstrated, in public, that it doesn’t.

Consumption is not a strategy

Here’s the part that should land for any business owner watching this from the outside: the meter punishes exactly the usage you’re being told to chase.

You are encouraged to put AI into everything, hand the agents more autonomy, let them run longer and reason harder. Every one of those instructions increases token consumption. So the more seriously you take the advice, the faster your costs compound, and they compound on a curve you don’t control and can’t predict. You find out what you owe after the work is done. That’s a brutal way to run a budget, and it’s an impossible way to run a small or mid-sized organization that needs to know its number before the quarter starts, not after.

This is the question we ask clients to sit with before they sign anything: what happens to our costs if our usage succeeds? If the honest answer is “they go up in a way we can’t forecast,” then the platform isn’t priced for you to win. It’s priced for you to ration.

And rationing is precisely what’s happening. The biggest names in tech are now teaching their people to use less of the very tools they spent a year telling everyone to use more of. If Microsoft and Uber can’t make consumption-based AI pencil out at their scale, the odds are not good that a 40-person law firm or a boutique advisory shop will.

The hardware cavalry isn’t coming in time

The usual reassurance is that better chips will rescue the economics. Next-generation inference hardware is genuinely more efficient, and the Goldman Sachs report leans on exactly that hope: cheaper tokens, usage keeps climbing, profits eventually follow.

The timing doesn’t cooperate. The newest platforms are still rolling out, and the efficiency gains, real as they are, are years from deploying at the scale this demand curve requires. In the meantime, more than half of the data center projects planned around the current generation of hardware have reportedly been delayed or cancelled, choked by shortages of power and parts. The hyperscalers themselves have started stretching their hardware to run for six years instead of replacing it on the old cadence, which is hard to square with the promise of a dramatic efficiency leap every single year.

So the demand is exploding now. The relief is theoretical and late. And the gap between the two gets paid for in your monthly bill.

There is another way to buy this

None of this is an argument against AI. We build our business on AI. It’s an argument against renting your intelligence by the drink from infrastructure you don’t control, priced on a model designed to climb.

At Modular Technology Group, we made a different bet, and the news this month is the reason we made it. Modular runs private AI on infrastructure we own, in a US-based FedRAMP data center, at a fixed monthly price. No per-token billing. No per-query billing. No consumption meter quietly compounding in the background while your agents do exactly what you asked them to do.

When the AI runs on hardware you control, the equation flips. Heavier usage doesn’t mean a heavier invoice. Once the box is yours, more agents, longer reasoning, and bigger context windows all live inside a cost you already know. The incentive inverts: instead of being penalized for using AI more, you’re free to use it more. That’s the difference between intelligence as a metered utility and intelligence as owned capability.
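One way to see the inversion is to run the question posed earlier, what happens to our costs if our usage succeeds, through a toy model. This is a minimal Python sketch under stated assumptions: the starting bill, the 20 percent monthly usage growth, and the flat fee are all hypothetical, not Modular’s pricing or any cloud vendor’s rates, and the flat fee is assumed to cover all of the usage being modeled.

```python
from __future__ import annotations

# Compare a metered bill that grows with usage against a flat monthly fee.
# Starting cost, growth rate, and flat fee are hypothetical assumptions, not
# Modular Technology Group's pricing or any cloud vendor's rates. The flat fee
# is assumed to cover all of the usage being modeled.


def metered_costs(start_cost: float, monthly_growth: float, months: int) -> list[float]:
    """Metered bill for each month when usage grows by a fixed fraction monthly."""
    return [start_cost * (1 + monthly_growth) ** m for m in range(months)]


def first_month_over(costs: list[float], flat_fee: float) -> int | None:
    """Zero-based index of the first month the metered bill exceeds the flat fee."""
    for month, cost in enumerate(costs):
        if cost > flat_fee:
            return month
    return None  # the metered bill never passed the flat fee in this window


if __name__ == "__main__":
    bills = metered_costs(start_cost=10_000, monthly_growth=0.20, months=12)
    flat_fee = 25_000
    crossover = first_month_over(bills, flat_fee)
    print(f"Metered bill first exceeds the flat fee in month index {crossover}")
    print(f"12-month metered total: ${sum(bills):,.0f} vs flat: ${flat_fee * 12:,.0f}")
```

With these made-up inputs, the metered bill passes the flat fee at month index 6 (the seventh month), and a year of metered billing totals about $395,805 against $300,000 flat. Change any assumption and the crossover moves; the sketch only shows that, with any positive growth rate, a compounding metered bill eventually crosses a fixed number while the flat line stays put.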

A few things follow from owning the stack instead of renting it:

Your costs are knowable before the work starts. A flat monthly fee means the budget conversation happens once, up front, not in a panicked review when the usage report comes in. No surprises, no variable cloud bill, no quarter blown in a month.

Your usage can succeed without punishing you. The whole point of AI is to do more with it over time. On a metered model, success is the thing that breaks your budget. On owned infrastructure, success is just success.

You’re not locked to one vendor’s pricing whims. Microsoft just moved a product to token billing because its own costs ran away. When you don’t own the layer your business depends on, someone else’s cost problem becomes your pricing problem overnight. We run the model that fits the job, on hardware that’s ours, so a vendor’s repricing isn’t your emergency.

Your data stays yours. This was always the foundation. Models run locally, on our infrastructure, in our facility. Your data never routes through someone else’s cloud to get answered. Your data, your rules. The cost predictability is a benefit that rides on top of the same architecture that keeps your information private in the first place.

The meter was always the business model

The token-billing crisis isn’t a bug in cloud AI. It’s the business model working as designed. Usage was always going to climb, agents were always going to multiply the consumption, and the bill was always going to follow the curve. May just happened to be the month some very large companies looked up and noticed.

The organizations that come out of this ahead won’t be the ones who used AI the least to survive the bill. They’ll be the ones who stopped renting intelligence by the token and started owning it, so that using more was never the thing that hurt them.

If you’re staring at an AI bill that grows every time the tools actually work, that’s worth a conversation. We’re always happy to compare notes on what fixed-cost, private AI looks like for an organization your size. You can reach us at modtechgroup.com/consultation.

Because when the token bill finally comes due across the industry, you want to be the company that already knows its number.

Modular Technology Group builds and operates private AI infrastructure on owned, US-based hardware: fixed pricing, local inference, your data and your AI under your rules, from dirt to desktop. modtechgroup.com

The post When the Token Bill Comes Due: What Uber and Microsoft Just Taught the Rest of Us About Renting Intelligence appeared first on Modular Technology Group.

When Google Validates Your Architecture: Private AI Was Never the Alternative

Arthur — Mon, 27 Apr 2026 18:06:32 +0000

At Google Cloud Next 2026 in Las Vegas this week, Google made a quiet but significant announcement: Gemini can now run on a single air-gapped server, fully disconnected from the internet — and from Google itself.

The product is a Dell-certified, Google-approved hardware appliance delivered through a neocloud partner called Cirrascale Cloud Services. Eight Nvidia GPUs. Confidential computing protections. The marketing hook: “pull the plug and the model vanishes.”

We’ve been watching the coverage with genuine interest. And a fair bit of déjà vu.

The Market Just Caught Up

For years, enterprise organizations in financial services, healthcare, defense, and government faced what analysts called an impossible tradeoff: access the most powerful AI models through public cloud APIs — and surrender control of your data — or settle for less capable open-source models you could host yourself.

Google’s announcement is a formal acknowledgment that this framing was always wrong. The demand for fully private AI wasn’t a niche concern. It was the only architecturally honest answer for any organization that takes data governance seriously.

Modular Technology Group has been building on that premise since before it was a keynote slide.

What Google Is Actually Selling

Let’s be precise about the offering, because the details matter.

The Cirrascale deployment requires a Google-certified hardware platform. It requires a partnership with a specific neocloud provider. It requires Google’s approval of the appliance configuration. General availability is projected for June or July 2026 — it’s in preview now.

And the selling point — that the model “vanishes when you pull the plug” — is a confidential computing feature that ties the model weights to the specific hardware. Impressive engineering. But consider what it implies: you are still dependent on Google’s certification ecosystem to acquire and maintain access to the model. The sovereignty is physical, not architectural.

The right question for any enterprise evaluating this: What is your exit strategy?

What happens if Cirrascale changes its pricing or partnership terms?
What happens if Google deprecates the on-premises licensing tier?
What happens when the certified hardware goes end-of-life?

Vendor lock-in doesn’t disappear because the server is in your rack. It moves from the network layer to the hardware and licensing layer.

A Different Architectural Bet

Modular Technology Group made a different set of choices when we designed our private AI infrastructure.

Model-agnostic. We are not tied to any single model provider. Our clients run the models that fit their use case — whether that’s an open-weight model, a fine-tuned variant, or a frontier model accessed under controlled conditions. When a better model ships, you switch. No re-certification. No new appliance.

Hardware-agnostic. We operate in a FedRAMP-authorized data center on infrastructure you control. You are not locked to a specific GPU configuration or a vendor-approved hardware stack. The architecture scales with your needs, not with a product roadmap you don’t control.

Fixed, transparent pricing. No usage-based API billing. No surprise invoices at the end of the month. You know what you’re paying. That predictability is a feature, not an accident.

Available now. Not in preview. Not waiting on a June or July GA date. Running, deployed, with clients in production today.

Data Sovereignty Is Architecture, Not Proximity

The broader lesson from Google’s announcement isn’t about Google. It’s about how the enterprise AI market is maturing in its understanding of what “private” actually means.

Physical proximity — a server in your building, or in a data center you can point to — is necessary but not sufficient. True data sovereignty requires architectural ownership: control over the model, the infrastructure, the data pipeline, and the exit path.

When your AI model “vanishes when you pull the plug,” ask yourself: whose plug is it, really?

At Modular Technology Group, “Your Data, Your Rules” isn’t a product announcement. It’s been the design constraint from the beginning.

If you’re evaluating private AI infrastructure — whether in response to this week’s news or because you’ve been thinking about it longer than Google has been announcing it — we’re happy to compare architectures.

Schedule a conversation →

Source inspiration: LinkedIn

The post When Google Validates Your Architecture: Private AI Was Never the Alternative appeared first on Modular Technology Group.

The Market Is Moving to Local AI. Here’s Why Modular Bet on It Early.

Cale Hollingsworth — Mon, 01 Dec 2025 15:24:26 +0000

The last few years have been a reminder of a simple truth: every time we hand our data to a SaaS platform, we inherit its entire security posture – every vendor, every subcontractor, every analytics tool buried three layers deep. The recent OpenAI metadata leak, which came through a third-party analytics provider, is another example of a structural problem, not an anomaly. Cloud AI depends on trust the cloud can’t realistically guarantee.

This isn’t about fear, hype, or “AI doom.” It’s about math, physics, and risk.

Running AI in a centralized cloud is expensive, unpredictable, and increasingly exposed. Every prompt, every document, every customer interaction becomes part of a massive telemetry pipeline you don’t control. As vendors bolt on more analytics, more monitoring, more subcontractors, the attack surface expands quietly in the background.

That’s the opposite of what businesses with sensitive data actually need.

Across legal, healthcare, finance, engineering, and public-sector teams, we’re seeing the same pivot:
“We want AI, but we want it inside our walls, under our rules, and on infrastructure we control.”

This is exactly why Modular was built.

We run AI the way critical infrastructure should run:
• Local – compute lives on your hardware or inside our FedRAMP-grade facility.
• Private – prompts, embeddings, logs, and outputs never touch a public cloud.
• Open-source – no proprietary surveillance, no forced upgrades, no mystery training loops.
• Predictable – your cost structure is hardware, not runaway API billing.
• Sovereign – data, inference, and model behavior are yours. Fully. Not rented.

Cloud AI will always have a place for large-scale training. That’s fine. But the real value, the day-to-day reasoning, drafting, summarizing, planning, discovery, research, and workflow integration, belongs close to the data. That’s where privacy is defensible and cost is manageable. It’s also where performance can be dramatically better.

Local AI isn’t a trend. It’s the next evolution of enterprise computing.
Just as servers moved out of mainframes and storage moved out of proprietary appliances, AI is moving out of hyperscale clouds and back into customer-controlled environments.

At Modular, we’re building the stack for that future: local AI workspaces powered by open models, secure RAG pipelines, GPU-optimized inference, and complete data custody from end to end.

If your organization is evaluating how to bring AI into regulated or confidential workflows, the shift has already started. Local AI isn’t a fallback. It’s the architecture that will define the next decade of computing.

If you’re ready to explore what a private AI environment looks like for your team, we’re here to help you build it.

The post The Market Is Moving to Local AI. Here’s Why Modular Bet on It Early. appeared first on Modular Technology Group.