The Meter Is Running: How AI Learned to Think Cheaply
Site Owner, New Universe Blog
Published 2026-05-12
When OpenAI launched o1, chain-of-thought reasoning felt like a premium product: slow, expensive, and reserved for hard problems. Twelve months later, o3-mini costs less per token than GPT-4o and beats it on hard tasks, small reasoning models are entering the market at a tenth of the cost of frontier reasoning models, and thinking budgets are becoming a first-class product feature. This is how AI's most expensive feature became a commodity, and what happens next to the companies and engineers who built around it.
In 2024, a senior engineer at a mid-sized tech company ran an experiment. She gave her team two options for code review: GPT-4o for instant, stateless responses, or o1-preview for slower chain-of-thought reasoning that actually understood context. The team chose GPT-4o, because o1-preview cost twelve times as much per conversation.
Twelve months later, the same engineer faces the same choice, and this time she hesitates. o3-mini is now cheaper than GPT-4o, and on hard problems it wins. The math has flipped.
This is the story of how thinking, the most expensive feature in AI, became a product. What happens next is going to reshape every AI company, every AI product, and maybe every job that involves judgment.
The Original Sin of Reasoning Models
When OpenAI released o1 in September 2024, it felt like a new species. Chain-of-thought reasoning. Test-time compute. The model that "thinks before it answers." For coding competitions, math olympiads, and multi-step logic problems, it was genuinely better in ways that felt qualitatively different.
But it came with a price tag that made it a luxury item. o1-preview cost roughly $60 per million output tokens, and its hidden reasoning tokens were billed as output. For context: a typical AI conversation might use 2,000 output tokens. A hard reasoning session, the kind where you ask the model to think through a genuinely complex architecture problem, could burn 10,000 to 30,000 tokens of thinking. That's $0.60 to $1.80 per session, versus about $0.02 for a typical 2,000-token GPT-4o response.
For casual chat, the math was absurd. For a coding agent running 500 tasks a day, it was a budget line.
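The per-session arithmetic is easy to check. A minimal sketch, assuming o1-preview's $60 per million output tokens (reasoning tokens billed as output) and a GPT-4o output price of $10 per million, the rate implied by the $0.02-per-2,000-tokens figure; the 500-tasks-a-day agent is the hypothetical from the text:

```python
# Output-token prices in US dollars per million tokens.
O1_PREVIEW_OUTPUT_PRICE = 60.00
GPT_4O_OUTPUT_PRICE = 10.00  # assumption: implied by $0.02 per 2,000 output tokens


def session_cost(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of generating `output_tokens` at a per-million-token price."""
    return output_tokens * price_per_million / 1_000_000


light = session_cost(10_000, O1_PREVIEW_OUTPUT_PRICE)  # lighter reasoning session
heavy = session_cost(30_000, O1_PREVIEW_OUTPUT_PRICE)  # heavier reasoning session
chat = session_cost(2_000, GPT_4O_OUTPUT_PRICE)        # typical GPT-4o response

# Hypothetical coding agent running 500 reasoning-heavy tasks per day.
daily_low, daily_high = 500 * light, 500 * heavy

print(f"per session: ${light:.2f}-${heavy:.2f} vs ${chat:.2f}")
print(f"per day at 500 tasks: ${daily_low:,.0f}-${daily_high:,.0f}")
```

At these prices the agent spends roughly $300 to $900 a day on thinking alone, which is why it showed up as a budget line.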
So o1 became the premium tier. The "hard problems only" tier. The tier you switch to when GPT-4o gives up and you need something smarter. Most AI usage — the casual queries, the quick lookups, the standard drafts — never touched it.
That pricing structure reflected the underlying economics: reasoning was genuinely expensive to provide. More compute at inference time. Longer outputs. More expensive to serve at scale. Providers passed that cost to customers, and customers made rational choices about when to pay the premium.
The market had decided: thinking was a luxury good.
The Commoditization Curve
Here's what technology does to luxury goods.
In 1995, mobile phone calls cost $0.50 per minute. Calling across the country was a special-occasion event. Every business traveler knew the per-minute rate and kept calls short. Today, unlimited talk is bundled into plans that cost $30/month, and nobody thinks about it. The effective per-minute cost of a voice call fell by orders of magnitude, and the product went from luxury to utility.
The same curve has hit disk storage, bandwidth, the cost of compute per FLOP, genome sequencing, and solar panels. Every time it hits, the pattern is identical: a period of high prices and careful rationing, then a period of aggressive cost reduction, then an explosion of new use cases that weren't economically viable at the old price.
AI reasoning is hitting that curve right now.
The first inflection point was o3-mini's pricing — OpenAI quietly positioned it below GPT-4o on cost per token while beating it on hard tasks. That was the signal. Not just "reasoning is getting cheaper" but "reasoning is getting cheaper faster than quality is improving, which means the premium is collapsing."
The second signal is subtler: thinking budgets. Instead of unlimited chain-of-thought, providers are now metering how much thinking a model can do per request. This sounds like a limitation — and it is, initially. But it's also how commoditization works. When you meter something, you're admitting it's a product. When it's a product, engineers optimize for it. When engineers optimize for it, costs fall.
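Metering is easy to picture in code. Real providers expose their own controls (Anthropic's extended thinking takes a `budget_tokens` value, and OpenAI's reasoning models accept a `reasoning_effort` setting), but the class below is a hypothetical, provider-neutral sketch. It assumes thinking and answer tokens are billed at the same output rate and simplifies the cap to a hard ceiling, which is cruder than how any real API enforces it:

```python
from dataclasses import dataclass


@dataclass
class ThinkingBudget:
    """Hypothetical per-request cap on reasoning tokens (not a real provider API)."""

    max_thinking_tokens: int
    price_per_million: float  # assumed single output rate for thinking and answer tokens

    def worst_case_cost(self, answer_tokens: int) -> float:
        # The most a request can cost once thinking is capped.
        billed = self.max_thinking_tokens + answer_tokens
        return billed * self.price_per_million / 1_000_000

    def charge(self, thinking_used: int, answer_tokens: int) -> float:
        # Simplification: thinking beyond the cap is cut off, never billed.
        billed = min(thinking_used, self.max_thinking_tokens) + answer_tokens
        return billed * self.price_per_million / 1_000_000


budget = ThinkingBudget(max_thinking_tokens=8_000, price_per_million=60.0)
capped = budget.charge(thinking_used=25_000, answer_tokens=2_000)  # would-be runaway request
short = budget.charge(thinking_used=3_000, answer_tokens=2_000)    # easy request
```

The point of the cap is that an open-ended cost becomes a bounded, priceable one: the runaway request costs the same $0.60 as the worst case, and an easy request costs less.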
The third signal is in the models themselves. Small reasoning models — o3-mini, Gemini 2.0 Flash Thinking, Qwen's distilled variants — are entering the market at sizes and price points that would have seemed impossible for their capability level a year ago. The 13B "thinking" model that costs $0.10 per million tokens and handles 80% of what o1 did is not a research projection. It's available today.
The Two-Track Mind
What makes this commoditization interesting — and strange — is that reasoning isn't a single thing. It's at least two different cognitive modes that behave very differently under cost pressure.
Systematic reasoning is the kind o1 popularized: slow, step-by-step, backtracking when it hits dead ends, checking its own work. This is expensive by design. Every additional thinking step costs money. And for hard problems — genuine multi-step logic, novel code architecture, proof writing — there's no substitute. You either think it through or you don't.
Intuitive reasoning is different. It's the flash of insight. The pattern match that skips steps. The experienced engineer who looks at a codebase and immediately knows where the bug is. This mode used to be the exclusive territory of large, expensive models. But small models are getting disturbingly good at it. Gemini 2.0 Flash, which Google positions for speed and low cost and whose parameter count it has not disclosed, displays what might be called "thinking fluency": the appearance of having worked something through, without necessarily having done the full computation.
This creates a product puzzle. As small models get better at faking systematic reasoning, how do you verify you're getting the real thing? The answer, increasingly, is that you can't, at least not until it fails. A small model that gives you a confident, well-structured answer that happens to be wrong is worse than no answer at all. It's confidently wrong.
So the commoditization of thinking is also, necessarily, the proliferation of confident errors. This is the tradeoff the industry hasn't fully grappled with.
Who Wins When Thinking Is Cheap
The collapse of reasoning costs redistributes power in predictable ways — and some of the winners and losers are counterintuitive.
Winners: Application-layer companies. Anyone building a product where AI does hard cognitive work — legal research, scientific literature synthesis, complex data analysis, architecture planning — just got access to capabilities that were enterprise-only twelve months ago. The unit economics of "ask an AI to think about this for thirty seconds" now fit inside a $5/month subscription. The addressable market for "AI that actually thinks" expands dramatically.
Winners: Coding agents. The sweet spot for test-time compute is code. Code has unambiguous correct answers (it either passes tests or it doesn't), long dependency chains, and high human cost when wrong. As reasoning gets cheaper, the agent that can think harder before shipping code becomes dramatically more valuable. The ceiling on agent capability rises with the budget for thinking.
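The "passes tests or it doesn't" property is what makes escalation safe for coding agents: start with a small thinking budget and pay for more only when the cheap attempt fails verification. A minimal sketch; `solve_with_escalation`, the budget ladder, and the toy solver and tests are all hypothetical, and it assumes every attempt spends its full budget:

```python
from typing import Callable, Optional, Sequence


def solve_with_escalation(
    solve: Callable[[int], str],
    passes_tests: Callable[[str], bool],
    budgets: Sequence[int] = (1_000, 8_000, 32_000),
) -> tuple[Optional[str], int]:
    """Try the cheapest thinking budget first and escalate only on test failure.

    Returns the first candidate that passes (or None) and the total thinking
    tokens spent, assuming every attempt uses its full budget.
    """
    spent = 0
    for budget in budgets:
        candidate = solve(budget)
        spent += budget
        if passes_tests(candidate):
            return candidate, spent
    return None, spent


# Toy stand-ins for a model call and a test suite (hypothetical).
def toy_solver(budget: int) -> str:
    # Pretend only a larger thinking budget finds the correct implementation.
    if budget >= 8_000:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def toy_tests(source: str) -> bool:
    namespace: dict = {}
    exec(source, namespace)  # fine for a toy; never exec untrusted model output
    return namespace["add"](2, 3) == 5


accepted, spent = solve_with_escalation(toy_solver, toy_tests)
```

Here the 1,000-token attempt fails its test, the 8,000-token attempt passes, and the 32,000-token tier is never paid for: 9,000 thinking tokens in total.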
Losers: Large-model incumbents. If reasoning matters most for a narrow set of hard tasks, and small models can do 80% of those tasks at 10% of the cost, the premium for frontier-scale intelligence compresses. The moat of "our model is smarter" shrinks when "smart enough" gets cheap. This is already visible in API pricing trends — GPT-4o and Claude Sonnet have both seen dramatic price cuts in 2026 as small reasoning models close the gap.
Losers: Rote cognitive workers. Here's the uncomfortable one. If the value of a senior engineer's judgment was partly that they could think through hard problems, and AI can now think through 60% of those problems at negligible cost — the scarcity premium on human judgment compresses. Not to zero. But measurably. The engineers who win in this world are the ones whose judgment AI can't replicate, not the ones whose basic cognitive labor AI can.
The Budget Is the Strategy
Perhaps the strangest consequence of metered thinking is what it does to how we think about AI.
When thinking was unlimited, the AI's behavior was mostly about the model. You picked a model, you got its capabilities. The usage pattern was binary: use it or don't.
When thinking is metered, behavior becomes a budget problem. How much thinking does this problem deserve? What's the cost of wrong answers at different thinking budgets? When is it worth paying for o3 over a faster model? When is 30 seconds of thinking worth more than 3 seconds?
These questions are fundamentally economic, not technical. And they have no fixed answer — because the price of thinking is still falling. The thinking budget that makes sense at $60/million tokens doesn't make sense at $6/million tokens. And at $0.60/million tokens — which is where we're heading — the calculus changes again entirely.
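The claim that the right budget depends on the price can be made concrete with an expected-cost calculation: token spend plus the probability of a wrong answer times what a wrong answer costs you. Every number below (the error rates and the $2.00 cost of a mistake) is made up for illustration; the point is the shape of the calculation, not the values:

```python
# Hypothetical error rates measured at each thinking budget (tokens -> P(wrong)).
ERROR_RATE_BY_BUDGET = {1_000: 0.30, 4_000: 0.15, 16_000: 0.08, 64_000: 0.05}
COST_OF_WRONG_ANSWER = 2.00  # assumed dollars lost per wrong answer (review, rework)


def expected_cost(budget: int, price_per_million: float) -> float:
    """Token spend plus the expected cost of acting on a wrong answer."""
    token_cost = budget * price_per_million / 1_000_000
    return token_cost + ERROR_RATE_BY_BUDGET[budget] * COST_OF_WRONG_ANSWER


def best_budget(price_per_million: float) -> int:
    """Thinking budget with the lowest expected cost at the given token price."""
    return min(ERROR_RATE_BY_BUDGET, key=lambda b: expected_cost(b, price_per_million))


for price in (60.0, 6.0, 0.60):
    print(f"${price:.2f} per million tokens -> best budget: {best_budget(price):,} tokens")
```

With these hypothetical numbers, the cheapest budget overall moves from 4,000 thinking tokens at $60 per million to 16,000 at $6 and 64,000 at $0.60. Falling prices don't just make the same workload cheaper; they change how much thinking is worth buying.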
The engineers who are already winning this transition are the ones treating thinking as a resource to be allocated strategically, not a binary on/off switch. They're building systems that route problems to appropriate thinking budgets. They're measuring the quality/cost tradeoff for different thinking budgets on their specific workload. They're not using the most capable model — they're using the most cost-effective model for each task.
That's a fundamentally different relationship with AI. It's less like "using a tool" and more like "managing a workforce." And like any workforce management problem, the bottleneck is increasingly not the workers — it's the judgment to deploy them well.
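The routing habit described above (measure each tier on your own workload, then send each task to the cheapest tier that clears your quality bar) fits in a few lines. The tier names, per-task costs, and pass rates below are hypothetical placeholders for numbers you would measure yourself:

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str                    # hypothetical tier label, not a real model name
    cost_per_task: float         # assumed average dollars per task, measured on your workload
    pass_rate: dict[str, float]  # assumed success rate per task category


TIERS = [
    Tier("small-fast", 0.002, {"lookup": 0.97, "refactor": 0.80, "architecture": 0.40}),
    Tier("small-thinking", 0.02, {"lookup": 0.98, "refactor": 0.93, "architecture": 0.70}),
    Tier("frontier-thinking", 0.40, {"lookup": 0.99, "refactor": 0.97, "architecture": 0.90}),
]


def route(category: str, target: float) -> Tier:
    """Cheapest tier whose measured pass rate meets the target.

    If no tier meets it, fall back to the tier with the best pass rate.
    """
    eligible = [t for t in TIERS if t.pass_rate.get(category, 0.0) >= target]
    if eligible:
        return min(eligible, key=lambda t: t.cost_per_task)
    return max(TIERS, key=lambda t: t.pass_rate.get(category, 0.0))


for category, target in [("lookup", 0.95), ("refactor", 0.90), ("architecture", 0.85)]:
    print(f"{category} (target {target:.0%}) -> {route(category, target).name}")
```

A category with no tier that meets the bar falls back to the strongest tier rather than failing; whether that fallback is right depends on what a wrong answer costs in that category.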
The Meter Keeps Running
The trajectory is clear. Reasoning — real, systematic, test-time compute reasoning — will follow the same commoditization curve as everything else in AI. The cost per thinking step will fall. The quality at each price point will rise. The use cases that were economically impossible at $60/million tokens become routine at $6/million tokens.
This is unambiguously good for the industry. The bottleneck for AI adoption has never been capability — it's been cost per useful output. Compressing that cost unlocks applications that don't exist today.
But it's also a transition that benefits the prepared more than the caught-off-guard. The engineers who understand thinking as a metered resource, who build systems that allocate cognitive effort strategically, who know when "good enough and cheap" beats "expensive and marginally better" — they're the ones who will build the durable products.
The rest will be along for the ride, wondering why their premium model feels less premium every quarter.