NU · neighbordoors · records over spin

The Picks-and-Shovels Decade Just Got Real

Frontier models are a coin-flip race, agents still miss a third of their tasks, and the smart money is pouring into compute, power, and the one foundry the whole thing runs on.

<h2>The race at the top is a dead heat — and that's the tell</h2><p>Here's the thing nobody at the top of the leaderboard wants to say out loud in mid-2026: at the very frontier, nobody is winning. As of early this year, Anthropic's best model holds the lead by roughly <b>2.7%</b> — a margin so thin it might as well be a rounding error. And that lead has been a revolving door since early 2025: US labs grab it, Chinese labs take it back, then it flips again. The frontier isn't a finish line anymore. It's a knife-fight in a phone booth.</p><p>That matters more than any single model launch, because it quietly kills the most expensive fantasy in the industry — the idea that one company builds an unassailable moat at the model layer and prints money forever. When the best models are separated by less than three percentage points and the lead changes hands every quarter, the model itself is not where durable value lives. It's becoming a commodity — an extraordinary, world-changing commodity, but a commodity all the same. And commodities don't make their producers rich. The people who supply the commodity producers do.</p><p class="pull">When the best models trade the lead every quarter, the model is not the moat. The thing underneath it is.</p><p>Take Gemini 3.5 Flash, now generally available. The headline isn't that it's the smartest thing alive — it's that it delivers frontier-level intelligence at roughly four times the speed of comparable models. Read that again. The competitive edge being marketed isn't raw IQ anymore. It's throughput. It's cost-per-token. It's how much inference you can run per dollar and per watt. When a flagship launch leads with speed instead of smarts, the market is telling you what's actually scarce. That's not a model story. 
That's an <b>infrastructure</b> story wearing a model's clothes.</p><p class="figcap">📊 The frontier in 2026: a cluster of models within a few points of each other, the lead changing hands every quarter.</p><h2>Agents leapt forward — and still fail a third of the time</h2><p>The most genuinely exciting number of the year is also the most sobering. On OSWorld — a benchmark that drops AI agents into real computer tasks, the messy clicking-and-typing work humans actually do all day — success rates jumped from about <b>12% to roughly 66%</b>. In benchmark terms, that's a moonshot. A year ago these agents face-planted on nine out of ten real tasks. Now they finish two out of three.</p><p>But flip the number over. Two out of three means a third of the time, the agent still blows it. Would you let something with a 34% failure rate run your payroll unsupervised? File your taxes? Push code to production on a Friday afternoon? Of course not. We've gone from 'cute demo' to 'promising intern who needs a babysitter,' and that's a real leap — but it is not 'fire the staff' territory, and anyone selling it that way is selling you a story.</p><p class="pull">Two out of three is a miracle and a warning in the same number. The demo dazzles; the deployment still needs a human standing behind it.</p><p>This is exactly what Andrej Karpathy meant when he called this the 'decade of agents' — not the year, the decade. The capability curve is bending hard and fast, but the road from 66% to the 99.9% reliability that real enterprise workflows demand is long, expensive, and paved with edge cases. Every one of those last percentage points is harder won than the one before it, and every one gets paid for the same way: more compute, more training runs, more inference, more power. Here's the part the bears miss — agents getting better is not an argument against the infrastructure thesis. It is the engine driving it. A more capable agent doesn't consume less compute as it matures. 
It consumes more, because now you actually want to run it on everything.</p><p class="figcap">📊 OSWorld success rates rose from roughly 12% to 66% in a year: a vertical leap that still leaves a third of tasks unfinished.</p><h2>NVIDIA stopped bragging and started shipping</h2><p>If you want a single signal that the industry has crossed from hype into deployment, watch what happened at NVIDIA's GTC. For years that stage was a benchmark beauty pageant: bigger numbers, faster chips, watch this graph go up and to the right. This year the tone shifted. The spotlight moved to real enterprise agentic deployments: agents running on factory floors, optimizing logistics networks, working inside finance operations.</p><p>That is a meaningful change in posture. You don't pivot your flagship conference from 'look how fast our chip is' to 'look at this manufacturing customer in production' unless the buyers in the room have stopped asking about benchmarks and started asking about ROI. The conversation moved from the lab to the loading dock. And here's the part that matters for where the money goes: every one of those agentic deployments, whether in manufacturing, logistics, or finance, is a standing order for compute. Not a training bill you pay once and forget. A continuous, growing, every-single-day inference appetite that scales with usage and never switches off.</p><p>Meanwhile the picks-and-shovels reality showed up in the billing models too. As of June 1, 2026, GitHub Copilot moved to usage-based <b>'AI Credits'</b> billing, with per-user budgets and a premium 'Copilot Max' tier. That's the whole industry telling on itself. Flat-rate seats are giving way to metered consumption for one reason: the underlying cost, compute, is variable, real, and rising. A flat seat fee only works when your marginal cost per user is trivial. 
The moment you start metering by the sip, you're admitting the thing being sipped is genuinely scarce and genuinely expensive to produce.</p><p class="pull">When the software giants switch from flat seats to metered credits, that's not a pricing tweak. That's an admission of what's scarce underneath.</p><h2>The capital is staggering: follow exactly where it lands</h2><p>Now the numbers that make your eyes water. OpenAI raised roughly <b>$122B at an ~$852B valuation</b>, with Amazon, NVIDIA, SoftBank, and Microsoft on the cap table. Anthropic took an additional $40B from Google plus $5B from Amazon, on top of chip deals with Google and Broadcom reportedly worth hundreds of billions. These aren't venture rounds anymore. The names on the cap table are no longer financiers betting on a startup. They're the cloud and silicon giants buying a seat next to their own future demand. These are nation-state-scale capital commitments.</p><p>Step back and look at the whole field. US private AI investment hit roughly <b>$285.9B in 2025</b>, about 23 times China's ~$12.4B. That is one of the most lopsided capital concentrations in the history of any industry. Money on that scale doesn't move on vibes. It moves toward things it expects to own.</p><p>But the question that actually makes you money isn't 'how much.' It's 'where does it land?' Trace the dollars and they don't pool at the chatbot. They flow downhill: into chips, into data centers, and into the electricity to run them. The Broadcom and Google chip deals aren't software spend; they're silicon and capacity. OpenAI's mega-round isn't going to UI polish; it's going to compute and the buildings and substations to house it. The logic is brutally simple: when a model lead is worth 2.7% and lasts a single quarter, you don't spend $122B defending the model. 
You spend it owning the factory floor underneath every model — yours and everyone else's.</p><p class="pull">The money isn't betting on whose chatbot wins. It's betting on the chips, the buildings, and the power — the things every chatbot has to rent.</p><p class="figcap">📊 US private AI investment ~$285.9B in 2025 versus China's ~$12.4B — and most of it flows downhill into compute and power.</p><h2>The whole tower balances on one foundry</h2><p>Here's the bottleneck that should keep every AI executive up at night, and the one that makes our thesis concrete: nearly every leading AI chip on Earth is fabricated by <b>TSMC</b>. NVIDIA designs them. OpenAI and Anthropic and Google buy them by the hundreds of thousands. But the actual physical manufacturing of the most advanced silicon funnels through one company, on one island, in Taiwan.</p><p>Sit with that for a second. The hundreds of billions in capital, the 23-to-1 investment gap, the agentic deployments on factory floors, the metered Copilot credits — every bit of it ultimately depends on a single foundry's ability to keep etching chips at the leading edge. That is the most consequential supply-chain chokepoint of our era. And unlike a software advantage, it does not get cheaper or less constrained as demand explodes. It gets tighter. A new leading-edge fab takes years and tens of billions to stand up; demand reprices in a quarter. That gap is the whole story.</p><p>And it doesn't stop at the wafer. Those chips go into data centers, and data centers run on power — fuel, transmission, grid capacity. The binding constraint on AI in 2026 is no longer 'can we write a smarter model.' Plenty of labs can. It's 'can we manufacture the silicon and source the megawatts to run it.' Those are physical, slow-to-build, capital-intensive bottlenecks. Which is to say: they're moats. Real ones. 
The kind that don't evaporate when next quarter's benchmark flips, because you cannot fine-tune your way out of a power shortage or a fab queue.</p><p class="pull">The smartest model in the world still has to be etched in Taiwan and plugged into a grid that can power it. That's the real scoreboard.</p><h2>Our take: hunt the lanes, not the logos</h2><p>So here's where we land, and it's the same place this site keeps landing — because the evidence keeps pointing here. The headline-grabbing AI story is the models: who's smartest, whose agent did the cool thing on stage. The investable story is the substrate underneath: compute, power, and the supply chain that makes both possible.</p><p>Connect the dots we just walked through. Frontier models are a dead heat, so the model layer is commoditizing. Agents are only ~66% reliable, so the spend required to push them toward dependable is enormous and ongoing — that's Karpathy's 'decade,' not 'year,' for a reason. NVIDIA's customers have moved from benchmarks to production, which means perpetual inference demand rather than one-off training runs. The capital — $122B here, $45B there, chip deals in the hundreds of billions — flows past the chatbot and into silicon, buildings, and electricity. And the entire edifice balances on one foundry and a power grid that was never built for this load.</p><p>That is the picks-and-shovels map, and it was drawn by the industry's own checkbook, not by us. In a gold rush where the prospectors can't tell whose claim will pan out — because the lead changes every single quarter — you don't bet on a prospector. You sell the picks, the shovels, the fuel, and the rail line that every prospector has to use no matter who strikes it rich. For this site, that means the mid-cap lanes feeding compute and power, the ones threading the TSMC-fuel-grid bottleneck where demand is inelastic and supply is slow. The demos are dazzling and they'll keep dazzling. But durable winners are infrastructure, not demos. 
They always have been, in every gold rush we've ever had.</p><p class="pull">In a gold rush where nobody can tell whose claim pans out, you don't bet on a prospector. You sell the shovels and the fuel.</p><p>The frontier will keep flipping. Agents will keep climbing toward reliable. The valuations will keep being absurd. None of that changes the shape of the opportunity — it sharpens it. Watch where the dollars physically land, follow them down to the bottleneck, and you'll be looking at the right lanes long before the headlines catch up.</p><p style="font-size:0.8em;opacity:0.7">This is opinion and editorial content, not financial advice. We are not liable for any decisions made on the basis of it. Do your own research.</p>

NU original — sourced analysis of the public record. Read it in the interactive Reading Room, or browse more at neighbordoors.com.

Transparency: NU articles are AI-assisted and editor-reviewed, built from the cited primary sources. We label what's proven, alleged, and opinion.