Microsoft Tightens Grip on GPUs, Pressuring AI Customers
The Takeaway
- AI startups are facing higher prices, months-long wait times to access Nvidia GPUs
- Cloud providers such as Microsoft are keeping more GPUs for internal efforts, big customers
- Microsoft employees expect GPU wait times for cloud customers to persist through the end of 2026
AI startups are struggling to access Nvidia graphics processing units as Microsoft and other cloud providers divert GPU stockpiles to their internal teams or bigger cloud customers, leaving smaller firms scrambling for the remaining servers at higher prices.
The supply crunch is affecting well-funded AI startups that have raised money from Sequoia Capital, Founders Fund, General Catalyst, Andreessen Horowitz and other major investment firms, according to founders and investors at some of those firms. The shortage prompted General Catalyst’s Hemant Taneja to send a survey to founders, inquiring about their ability to access compute, according to a person with direct knowledge.
“We’ve heard from many of you that compute—and GPU access in particular—is one of the biggest bottlenecks you’re facing this year,” Taneja wrote.
The dynamic is reminiscent of early 2023, when cloud providers clawed back capacity from their cloud services to support their internal teams and a few key customers, such as OpenAI. Venture firms including Andreessen and Index Ventures ended up securing their own pools of GPUs to ease the crunch for their portfolio companies.
But unlike that period, when AI applications were nascent, exceptional demand for AI coding tools is exacerbating the current shortage. Cloud providers are tightening GPU capacity for smaller customers as demand surges from large AI developers such as Anthropic, as well as for coding and other automation tools, according to executives at the cloud firms and startup founders.
This time around, General Catalyst is working on a way to help its portfolio companies access GPUs, potentially by facilitating a shared pool of capacity or directly negotiating on a startup’s behalf.
The chip shortage is allowing cloud firms to raise the prices they charge to rent Nvidia-powered servers. That’s providing a much-needed boost to cloud provider margins, after some recently struggled to make money on the chips.
But the higher prices are felt by startups like Krea, which develops image-generating AI models. The four-year-old startup has raised $83 million in funding from investors including Andreessen and Bain Capital Ventures.
Six months ago, Krea signed a six-month contract to rent several hundred Nvidia Blackwell chips for $2.80 an hour per chip after several cloud providers competed aggressively for the startup’s business. But in the last month, when the startup began looking for more AI servers to train a model from scratch, some sales representatives at cloud providers wouldn’t pick up the phone, said co-founder and CEO Victor Perez.
When sales representatives finally did return his calls, they told him prices had increased significantly and that they wouldn’t engage with him unless he signed a three-year commitment.
“Some ghosted, some told us no availability, others tried to get us into crazy bad deals,” Perez said.
And while Krea was evaluating offers for some clusters, those clusters ended up getting scooped up by other customers within days, he said.
In the end, Krea’s founders agreed to pay $3.70 an hour per chip for another several hundred Blackwell chips under a one-year commitment, a rate 32% higher than in their last deal. Even that price seemed cheap to Perez, given the chip prices he had seen from other providers.
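The 32% figure follows directly from the two hourly rates reported for Krea's deals. A minimal sketch of that arithmetic, using only the $2.80 and $3.70 per-chip rates cited above (the function name is illustrative):

```python
def percent_increase(old_rate: float, new_rate: float) -> float:
    """Return the percentage change from old_rate to new_rate."""
    return (new_rate - old_rate) / old_rate * 100


# Hourly per-chip rates reported for Krea's two Blackwell deals.
old_rate = 2.80  # six-month contract signed six months ago
new_rate = 3.70  # new one-year commitment

increase = percent_increase(old_rate, new_rate)
print(f"{increase:.0f}% increase")
```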
“The biggest fear for us is to not have compute where we can run our platform and run models where we want to train,” Perez said. “If the price gets a little higher, it’s not going to kill us.”
In another example, a startup founder seeking to rent a tightly connected cluster of nearly 1,000 GPUs said an Nvidia salesperson told them last week that it would be difficult to find such a cluster at the largest cloud providers, given the number of customers seeking chips in even greater numbers. The founder said they are still looking for a cluster, which would cost more than $70,000 per day to rent.
Contract Renewals
Making matters worse for startups is that Microsoft, Amazon, CoreWeave and other cloud providers are making ever-bigger multibillion-dollar commitments to provide large numbers of GPUs to Anthropic and OpenAI. (Those commitments haven’t been enough to stave off a compute shortage at Anthropic as it experiences an unprecedented growth spurt.)
Another reason for the shortage: Many AI startups previously struck two- or three-year cloud deals that are now coming to an end, giving cloud providers an opportunity to sell them higher-priced deals or reallocate that capacity to other parties.
For example, one CEO of an AI cloud provider said they recently planned to shift one customer’s GPU cluster to another customer that was willing to pay around 30% more because the original customer’s contract was up for renewal. But after the original customer pleaded to keep their chips, the cloud provider ended up letting them keep the GPUs at the higher price.
GPU cloud provider Lightning AI, meanwhile, has around 40,000 GPUs online but a backlog of potential rental orders from roughly 40 customers seeking about 400,000 GPUs, CEO Will Falcon said. That’s driven up prices more than 25% in the last six months, Falcon said, from around $1.60 per hour per chip to more than $2 now and even higher in some cases. (Most of Lightning’s chips are Hoppers, an older generation of Nvidia chips.)
Microsoft’s Use ’Em or Lose ’Em Policy
At Microsoft, demand from large customers and internal Microsoft teams prompted the company’s Azure cloud group to limit how many servers it is willing to rent to smaller customers, according to a Microsoft employee with direct knowledge. Some of the smaller customers are already facing months-long wait times to rent additional GPUs, this person said.
Azure sales leaders have recently told staff that customers should expect long wait times to persist at least through the end of 2026, this person said.
Microsoft has long reserved its largest clusters of cutting-edge chips for OpenAI and its own internal use, and is also building new clusters for Anthropic. For other Azure customers, getting access to GPUs depends on how much money they’re already spending on Azure cloud services and how much additional cash they’re willing to commit to the AI servers.
For instance, Microsoft in recent months began asking customers that want access to Nvidia Blackwell chips to commit to renting at least 1,000 of the chips for at least a year, a contract that costs tens of millions of dollars at a minimum, according to the Microsoft employee with direct knowledge.
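To see why a 1,000-chip, one-year minimum lands in the tens of millions of dollars, the back-of-the-envelope math can be sketched as follows. Microsoft's actual Blackwell rate is not reported here, so the sketch borrows the $3.70 per chip per hour that Krea agreed to pay another provider as a stand-in. That rate, and the assumption that the chips are billed around the clock, are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def min_commitment_cost(chips: int, rate_per_chip_hour: float, years: float = 1.0) -> float:
    """Total rental cost if every chip is billed for every hour of the term."""
    return chips * rate_per_chip_hour * HOURS_PER_YEAR * years


# Assumption: $3.70 per chip-hour is Krea's reported rate, not Microsoft's price.
cost = min_commitment_cost(chips=1_000, rate_per_chip_hour=3.70)
print(f"${cost / 1e6:.1f} million")
```

At that assumed rate the minimum commitment works out to roughly $32 million, in line with the "tens of millions" floor the employee described.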
Customers must wait weeks or months to rent even a small number of older generations of Nvidia chips on Azure, according to the Microsoft employee and a customer that has tried reserving chips.
The length of the wait time depends on customers’ existing relationship with Azure, which prioritizes customers based on a tiered system, the Microsoft employee said. Tier 1 customers, who get priority access, include roughly 1,000 of its biggest cloud spenders, while Tier 2 customers are smaller spenders but still big enough that Microsoft assigns dedicated sales representatives to manage their accounts. Tier 3 customers are smaller businesses whose relationship is managed by one of Microsoft’s reseller partners, such as CDW.
Customers that don’t commit to reserving a large number of GPUs face long wait times to rent the chips on a pay-as-you-go basis. After Microsoft gives them GPU capacity on that basis, it tracks how much they’re using the GPUs and may remove their access if they let the servers stay idle for even a few hours, the Microsoft employee said.
Similarly, Microsoft has been rescinding GPU access for some companies that got access to the chips through Microsoft for Startups, a program that gives startups free credits to rent servers. Microsoft has told such startups that they will lose access to GPUs if they aren’t fully utilizing the chips, the Microsoft employee said.
Going Direct?
Some startup founders might bypass the cloud providers altogether. Collin McLelland, founder of Collide, a startup that raised a $14 million seed round last year and is developing AI agents for oil and gas companies, said his company is considering spending around $500,000 to buy Nvidia GPUs and run them on its own. He said he got fed up with the long wait times and constraints of renting GPUs from large cloud providers. Collide is considering leasing space directly from a data center firm or a cloud provider to host the GPUs after receiving them, he said.
While buying and installing the GPUs is significantly more expensive in the short term than renting them from a cloud provider, McLelland said, he determined it was worth doing to avoid delays and uncertainty around the rentals, and he expects to spend less on buying the GPUs than he would on renting them over several years.
“It’s a huge risk for us to not have any compute when we need it,” he said. “Most people are just scared of hardware. I’ve owned oil wells so I’m numb to it.”