Meta’s Internal Chip Design Efforts Hit Roadblocks
The Takeaway
- Meta scraps advanced AI training chip Olympus due to design struggles.
- Decision underscores difficulty competing with Nvidia’s dominant AI chips.
- Meta has struck chip supply deals with AMD and Nvidia for data centers.
As Meta Platforms strikes new chip supply deals with AMD and Nvidia, it has been running into problems with AI chips it is designing internally, according to six people with direct knowledge of the matter.
Meta last week scrapped the most advanced chip it was developing for training AI models, after struggling with the chip’s design, and shifted its focus to a less complicated version, the people said. The company informed staffers in its AI infrastructure division about the updated plans last week. The decision underscores the difficulty tech giants face in designing AI chips that can compete with Nvidia’s offerings, which dominate the market.
Meta’s revised chip road map follows new partnerships it has struck with Advanced Micro Devices and Nvidia in recent weeks. Meta and AMD announced on Tuesday that the social media giant will purchase 6 gigawatts’ worth of AMD’s chips—roughly equivalent to the power needed to run several large-scale data centers. Meta also announced a “multigenerational” partnership with Nvidia earlier this month, in which it is committing to deploy millions of Nvidia’s current and next-generation chips in its data centers.
Meta’s in-house AI chips, which fall under its Meta Training and Inference Accelerator (MTIA) program, are part of the company’s broader effort to develop its own AI hardware and reduce its reliance on outside chipmakers like Nvidia. That effort aims to cut costs and give Meta more control over its data center infrastructure.
The sums involved are large: Meta expects its capital expenditures in 2026 to range from $115 billion to $135 billion, and most of that spending will go toward chips and servers.
In a statement, a Meta spokesperson said: “We remain committed to investing in a diverse silicon portfolio to meet our needs, which includes advancing our MTIA portfolio and will have more to share this year.”
Other companies such as Microsoft are encountering similar issues designing their own AI chips. Last year, Nvidia CEO Jensen Huang said publicly that most big tech firms would abandon the rival chip projects they are pursuing. He predicted that the performance of those chips would consistently fall short of Nvidia’s chips.
Meta has encountered problems with several of its own chips. The company scrapped one version of its second-generation training chip, internally code-named Iris. It then began working on a more advanced training chip, code-named Olympus, but it has now scrapped that one as well.
One person who works on Meta’s chips said there is internal skepticism about the company’s attempt to build chips that match Nvidia’s in capabilities, given the risk of delays or redesigns. Such a task requires a large team of engineers to design and debug the chips and to ensure their power consumption isn’t so high that the chips aren’t worth using over Nvidia’s offerings, the person said.
The Iris training chip is based on a computing approach known as single instruction, multiple data. SIMD is generally easier for hardware engineers to design but harder for software engineers to program when training AI models, the people said.
Olympus was based on a computing approach similar to the one used in Nvidia’s AI chips, known as single instruction, multiple threads. SIMT is generally easier for software engineers to program but harder for hardware engineers to design.
Many tech companies favor this approach, which Nvidia popularized, as it offers more flexibility and is better suited for training modern AI models.
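The programming trade-off between the two approaches can be illustrated with a toy sketch. The Python below is purely illustrative and does not reflect Meta’s or Nvidia’s actual designs: the function names, the `LANES` width and the `launch` helper are all hypothetical. It applies the same branching rule to a list of numbers twice, once in a SIMD style, where the programmer handles fixed-width vectors, masks and padding by hand, and once in a SIMT style, where the programmer writes ordinary code for a single thread and a scheduler (simulated here by a loop) runs it across many threads.

```python
# Toy contrast of the two programming models; pure Python, no real hardware.
LANES = 4  # hypothetical vector width for the SIMD sketch


def simd_style(xs):
    """SIMD style: the programmer operates on whole vectors of LANES elements.

    The branch becomes an explicit mask, and the ragged tail is padded by hand.
    """
    out = []
    for start in range(0, len(xs), LANES):
        vec = xs[start:start + LANES]
        valid = len(vec)
        vec = vec + [0.0] * (LANES - valid)   # pad the tail to a full vector
        mask = [v > 0 for v in vec]           # one comparison across all lanes
        doubled = [v * 2 for v in vec]        # both sides of the branch are
        negated = [-v for v in vec]           # computed for every lane...
        blended = [d if m else n for m, d, n in zip(mask, doubled, negated)]
        out.extend(blended[:valid])           # ...then blended; padding dropped
    return out


def kernel(tid, xs, out):
    """SIMT style: the programmer writes scalar code for one thread."""
    if tid >= len(xs):    # bounds guard for surplus threads
        return
    x = xs[tid]
    if x > 0:             # an ordinary branch, no explicit mask
        out[tid] = x * 2
    else:
        out[tid] = -x


def launch(kernel_fn, n_threads, xs):
    """Stand-in for a hardware scheduler: runs one kernel instance per thread."""
    out = [0.0] * len(xs)
    for tid in range(n_threads):
        kernel_fn(tid, xs, out)
    return out


data = [1.5, -2.0, 3.0, -0.5, 4.0, -1.0]
simd_result = simd_style(data)
simt_result = launch(kernel, 8, data)  # more threads than elements, on purpose
```

Both versions produce the same result. In the SIMT version, the work of grouping threads and handling divergent branches moves out of the program and into the scheduler, which in a real chip is hardware. That shift is, in broad terms, why the approach is easier on software engineers and harder on chip designers.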
Meta intended to complete the design of Olympus in the fourth quarter of 2026 at the earliest, though new chip designs typically take an additional nine months or longer beyond initial development to reach mass production, according to four of the people. The central part of Olympus that handled AI calculations, the graphics processing unit, would have used a design from chip startup Rivos, which Meta acquired last year.
Rivos said its GPUs could efficiently run code written for Nvidia’s proprietary CUDA software, the dominant software for training and running machine-learning models.
Meta initially planned to build large clusters of servers with Olympus, but executives ultimately decided that doing so would have posed a major risk to training new models as the company races to compete against more established rivals such as OpenAI and Google, one of the people said. The software for training models on the chips wouldn’t have been as stable as Nvidia’s offerings, for example, and Olympus’ complicated design could have made it difficult to manufacture in large quantities, several people said.
Instead, Meta is opting for now to continue using training chips made by others, for which the software is more established, the person said.