The Information

Google in Talks With Marvell to Build New AI Chips for Inference

The Takeaway
  • Google is in talks with Marvell to design two new AI inference chips.
  • New memory processing unit and TPU target efficient AI inference.
  • Google aims to diversify its AI chip design partners beyond Broadcom.

Google is in talks with Marvell Technology to develop two new chips aimed at running AI models more efficiently, according to two people with direct knowledge of the discussions. One is a memory processing unit designed to work alongside Google’s tensor processing unit. The other is a new TPU built specifically for running AI models.

The moves underscore surging demand for inference chips, which run the AI models that power commercial products such as autonomous agents. At its GTC conference in March, Nvidia released a chip designed to improve the efficiency of inference workloads. Called a language processing unit, or LPU, the chip is built on technology Nvidia licensed from startup Groq for $20 billion.

Google has bought data center chips from Marvell before, but those purchases were off the shelf. The current discussions aim at designing semiconductors exclusively for Google’s needs. They are the latest sign that Google wants to diversify away from Broadcom, long the sole design partner for Google’s TPU.

Google had previously considered replacing Broadcom with Marvell as the supplier for the networking interface chips that connect servers to Ethernet switches in Google’s data centers, The Information reported in 2023.

Google had been planning to develop new inference chips and accelerated the work after Nvidia’s launch of the LPU, according to a Google employee. Marvell is Groq’s chip design partner for the first-generation LPU, which gives Marvell experience designing an inference chip.

Google’s talks with Marvell on a new TPU were reported by Funda AI.

Google has previously purchased CXL controller chips from Marvell, according to two Google employees. Those chips manage how servers share memory across a data center. That prior work gave Google confidence in Marvell’s ability to design new chips with it, the employees said.

Google’s new memory processing unit would work alongside TPUs, splitting AI workloads with them according to each workload’s compute and memory demands, the two people said. Google and Marvell aim to finalize the design of the memory processing unit as soon as next year before handing it off for test production, according to the two people.

Google plans to produce nearly two million memory processing units, the two people added, though that figure could change as the discussions are still in their early stages. By comparison, Morgan Stanley estimates Google will produce around six million TPUs in 2027. The memory processing units can work with existing TPUs. It is unclear when the design work for the new TPU will wrap up and how many of them Google plans to produce.

Google currently produces its chips at Taiwan Semiconductor Manufacturing Co. It remains unclear whether TSMC or another chipmaker would produce the new chips.

For years, Google used the TPU only in its own data centers to power businesses including search, YouTube and Gemini models, and made it available only to Google Cloud customers. That changed last year, when Google started leasing TPUs to customers for use in non-Google data centers, in a direct challenge to Nvidia’s dominance in AI chips. Google’s TPU has also won over customers including Anthropic, Meta Platforms and Apple.

The rise of inference-specific chips comes as AI firms release more sophisticated products such as autonomous agents, which require more computing power than traditional AI apps like chatbots.

Still, not all inference tasks are alike. Some steps in generating a response require lots of computing power, while others are bottlenecked by how quickly a chip can move data in and out of memory. Using different types of inference chips for the different tasks, rather than running everything through one type of processor, has become a key way AI firms improve efficiency and reduce cost.
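
To make the distinction concrete, here is a minimal sketch of how a step could be labeled compute-bound or memory-bound and matched to a chip, using a simple roofline-style estimate. It is an illustration only, not a description of how Google, Nvidia or any other firm actually schedules work. The `Chip` class, the two chips and all of their performance figures are hypothetical numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """A hypothetical accelerator described by two headline numbers."""
    name: str
    peak_flops: float     # peak compute, in FLOP/s (made-up figure)
    mem_bandwidth: float  # memory bandwidth, in bytes/s (made-up figure)

    @property
    def ridge_point(self) -> float:
        # FLOPs per byte above which this chip is limited by compute
        # rather than by how fast it can move data.
        return self.peak_flops / self.mem_bandwidth


def bottleneck(chip: Chip, flops: float, bytes_moved: float) -> str:
    """Label a step 'compute' or 'memory' bound on the given chip."""
    intensity = flops / bytes_moved  # arithmetic intensity, FLOPs per byte
    return "compute" if intensity >= chip.ridge_point else "memory"


def estimated_time(chip: Chip, flops: float, bytes_moved: float) -> float:
    """Roofline-style estimate: the slower of compute and data movement."""
    return max(flops / chip.peak_flops, bytes_moved / chip.mem_bandwidth)


def pick_chip(chips: list[Chip], flops: float, bytes_moved: float) -> Chip:
    """Choose the chip with the lowest estimated time for this step."""
    return min(chips, key=lambda c: estimated_time(c, flops, bytes_moved))


# Two hypothetical chips: one strong on compute, one strong on memory.
COMPUTE_CHIP = Chip("compute-heavy", peak_flops=1e15, mem_bandwidth=1e12)
MEMORY_CHIP = Chip("memory-heavy", peak_flops=2e14, mem_bandwidth=4e12)
CHIPS = [COMPUTE_CHIP, MEMORY_CHIP]

# Two hypothetical steps, each given as (FLOPs, bytes moved).
HEAVY_MATH_STEP = (1e15, 1e11)  # many operations per byte of data
HEAVY_DATA_STEP = (1e11, 1e11)  # few operations per byte of data

if __name__ == "__main__":
    for label, (flops, moved) in [("heavy math", HEAVY_MATH_STEP),
                                  ("heavy data", HEAVY_DATA_STEP)]:
        chosen = pick_chip(CHIPS, flops, moved)
        print(f"{label}: {bottleneck(COMPUTE_CHIP, flops, moved)}-bound "
              f"on {COMPUTE_CHIP.name}, routed to {chosen.name}")
```

Under these assumed numbers, the math-heavy step lands on the compute-heavy chip and the data-heavy step on the memory-heavy one, which is the basic logic behind pairing different kinds of inference chips.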

OpenAI, for instance, recently struck a deal to spend more than $20 billion on inference chips from Cerebras, a rival of Nvidia and Groq, while also using other firms’ inference chips. OpenAI is also developing its own inference chip with Broadcom.

Marvell, which designs standard networking, storage and optical interconnect chips used in data centers, has built a growing business helping customers design chips tailored to their needs. That custom business has become its fastest-growing segment.

Google has been trying to wean itself off Broadcom since 2023, mostly because of the high fees Broadcom charges. Broadcom collects a fee on every TPU that gets produced, so as demand for TPUs surges, so do the bills Google has to pay Broadcom.

Last year, Google brought in Taiwanese firm MediaTek to help design and produce TPU chips. Still, Broadcom remains Google’s key chip design partner. Earlier this month, Broadcom signed a new agreement with Google to develop and supply custom TPUs and networking components for Google’s next-generation AI data center racks through 2031, a sign of how central it remains to Google’s chip efforts.