WSJ: Chip Startup Aims to Shatter AI’s Dreaded Memory Wall

Huge AI models are overwhelming servers and leaving high-powered chips idle. Google and Meta veterans say they have the solution.

Majestic Labs AI developed Prometheus, a new server system designed to overcome the “memory wall” bottleneck in running large AI models.
Each of Majestic’s new servers is capable of scaling up to 128 terabytes of high-speed memory, enough to smoothly run models with 5 trillion to 10 trillion parameters, the company says.
The company says it has secured multiple customers, projecting hundreds of millions of dollars in revenue starting in 2027.

A trio of executives who worked at Alphabet’s Google and Meta has developed a new server designed to solve the problem of how to run artificial intelligence models that are increasingly large and complex.

Majestic Labs AI, founded by Ofer Shacham, Masumi Reynders, and Sha Rabii, announced in November that it had raised $100 million from backers including Bow Wave Capital, Lux Capital, Grove and others. The three worked at Google designing and selling early generations of its data center and mobile device chips, then later built the custom silicon team at Meta Reality Labs.

Working from a bare-bones office on a nondescript commercial strip in Los Altos, the startup now says it has designed new chips that could overcome the “memory wall,” an increasingly common computing bottleneck that limits how fast AI models can respond to queries.

Majestic’s new server system, called Prometheus, features hundreds of chips of the company’s own design, which it calls AIUs, or artificial intelligence processing units. The company’s founders say the servers have 1,000 times the memory capacity of the GPUs made by competitors like Nvidia, making them ideal for running AI models with trillions of parameters.

At that size threshold, the best-quality models “are becoming increasingly commercially not viable using existing infrastructure,” Rabii said. Insufficient memory results in high-powered chips sitting idle, despite all their processing speed, as they wait to recruit extra memory from nearby chips.

In an effort to solve this problem, each of Majestic’s new servers is capable of scaling up to 128 terabytes of high-speed memory, enough, the company says, to smoothly run models with 5 trillion to 10 trillion parameters, although the exact amount of memory is customizable to users’ needs.
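
For a rough sense of scale, the memory needed just to hold a model's weights is the parameter count multiplied by the bytes used per parameter. The sketch below is a back-of-the-envelope illustration, not Majestic's own sizing method: the precisions (`fp32`, `fp16`, `int8`) and the helper name `weight_memory_tb` are assumptions introduced here, and only the 128-terabyte capacity and the 5 trillion to 10 trillion parameter range come from the company's statements.

```python
# Back-of-the-envelope estimate of the memory needed to store model weights.
# Assumption: bytes-per-parameter values are common numeric precisions,
# not figures supplied by Majestic.

TB = 10**12  # decimal terabyte, in bytes

# Assumed storage cost per parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

SERVER_CAPACITY_TB = 128  # per-server maximum cited by the company


def weight_memory_tb(num_params: int, precision: str) -> float:
    """Return the terabytes needed to hold the weights alone."""
    return num_params * BYTES_PER_PARAM[precision] / TB


if __name__ == "__main__":
    # Model sizes the company says one server can run: 5T and 10T parameters.
    for params in (5 * 10**12, 10 * 10**12):
        for precision in BYTES_PER_PARAM:
            needed = weight_memory_tb(params, precision)
            share = needed / SERVER_CAPACITY_TB
            print(
                f"{params // 10**12}T params @ {precision}: "
                f"{needed:.0f} TB ({share:.0%} of {SERVER_CAPACITY_TB} TB)"
            )
```

Under these assumptions, a 10-trillion-parameter model stored at 16-bit precision needs about 20 terabytes for weights alone. Serving a model also requires working memory beyond the weights, such as cached context for each active request, which this estimate does not attempt to model.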

“This is the first time that a processor for AI is actually designed around memory first, with these amounts of memory that are required to handle the biggest models,” Shacham, Majestic’s chief executive, said in a video interview from Tel Aviv, where he is based.

The rise of agentic AI, or the use of autonomous bots that use AI to perform tasks such as software coding, has led to a massive shortage of the computing resources available to companies. Rental prices for advanced chips have risen sharply, while some AI tools have seen more downtime or have been forced to ration usage.

Demand has grown rapidly for chips that can process queries, a process known as inference, quickly and with minimal power consumption. That has created an opening for Majestic and dozens of other hardware and software startups.

The giants of AI have gotten into the game as well. Advanced Micro Devices has touted its latest generation of chips’ suitability for inference. Late last year, Nvidia paid $20 billion to license the technology and hire the leadership team of chip startup Groq, and recently announced a new inference-focused server that features its chips.

Last week, Google Cloud announced that its new generation of TPU processors will feature one chip customized for training and one chip for inference, with a major focus on high-bandwidth memory. Cerebras, another inference chip startup, did a major deal with Amazon Web Services this year and earlier in April filed for an initial public offering.

Majestic’s founders say that none of the current inference solutions on the market go far enough in providing sufficient memory capacity to handle the huge AI models that will be developed in the coming years. That forces chip-buyers to pay for more processing power than they need, just to get enough memory, Rabii said.

“The analogy is, I need a new garage, and you tell me I have to buy a new house,” he said.

One challenge going forward will be shortages of the memory chips needed to build Majestic’s servers. Most manufacturers expect those shortages to persist through next year, if not longer. Majestic says that it has sought to ease the effects of the supply crunch by relying exclusively on what are known as commodity DRAM chips, which are simpler to use and less expensive than high-bandwidth memory, or HBM, chips. HBM takes longer to produce because it involves a sophisticated process of three-dimensional stacking of multiple DRAM chips.

The startup’s special sauce, its founders say, is proprietary interconnection technology that allows Majestic to connect its processors to immense amounts of memory capacity—over 100 terabytes per server—at speeds that transfer data faster than HBM can, without using massive amounts of electrical power.