The Information:

Former Google, Apple Researchers Raising $50 Million for New Visual AI Startup

The Takeaway

Former Google, Apple researchers launch new visual AI startup Elorian.
Elorian seeks $50 million seed round led by Striker Venture Partners.
Elorian develops AI models that process text, images, video and audio simultaneously.

Andrew Dai, a veteran AI researcher who recently left Google DeepMind after 14 years, is launching a new startup focused on AI models that understand and process text, images, video and audio simultaneously, Dai said.

The new startup, Elorian, is in talks with investors to raise a seed round of around $50 million, said Dai and another person with direct knowledge of the matter. Striker Venture Partners, a venture capital firm founded last October by Max Gazor, former general partner at VC firm CRV, is in talks to lead the round, said the person.

Yinfei Yang, an Apple research scientist who worked on that company’s AI models before departing in December, is a co-founder of Elorian, said the person. Both Dai and Yang have updated their LinkedIn profiles to show they’re working at a “stealth” company, and Dai’s indicates that he is CEO.

In a phone interview on Saturday, Dai said Elorian focuses specifically on building AI models that can visually interpret and analyze the physical world by simultaneously processing images, video and audio. While robotics is one potential use for Elorian’s AI, Dai said the startup envisions many others, without elaborating. Yang did not immediately respond to a request for comment.

Early AI models from developers like OpenAI were trained only on text, but there has been a shift in recent years toward models trained on images and video. This area of research, known as visual reasoning, is now a focus for many large AI providers and startups, including Google, OpenAI and Anthropic. Amazon launched a similar AI model last month at its annual cloud conference.

Visual reasoning models are designed for complex AI applications, such as robotics systems, because their ability to combine multiple functions saves developers the work of stitching together different AI models. Some researchers say the technology is valuable for AI agents that need to interpret and understand images like screenshots to carry out advanced tasks such as handling retail product returns and reviewing legal documents.

At Google DeepMind, Dai was co-leader of the data-focused pre-training work that underpins Gemini models, according to his LinkedIn profile. Dai has co-authored research papers with other well-known Google researchers, including Quoc V. Le and Jeff Dean, chief scientist for Google DeepMind and Google Research.

Dai was a pioneer in language models and has been working on pre-training–related research for the past two decades, the person said. Much of his research has focused on developing techniques for evaluating the quality of the data used in training AI models and ensuring that models are trained on data from a range of different sources, added the person.