OpenAI Ramps Up Audio AI Efforts Ahead of Device
The Takeaway
- OpenAI unifies teams to enhance audio AI models for an upcoming device.
- New audio model architecture offers more natural, accurate, and emotive responses.
- OpenAI’s device will act as a companion, proactively giving suggestions to help its user achieve their goals.
OpenAI is taking steps to improve its audio AI models, in preparation for its eventual release of an AI-powered personal device, said a person with knowledge of the effort. The device is expected to be largely audio-based, said three people with knowledge of it.
When people speak to ChatGPT, the chatbot can talk back, but the large language model that powers the audio version is different from the one that powers ChatGPT’s text-based responses. Researchers within the company believe the current audio models lag behind the text-based models in the accuracy of their responses and how quickly they answer questions, according to a former employee and a current employee.
As a result, over the last two months, OpenAI has unified several engineering, product and research teams around the goal of improving audio models for its future devices, said one of the people with knowledge of the effort.
Improving the accuracy of audio models is important for OpenAI’s goal of launching a device that consumers can control with spoken commands. The first of those devices isn’t expected for about another year, The Information has previously reported.
OpenAI’s efforts to improve its audio models are beginning to pay off. A new audio-model architecture produces responses that sound more natural and emotive and provide more accurate, in-depth answers, said the person with knowledge of the effort. The new audio model will also be able to speak at the same time as a human user, which today’s models can’t do, and will handle interruptions better, this person said.
The company is aiming to release the new audio model in the first quarter of 2026, the person with knowledge of the effort said. A spokesperson from OpenAI declined to comment.
Like Google, Amazon, Meta Platforms and Apple, OpenAI is looking to develop new kinds of personal AI devices, including wearables. Some of these companies believe that today’s most popular devices, like the iPhone, aren’t optimized for future AI technology.
OpenAI researchers working on the device want users to interact with it through speech, rather than by looking at a screen. Many AI researchers—including some at Thinking Machines Lab, the AI startup cofounded by former OpenAI Chief Technology Officer Mira Murati—believe that speaking out loud is a more natural way to interact with AI because people primarily interact with each other through speech.
Some also believe that a design without a screen would reduce the chances that people will become addicted to a device. Former Apple design chief Jony Ive, who is working with OpenAI on its hardware efforts, has said that this is a priority for him, as he sees potential new devices as a way to right the wrongs of past consumer gadgets.
“Even if you’re innocent in your intention, I think if you’re involved in something that has poor consequences, you need to own it,” Ive said in a May interview with Stripe CEO Patrick Collison. “That ownership, personally, has driven a lot of what I’ve been working on.”
One obstacle that OpenAI faces today, however, is that many ChatGPT users don’t interact with the chatbot by speaking to it out loud, either because of the low quality of its audio models or because they aren’t aware of the feature, the former employee said. To build an audio-first AI device, OpenAI first has to get consumers used to speaking out loud with AI products like ChatGPT, the former employee said.
A key figure behind OpenAI’s audio AI push is Kundan Kumar, a voice AI researcher the company hired from Character.AI this summer to lead the effort, the person with knowledge of the audio AI effort said. Other leaders include Ben Newhouse, a product research lead who has helped rewrite OpenAI’s infrastructure, which had largely been built for text-focused AI, to support audio AI, and Jackie Shannon, a product manager for multimodal ChatGPT, the person said.
OpenAI is developing a family of devices it plans to release over time, rather than a single device, according to multiple people with knowledge of the effort. Among the ideas the company has discussed are glasses and a smart speaker without a display, they said.
Researchers working on the device told OpenAI staff in a presentation this summer that the device will act as a companion that works alongside its user, proactively giving suggestions to help the user achieve their goals, rather than as a simple conduit to apps and other software, according to the person with knowledge of the audio AI efforts. The device will be able to take in information about its surroundings and its user through audio and video when the user allows it, the person said.
A number of staffers across OpenAI work on efforts related to the device, such as its supply chain, industrial design and model research. Earlier in 2025, OpenAI acquired io, a company cofounded by Ive, for nearly $6.5 billion to design the hardware devices.