The Information: OpenAI’s ChatGPT Problem

OpenAI has a problem and, unfortunately for it, that problem is also how it makes most of its money: ChatGPT.

In this story that Sri, Amir and I published this morning, we revealed that over the past year OpenAI staffers had noticed that improvements to the company’s models weren’t necessarily lifting ChatGPT usage. Their quandary is that focusing too much on increasing the chatbot’s appeal could draw away resources from the company’s stated goal of developing artificial general intelligence, or AI that meets or surpasses human abilities.

Nevertheless, there are growing signs that the company is shifting its research strategy to be more aligned with its products. For instance, OpenAI hopes to release updates to its models for ChatGPT more frequently than in the past, as often as once a month, a current employee told me. Already, we’ve seen this trend with GPT-5.1 released in November and GPT-5.2 released this month.

This seems like a smart way to avoid building up too much hype around any particular model release, which can end up being a letdown (as we saw with GPT-5). But some current and former employees told me that they’re worried that OpenAI could become too focused on incremental updates to its models at the expense of the longer-term vision of what its AI should look like.

Other issues could arise if researchers focus too much on users’ initial reactions to changes in ChatGPT.

For instance, if a user is shown two versions of a ChatGPT response and one is longer, they might choose the longer answer because people tend to assume longer responses are more accurate, a former employee said. In the long run, though, overly wordy responses can become annoying to users, the former employee added.

“Product and research are deeply interconnected, not oppositional,” an OpenAI spokesperson said in a statement. “Research breakthroughs shape our products, and product feedback shapes research. This is a single, unified strategy for building and safely deploying increasingly capable models, not a division between competing sides.”

For what it’s worth, a secretive new model from OpenAI, codenamed Garlic (whose existence we first reported here), could give the company an opportunity to realign its research and product teams. That’s because Garlic was created through improvements in pretraining, the first step of model training in which a model is shown lots of data from the internet and other sources so it can learn to make connections between them.

Pretraining is the part of the model development process that OpenAI has lately struggled to show improvements in. But getting better results in pretraining matters, because they flow through to the model’s performance in a variety of areas, from coding to creative writing to a model’s personality.

So, theoretically, once Garlic is incorporated into ChatGPT, it should improve the chatbot’s performance in a variety of areas users might ask questions about. It should also improve the chatbot’s personality while boosting the model’s general intelligence, an area that researchers would care more about.

(Importantly, OpenAI’s recently released GPT-5.2 isn’t based on the full-blown Garlic model but rather on an early checkpoint of it, according to a person with knowledge of the model, so we’ll likely see more improvement there still.)

One last note: somewhat counterintuitively, some of the information we’ve learned about ChatGPT usage has made me more optimistic that it still has a lot of room to grow. We learned during the reporting process for today’s article that many ChatGPT users simply don’t understand the full range of things they can do with the chatbot. That’s partly because the chatbot is so text-based that users might not realize ChatGPT can do things like analyze a picture of your houseplant to understand why it’s dying or look at a picture of your hand in mahjong and tell you whether you’re making an illegal move or not (both of which I’ve done in the last week).

If OpenAI can make ChatGPT’s interface more intuitive and visual, as leaders have promised, more users could discover these under-the-radar applications over time.