The Information: OpenAI’s Latest Breakthrough: AI That Comes Up With New Ideas

Even as artificial intelligence has made strides in summarizing research papers or solving mathematical problems, professionals in many fields still believed humans would be in charge of coming up with ideas for new discoveries.

Now AI is getting good at such brainstorming, too.

OpenAI is preparing to launch new AI models as soon as this week that can connect the dots between concepts from different fields to suggest new types of experiments involving anything from nuclear fusion to pathogen detection, according to three people who have tested the models but are not authorized to speak about it.

The Takeaway
• New AI aims to resemble inventors like Nikola Tesla who blended information from multiple fields
• OpenAI believes it can charge $20,000 per month for doctorate-level AI
• A gap remains between ideas AI can generate and the scientists’ ability to verify them

If the upcoming models, dubbed o3 and o4-mini, perform the way their early testers say they do, the technology might soon come up with novel ideas for AI customers on how to tackle problems such as designing or discovering new types of materials or drugs. That could attract Fortune 500 customers, such as oil and gas companies and commercial drug developers, in addition to research lab scientists.

The apparent improvements highlight the benefits of AI models focused on reasoning, which the ChatGPT maker debuted in September. Reasoning models perform better the more time they can spend processing answers, and they excel in problems with solutions that can be verified objectively, such as math theorems. OpenAI last year shifted its research effort to reasoning as traditional methods of improving AI slowed.

The progress of such software helps explain why OpenAI believes it could eventually charge upward of $20,000 per month, or 1,000 times the cost of a basic ChatGPT subscription, for AI that can replicate the work of doctorate-level researchers.

The ability of OpenAI’s upcoming models to synthesize new ideas represents one of several pillars the firm is developing to match or outperform humans at “most economically valuable work,” otherwise known as artificial general intelligence.

OpenAI says it has trained its reasoning models to include a large base of knowledge across multiple domains, such as biology, physics and various types of engineering.

What makes the soon-to-be-released reasoning models unique is that they can compute an answer or suggest an idea using information from multiple fields—physics and engineering, for instance—simultaneously, two of the people who tested the models said. Most scientists, in contrast, must collaborate with experts from other fields to come up with similar answers or ideas, these people said. That’s a time-consuming process.

In that sense, the AI aims to resemble the kind of inventors who blend information from multiple fields, such as Nikola Tesla and Richard Feynman, whose knowledge in physics, engineering and math drove discoveries in electrical devices and quantum mechanics, respectively.

OpenAI’s newer reasoning models aren’t available for purchase yet but are already powering features in ChatGPT such as deep research, which browses the web to compile research reports.

With that technology, scientists can direct the AI to read publicly available literature in various scientific domains, summarize the experiments researchers have already conducted and suggest new approaches that haven’t been tried yet, said a person who has tested them. (An OpenAI spokesperson did not have a comment for this article.)

Saving Time

The potential of the new reasoning models is underscored by the fact that even the less advanced reasoning models already available have been a game-changer for scientists.

For instance, Sarah Owens, a molecular biologist at Argonne National Laboratory in Illinois, was curious whether a statistical technique from ecology could help her look for pathogens in wastewater.

In February, she used OpenAI’s o3-mini-high model, which is commercially available, to design a small-scale study of whether she could use an ecological technique known as occupancy modeling to predict the presence of pathogens even if they don’t show up in a particular water sample.

Trying to design the study without AI “would have taken days,” she said. “To have this information synthesized in this way really saved a lot of time.”

Similarly, Massimiliano Delferro, a chemist at Argonne who conducts research on plastic waste recycling, asked o3-mini-high whether he could use a certain technique to break down plastic waste more efficiently, and to design an experiment to test that.

He said the model returned instructions for an experiment, including a range of temperatures and pressures to use, significantly faster than he could design the experiment himself.

“That’s when I went from a skeptic to excited,” he said.

In another example, scientists at an “AI Jam Session” in February at Argonne were impressed with the ability of o1-pro and o3-mini-high to determine the potential environmental effects of building power plants and mines in specific geographic regions, according to an attendee who used the models.

“Really, where we’re headed is towards models that spend a lot of time thinking really hard about important scientific problems, and that’s something that I hope over upcoming years will make all of you 10 times or 100 times more effective,” OpenAI President Greg Brockman said at another February “AI Jam Session” event, held at Oak Ridge National Laboratory in Tennessee, which OpenAI co-hosted with the Department of Energy for 1,000 scientists from nine federal research labs.

OpenAI has said it would give several national labs private access to a reasoning model for their research, hosted on a supercomputer at Los Alamos National Laboratory.

Still, in many cases a gap will remain between the ideas AI can generate and the scientists’ ability to verify the answers.

For instance, the upcoming reasoning models can suggest how strong a laser should be to generate a certain amount of energy when it hits a capsule of fuel, but a scientist would still need a simulator or other software to test how promising the suggestion is, said a person who has tested them.

Testing With Robots

Suggestions involving chemistry or biology might require testing in a physical lab. Robots could be used to automate the experiments, scientists say, but such systems could take a long time to develop.

In these examples, combining reasoning models that can suggest hypotheses with AI agents that can access simulators or robots to test them would be key to accelerating new discoveries, these scientists say.

Some AI and robotics firms, including startup FutureHouse and Alphabet-owned Isomorphic Labs, say they are using similar approaches to automate biology research and drug discovery, respectively. In January, Sam Rodriques, CEO of FutureHouse, posted on X a photo of a humanoid robot that would eventually be able to help the startup run experiments in a physical lab.

Today’s commercially available AI agents operate primarily through computer browsers or other software applications, and they remain imperfect. They aim to automate complex tasks, from coding to processing human resources and IT queries from employees.

OpenAI has publicly released Operator, a computer- and browser-using agent, which still makes mistakes and struggles when navigating complex sites, some users say.

To improve Operator, OpenAI plans to collect data from the people who use it, filter out examples where the agent fails to complete the task and train the agent on the remaining examples, according to a person who has worked on the product.

This process, known as reinforcement learning from human feedback, is a bedrock of how OpenAI and other firms teach AI to perform tasks not covered by the data used to train it.

Before the advent of reasoning models, if a traditional conversational AI model “discovered a new theorem that had never been solved, it would have been penalized because [the new theorem wasn’t] in its training data,” said David Luan, head of Amazon’s AGI SF Lab and a former head of engineering at OpenAI.

Allowing agents to “play” and try different approaches in an environment such as a browser and rewarding them when they complete a task can help them learn new skills, he said.

OpenAI, for its part, has been developing an advanced software coding agent it hopes can eventually automate the work of AI researchers and engineers, including generating code for their experiments related to AI models.