WSJ: How Google Finally Leapfrogged Rivals With New Gemini Rollout

Top-scoring model’s debut shakes up AI chatbot race: ‘I think we’ve hit on something’

Google’s Gemini 3 model surpassed competitors in industry benchmark tests, demonstrating advanced capabilities in AI.
Gemini 3’s launch presents a challenge to OpenAI and Anthropic, outperforming them in over a dozen intelligence categories.
Google’s market value reached $3.6 trillion, exceeding Microsoft’s for the first time in seven years.

Call it America’s next top model.

With the release of its third version this week, Google’s Gemini large language model surged past ChatGPT and other competitors to become the most capable AI chatbot, as determined by consensus industry-benchmark tests.

The results represent public validation for Google employees who, for months, have been conducting their own, personal tests of the model—asking it for jokes, trying to stump it with math problems—and coming away convinced they had something that would finally tilt the LLM field in the company’s favor.

For one of her “vibe checks,” Tulsee Doshi, Gemini’s senior director of product management, asked the model to write in Gujarati, a language that is spoken widely in India but isn’t especially prevalent on the internet. The results were far better than what she had gotten from earlier models.

“I call it signs of life, right?” she said. “People were coming back and saying, ‘I feel it, I think we’ve hit on something.’”

Aaron Levie, chief executive of the cloud content management company Box, got early access to Gemini 3 late last week, several days ahead of the launch. The company ran its own evaluations of the model over the weekend to see how well it could analyze large sets of complex documents.

“At first we kind of had to squint and be like, ‘OK, did we do something wrong in our eval?’ because the jump was so big,” he said. “But every time we tested it, it came out double-digit points ahead.”

The launch of Gemini 3 has handed Google an elusive victory: The company, for the first time in years, has pulled well ahead in the race to develop artificial intelligence.

The release of its latest AI model this week dazzled users who praised its intelligence, accuracy and creative capabilities. On Thursday, the company said Gemini 3 would power a new version of Nano Banana, a popular image-generation tool that has already driven rapid growth in Gemini usage this year.

The success of the new model poses a significant challenge to OpenAI, Anthropic and other startups vying for AI dominance. Gemini 3 outperformed competing models on more than a dozen benchmark tests scoring a range of intelligence categories.

“They are AI winners, that’s pretty clear,” said MoffettNathanson analyst Michael Nathanson. “I feel pretty good about their hand right now.”

OpenAI’s ChatGPT is still by far the most popular AI chatbot. The company said this month it now has 800 million users each week, compared with Gemini’s 650 million monthly users. And Anthropic’s Claude is widely regarded as one of the leading models for coding. But Gemini 3’s advances have the potential to cement it as a preferred tool for a diverse set of tasks, users and analysts say.

Google has been scrambling to get an edge in the AI race since the launch of ChatGPT three years ago, which stoked fears among investors that the company’s iconic search engine would lose significant traffic to chatbots. The company struggled for months to get traction.

Chief Executive Sundar Pichai and other executives have since worked to overhaul the company’s AI development strategy by breaking down internal silos, streamlining leadership and consolidating work on its models, employees say. Sergey Brin, one of Google’s co-founders, resumed a day-to-day role at the company helping to oversee its AI-development efforts.

During its annual developers conference in May, Google unveiled a suite of sophisticated AI products and a revamped version of its classic search engine featuring AI Mode, which answers search queries in a chatbot-style conversation. It helped some investors gain confidence that the company was mounting a comeback, Nathanson said, but the share price still languished this summer.

“Wall Street was debating whether these guys were going to be AI roadkill,” he said.

Then, in August, the debut of Nano Banana launched Gemini usage on its fastest trajectory to date. Gemini’s monthly users have since climbed to 650 million, up from 450 million in July.

The company also notched a significant victory in September, when a federal judge declined to levy harsh penalties against the company after earlier finding it maintained an illegal monopoly in the search market. The judge said the competitive dynamics of the market were changing already, largely because of AI.

Google parent Alphabet reported record quarterly revenue last month, largely because of growth in cloud computing and advertising. Its shares are up more than 50% this year and more than 60% since the summer. Its market value this week hit $3.6 trillion, surpassing that of Microsoft for the first time in seven years.

Google designed Gemini 3 to succeed in some of the most challenging areas of artificial intelligence. The company’s engineers and researchers wanted to improve the model’s ability to “see,” analyze and generate all manner of content: text, images, audio, video and code. And they wanted to improve its capacity for thought and reasoning in the interest of building a better personal assistant for coding and other tasks.

After the launch, a table showing Gemini 3’s scores on 20 benchmark tests circulated widely online. The model significantly outscored the latest ones from OpenAI and Anthropic on tests involving expert-level knowledge, logic puzzles, math problems and image recognition. It took second place to Anthropic’s Claude Sonnet 4.5 on a single benchmark involving coding.

Google ran some of the tests internally. The rest were done by other companies. Employees spent the weekend before launch in anticipation of getting scores back, some of which were far higher than anticipated.

Doshi said the best surprise was Gemini 3’s high performance on an evaluation called Vending Bench, which tests a model’s ability to think and act over time by asking it to operate a vending machine. The model must track inventory, place orders and set prices in order to make money in the simulation.

“Vending Bench reflects one of the things we were hoping to really shift with this model and push on, which is improved tool use and improved planning,” she said.

As part of the launch, Google began offering subscribers the opportunity to use Gemini 3 in AI Mode, the first time the company integrated a new model into search on day one. The company has plans to make it available soon for everyone in the U.S.

Robby Stein, vice president of product for search, said he worked for months with the Gemini team to determine how the new model could improve the delivery of search-engine results. For one of his vibe checks, he used AI Mode to ask for help explaining to his 7-year-old daughter the concept of lift force on an airplane.

He expected a written response. The result: an interactive simulation that showed currents moving over a wing, with a slider allowing him to move the wing, change the currents and lift the plane into the air.

“I was like, ‘Wow, this actually can be capable of presenting information in the best way given the question,’” he said. “That was my main ‘aha’ moment with this product.”