WSJ : How DeepSeek’s Lower-Power, Less-Data Model Stacks Up

The Chinese model is competitive with those of major U.S. AI developers on performance and pricing

Chinese company DeepSeek sent shock waves through Wall Street last month after unveiling a new AI model that was competitive with rival U.S. systems despite using less sophisticated chips and a fraction of the processing power.

DeepSeek is able to get more out of less because its latest R1 model relies more heavily on a process known as reinforcement learning, in which the model gets feedback from its actions using a reward system it creates and adapts for itself, according to a paper published by the company.

The model starts with an existing trove of text broken into unique words, word fragments and punctuation that can be strung back together in different ways. This “large language model” has 671 billion adjustable settings, known as “parameters,” that determine how the model responds to prompts.
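To make the idea of breaking text into words, word fragments and punctuation concrete, here is a minimal sketch in Python. It is not DeepSeek's tokenizer: production models learn their fragment vocabulary from data, while this toy version assumes a tiny hand-written vocabulary (`VOCAB`, hypothetical) and simply takes the longest known fragment at each step.

```python
import re

# Hypothetical toy vocabulary; real models learn tens of thousands of fragments.
VOCAB = {"un", "break", "able", "token", "s", "the", "model", "read", "!", ","}

def split_word(word, vocab):
    """Greedily split one word into the longest fragments found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a known fragment.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            end = start + 1  # unknown character becomes its own piece
        pieces.append(word[start:end])
        start = end
    return pieces

def tokenize(text, vocab=VOCAB):
    """Split text into words and punctuation, then into vocabulary fragments."""
    tokens = []
    for chunk in re.findall(r"\w+|[^\w\s]", text.lower()):
        tokens.extend(split_word(chunk, vocab))
    return tokens

tokens = tokenize("The model reads unbreakable tokens!")
```

In this sketch, "unbreakable" comes out as the three fragments "un", "break" and "able", and the exclamation point becomes a token of its own.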

A model’s parameter count is one way to measure its size. Unlike in traditional AI models, only a fraction of R1’s adjustable settings are active during any single operation. Activating fewer parameters sharply reduces the power and compute needed for processing, and allows the model to run on cheaper and less-sophisticated chips.

DeepSeek’s R1 model is divided into multiple networks that have different specialties, a method known as the “mixture of experts” approach. Different prompts call for different specialties, and to answer a prompt the model runs only the networks that it has learned are the most relevant.
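The routing idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not DeepSeek's implementation: the eight "experts" are toy functions, the relevance scores in `gate_scores` are hard-coded (in a real model a learned router computes them from the input), and the choice of two active experts is arbitrary.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, top_k=2):
    """Blend the outputs of the selected experts; the others are never run."""
    return sum(weight * experts[i](x) for i, weight in route(gate_scores, top_k))

# Eight toy experts; expert k multiplies its input by k + 1.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
# Hypothetical relevance scores for one prompt.
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route(gate_scores, top_k=2)
output = moe_forward(3.0, experts, gate_scores, top_k=2)
```

Here only experts 1 and 4 do any work for this input, while the other six sit idle. That idle share is where the savings in power and compute come from.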

In comparison, traditional AI models rely on enormous swaths of prelabeled data sets in a process known as supervised training. The prelabeling is done by humans and is expensive and time-consuming.

DeepSeek’s model is also distinguished in that it is open source, meaning it can be repurposed by developers outside the company.

The company’s R1 model ranks near the top of the leaderboard on Chatbot Arena, a platform run by University of California, Berkeley researchers that rates AI models.

For tasks such as math and coding, R1 performs better than most other models.

Chatbot Arena’s data is crowdsourced from visitors who use its website to ask a question, get answers from two anonymous AI models and then rate which one is better. The site has tallied more than 2.5 million votes across some 200 models.
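One common way to turn head-to-head votes like these into a ranking is an Elo-style rating, the approach used for chess players. The Python sketch below shows the general idea only and is not necessarily the site's exact method. The model names, the votes and the step size `k` are all made up for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(ratings, winner, loser, k=32.0):
    """Shift rating points from the loser to the winner after one vote."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    delta = k * surprise  # upsets move ratings more than expected wins
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical models, all starting from the same rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
# Each vote is (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]

for winner, loser in votes:
    update(ratings, winner, loser)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

With these three votes, `model_a` ends up on top and `model_c` at the bottom. With millions of votes, the same bookkeeping produces a stable leaderboard.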

DeepSeek’s pricing for developers to access R1 is lower than that of many other models in its intelligence class, according to data compiled by AI benchmarking firm Artificial Analysis.

Makers of AI models charge users, such as businesses that want to integrate the technology into their products, based on the amount of data—or number of tokens, in industry terms—being sent back and forth between the two parties.
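The arithmetic behind a token-based bill is simple, as the short Python sketch below shows. It assumes, as is common in the industry, separate prices per million tokens for text sent to the model and text it sends back. The prices and token counts are invented for illustration and are not any company's actual rates.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Dollar cost of one request under per-million-token pricing."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# Hypothetical request: 2,000 tokens sent, 500 tokens returned,
# priced at $1.00 and $4.00 per million tokens respectively.
cost = request_cost(2_000, 500, 1.00, 4.00)
```

At those made-up rates the request costs $0.004, or less than half a cent. A business handling a million such requests would pay about $4,000.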