SCMP: DeepSeek kicks off 2026 with paper signalling push to train bigger models

DeepSeek kicks off 2026 with paper signalling push to train bigger models for less

DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

Chinese artificial intelligence start-up DeepSeek has ushered in 2026 with a new technical paper, co-authored by founder Liang Wenfeng, that proposes a rethink of the fundamental architecture used to train foundational AI models.

The method – dubbed Manifold-Constrained Hyper-Connections (mHC) – forms part of the Hangzhou firm’s push to make its models more cost-effective as it strives to keep pace with better-funded US rivals that have deeper access to computing power.

It also reflects the increasingly open, collaborative culture among Chinese AI companies, which have published a growing share of their research in public.

For industry watchers, DeepSeek’s papers often provide an important early signal of the engineering choices that will shape the start-up’s next major model release.

In the paper, released on Thursday, a team of 19 DeepSeek researchers said they tested mHC on models with 3 billion, 9 billion and 27 billion parameters, and found it scaled without adding significant computational burden.

“Empirical results confirm that mHC effectively … [enables] stable large-scale training with superior scalability compared with conventional HC (hyper-connections),” wrote the researchers, led by Zhenda Xie, Yixuan Wei and Huanqi Cao.

Liang was listed as the final author.

The team added that “crucially, through efficient infrastructure-level optimisations,” mHC delivers these gains with “negligible computational overhead”.

The publication also offered fresh evidence that Liang, who has kept a low profile despite DeepSeek’s increasing fame, remains closely involved in core research at one of China’s most closely watched AI companies.

Hyper-connections were first proposed by ByteDance researchers in September 2024 as a tweak to ResNet (residual networks) – a dominant deep learning architecture introduced in 2015 by Microsoft Research Asia scientists including legendary Chinese computer scientist He Kaiming.

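ResNet enables the training of very deep neural networks by adding shortcut connections that carry each layer’s input forward and add it to that layer’s output. This stabilises training, because each layer only has to learn a correction, known as the residual, and key information is retained as the number of layers increases.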
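The residual idea can be illustrated with a minimal sketch. This is not code from the DeepSeek paper or from ResNet itself; the function names and the toy layer below are hypothetical and chosen only to show the input being added back to the layer's output.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, layer: Callable[[Vector], Vector]) -> Vector:
    """Return x + layer(x): the shortcut carries x forward unchanged."""
    fx = layer(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def tiny_layer(x: Vector) -> Vector:
    """Hypothetical stand-in for a learned layer: a small scaled update."""
    return [0.01 * xi for xi in x]


def run_stack(x: Vector, depth: int) -> Vector:
    """Apply the same residual block `depth` times in sequence."""
    for _ in range(depth):
        x = residual_block(x, tiny_layer)
    return x
```

If the layer contributes nothing, the block simply passes its input through, which is why stacking many such blocks does not erase the original signal.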

It has become integral to major large language models such as OpenAI’s GPT as well as Google DeepMind’s Nobel-winning AlphaFold system.

However, ResNet has notable limitations, including difficulty ensuring that the learning signal that flows through the neural network remains strong without “collapsing” into a one-size-fits-all state.

According to the DeepSeek researchers, ByteDance’s HC solution successfully addressed these issues by expanding the residual stream and enhancing the complexity of the neural network, “without altering the computational overhead of individual units.”

DeepSeek argued, however, that the earlier approach did not fully account for rising memory costs, leaving its “practical scalability” constrained for large-model training.

To address this, the DeepSeek team proposed an additional tweak that “constrains” the HC network with a specific manifold to ensure compute and cost efficiency.
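The article does not say which manifold the paper uses. As an assumption made purely for illustration, the sketch below constrains a matrix that mixes several parallel residual streams so that every row and every column sums to one, using alternating row and column rescaling. The function names, the choice of constraint and the toy numbers are all hypothetical and should not be read as DeepSeek's implementation.

```python
from typing import List

Matrix = List[List[float]]


def sinkhorn_normalise(weights: Matrix, iterations: int = 50) -> Matrix:
    """Alternately rescale rows and columns of a positive square matrix
    so that every row and every column sums to approximately 1."""
    m = [row[:] for row in weights]  # work on a copy
    n = len(m)
    for _ in range(iterations):
        for row in m:
            total = sum(row)
            for j in range(n):
                row[j] /= total
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


def mix_streams(mixing: Matrix, streams: Matrix) -> Matrix:
    """Each output stream is a weighted blend of all input streams."""
    width = len(streams[0])
    return [
        [sum(mixing[i][k] * streams[k][d] for k in range(len(streams)))
         for d in range(width)]
        for i in range(len(mixing))
    ]
```

Under this assumed constraint, mixing redistributes the signal among the streams without changing its total, which is one way a widened residual stream could stay stable across many layers.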

“mHC will help address current limitations and potentially illuminate new pathways for the evolution of next-generation foundational architectures,” the researchers wrote.

The paper was uploaded to the open-access repository arXiv by DeepSeek CEO Liang Wenfeng himself, who has also posted DeepSeek’s more prominent technical papers in recent years, including work linked to its R1 and V3 models.

Less prominent papers have typically been uploaded by other researchers.

Florian Brand, a PhD student at Germany’s Trier University and an expert on China’s AI ecosystem, said DeepSeek’s papers often acted as an early signal of the technical direction behind its next generation of models.

Industry expectations are running high that DeepSeek could release its next major model in the run-up to the Spring Festival holiday in mid-February.

The company released its groundbreaking R1 model just ahead of last year’s Spring Festival holiday, fuelling speculation it could repeat that playbook this year.