【Observer News, by Xiong Chaoran】On the evening of January 12, Liang Wenfeng, founder of the Chinese AI startup DeepSeek, co-authored a technical paper with researchers from Peking University proposing a new model-training technique, which the authors say allows model parameters to keep scaling up while bypassing GPU memory limitations.
Hong Kong's South China Morning Post reported on January 13 that the move underscores DeepSeek's continued focus on maximizing cost efficiency even as it trails leading U.S. companies in computing power. Meanwhile, outside observers speculate that the company will release a major new model before this year's Spring Festival.
The report noted that this highly technical paper is expected to attract widespread attention from professionals in both China and the U.S., who hope to learn about DeepSeek's latest progress. Over the past year, DeepSeek has been a model of innovation in China's AI field.

DeepSeek and Peking University researchers co-published the paper, with Liang Wenfeng listed as an author. Screenshot of the paper
According to the report, in the latest paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," the researchers introduce a "conditional memory" technique called "Engram" (a term for a memory trace).
The technique targets a key bottleneck in scaling up AI models: the limited high-bandwidth memory (HBM) capacity of GPUs.
Existing large language models (LLMs) retrieve even basic information by computing it, which consumes substantial computing power. The researchers argue that this wastes valuable "sequential depth" that could instead be allocated to more complex reasoning tasks.
The South China Morning Post pointed out that HBM is one of the biggest gaps between China and the U.S. in AI hardware. Ray Wang, an analyst at the semiconductor industry research firm SemiAnalysis, said that despite steady progress in recent years, China's memory chip champion ChangXin Memory Technologies (CXMT) still lags several years behind industry leaders such as South Korea's Samsung Electronics and SK Hynix and the U.S.'s Micron Technology.
In the paper, DeepSeek and Peking University researchers said that by decoupling computation and storage, Engram allows the model to more efficiently "look up" this basic information.
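To make the "look up instead of recompute" idea concrete, the following is a minimal, illustrative PyTorch sketch of a conditional-memory module. It is not DeepSeek's actual Engram design, whose keying scheme, table size, and placement in the network are not described in this article; the class name ConditionalMemoryLookup, the token-id hashing used to choose a table slot, and the sigmoid gate are all assumptions made purely for illustration.

import torch
import torch.nn as nn

class ConditionalMemoryLookup(nn.Module):
    """Illustrative sketch only: a large table of stored vectors kept outside
    the main compute path. The real Engram keying scheme, table size, and
    placement in the network are not public in this article."""

    def __init__(self, num_slots: int, d_model: int):
        super().__init__()
        # Large table of "memorized" vectors; in principle it could be sharded
        # or kept in host (CPU) memory rather than scarce GPU HBM.
        self.table = nn.Embedding(num_slots, d_model)
        self.gate = nn.Linear(d_model, 1)  # decides how much to trust the lookup
        self.num_slots = num_slots

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # Hypothetical keying: hash the raw token id into a table slot, so basic
        # information is fetched by lookup instead of recomputed by the layers.
        slots = token_ids % self.num_slots
        retrieved = self.table(slots)            # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(hidden))     # (batch, seq, 1)
        return hidden + g * retrieved            # gated residual injection

# Toy usage: inject looked-up vectors into a batch of hidden states.
mem = ConditionalMemoryLookup(num_slots=50_000, d_model=64)
hidden = torch.randn(2, 16, 64)                  # (batch, seq, d_model)
token_ids = torch.randint(0, 30_000, (2, 16))
print(mem(hidden, token_ids).shape)              # torch.Size([2, 16, 64])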
Their new technology also improves the model's efficiency when processing long contexts (i.e., longer inputs), which is one of the biggest challenges in turning AI chatbots into practical AI agents in the real world.
The researchers validated the technique in a model with 27 billion parameters, finding that it lifted performance on major industry benchmarks by several percentage points. Crucially, it also frees up more of the model's capacity for computationally demanding complex reasoning.
They wrote, "We believe conditional memory will become an essential modeling primitive in next-generation sparse models." The researchers compared Engram's potential impact to that of the company's own "Mixture of Experts" (MoE) technology, which enabled models to scale up without a proportional increase in computational requirements and has since been adopted by other Chinese competitors.
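For readers unfamiliar with MoE, here is a minimal top-1 Mixture-of-Experts sketch in PyTorch showing the general idea: a router activates only one expert's parameters per token, so the total parameter count can grow without a proportional increase in per-token compute. This is a generic illustration, not DeepSeek's implementation; the class name TinyMoE, the top-1 routing, and the layer sizes are assumptions.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Minimal top-1 Mixture-of-Experts sketch (not DeepSeek's implementation):
    a router picks one expert per token, so only a fraction of the total
    parameters is used on any given forward pass."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route every token to its highest-scoring expert.
        scores = self.router(x)                       # (tokens, num_experts)
        weights, choice = scores.softmax(-1).max(-1)  # top-1 gate per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = weights[mask, None] * expert(x[mask])
        return out

# Toy usage: 8 experts in total, but each token activates only one of them.
moe = TinyMoE(d_model=32, num_experts=8)
tokens = torch.randn(10, 32)
print(moe(tokens).shape)                              # torch.Size([10, 32])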

Video screenshot of DeepSeek founder Liang Wenfeng
Currently, the largest models in the industry have trillions of parameters. Elie Bakouch, a research engineer at the open-source developer platform Hugging Face, praised the paper on social media, calling it "a technique verified on hardware during inference and training."
According to the report, the paper lists 14 co-authors, including Liang Wenfeng, as well as Zhang Huishuai, an assistant professor at Peking University's Institute of Computer Science and Technology and a former chief researcher at Microsoft Research Asia.
Last year, DeepSeek released its large model DeepSeek-R1, trained in data centers running NVIDIA H800 GPUs; training took just two months and cost about US$5.5 million, a fraction of what U.S. companies such as OpenAI spend. The model achieved results comparable to top American AI models, shocking the industry and drawing attention from many countries, especially the U.S.
On January 12 local time, the Financial Times reported that Microsoft President Brad Smith warned that U.S. AI companies are being overtaken by Chinese competitors in the contest for users outside the West, with China's low-cost "open-source" models a major advantage. He said DeepSeek's technology is spreading rapidly in emerging markets such as Africa, underscoring the competition U.S. companies face globally: "We must recognize that, unlike a year ago, China now has one, and increasingly more than one, competitive open-source model."
The report noted that Smith made these remarks as a new study by Microsoft found that DeepSeek's R1 large language model, released a year ago, helped accelerate the global spread of AI, especially in Global South countries, due to its "usability and low cost." This also allowed China to surpass the U.S. in the global market share of "open-source" AI models, which are usually available free of charge for developers to use, modify, and integrate.
The South China Morning Post pointed out that, as DeepSeek marks the one-year anniversary of the R1 release, expectations for its next major model are rising. On January 9 local time, the Silicon Valley tech publication The Information reported that DeepSeek is expected to launch a new V4 model with strong programming capabilities in mid-February.
This article is exclusive to Observer News. Unauthorized reproduction is prohibited.
Original: toutiao.com/article/7594821762757231114/