Article | Gao Heng Business Talk, Author | Gao Heng
As the end of 2025 approaches, Google has largely reclaimed the technical spotlight in the global large-model arena. The newly released Gemini 3 Pro has surpassed all open-source models on multiple authoritative benchmarks, re-establishing the technological high ground of the closed-source camp. Almost overnight, the industry has revived its doubts about whether "open-source models have reached their limits" and "whether the Scaling Law has truly hit a wall," and a sense of stagnation has spread through the open-source community.
But at this moment, DeepSeek chose not to remain silent. On December 1st, it launched two major models: DeepSeek-V3.2, which matches GPT-5 in reasoning performance, and the Speciale version, which shows extraordinary strength in mathematics, logic, and multi-turn tool calls. This is both a concentrated demonstration of technical capability and a direct response to the new ceiling set by closed-source models, delivered under a clear disadvantage in computing resources.
This is not just a routine model update. DeepSeek is trying to chart a new path for the post-Scaling era: how can architectural reshaping compensate for gaps in pre-training? How can "thinking during tool use" achieve high efficiency at low token cost? And more importantly, why has the Agent evolved from an auxiliary feature into the core engine of the model's capability transformation?
This article analyzes three main threads: how DeepSeek broke through its technical bottlenecks, why it was the first in the open-source camp to place a heavy bet on Agents, and whether this means open-source models can still penetrate the moat of closed-source systems.
One: From Lagging to Competing, What Did DeepSeek Use to Enter the First Tier?
In the arena of top AI models, open-source players have long been seen as able only to "catch up," never to "compete." DeepSeek-V3.2's performance this time, however, is no longer that of a follower.
According to DeepSeek's official data, V3.2 fully matches GPT-5 on public reasoning benchmarks while trailing Gemini 3 Pro only slightly. In several key evaluations, it consistently outperformed Kimi-K2-Thinking and set a new record in reasoning capability for domestic open-source models. In tasks such as mathematics, logic, and complex Q&A, DeepSeek-V3.2 comes close to the leading closed-source models, enough to place it at the top of the "global second tier."
The key behind this cannot be explained by continued scaling alone. DeepSeek's breakthrough lies in reshaping the underlying architecture, above all the introduction of DeepSeek Sparse Attention (DSA). In a traditional Transformer, the attention mechanism computes relationships between every token and all preceding tokens, so computational complexity grows quadratically with sequence length, making attention a major cost bottleneck in large-model inference.
The "Lightning Indexer" introduced by DSA acts as a fast pre-screener in this computation: instead of spreading attention over all tokens, it uses a small number of low-precision indexing heads (which can run in FP8) to quickly screen out the most critical token pairs, and performs exact attention only at those core positions. This design reduces the complexity of the core attention computation from quadratic to nearly linear, keeping the computational load relatively stable even with ultra-long 128K context inputs.
Notably, when introducing DSA, DeepSeek did not opt for an aggressive wholesale replacement but adopted a two-stage training strategy: dense warm-up, then sparse transition. Early on, the original dense attention structure was retained and only the indexer was trained to mimic its distribution; in the subsequent training phase, the model gradually switched over to the sparse structure, achieving a seamless handover. This "gradual architectural evolution" lets V3.2 improve long-context reasoning efficiency while preserving precision: tests such as Fiction.liveBench and AA-LCR show significant gains in information retrieval, contextual consistency, and compressed expression.
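Under the assumption that the warm-up objective is a KL-style imitation of the dense attention distribution, which is how such indexer training is typically done, stage one can be sketched in a single function:

```python
import numpy as np

# Minimal sketch of the stage-1 "dense warm-up" objective: the indexer is
# trained to imitate the dense attention distribution while the backbone
# still runs full dense attention. Names and the KL form are illustrative.

def indexer_warmup_loss(dense_attn_weights, index_scores, eps=1e-9):
    """KL(dense attention distribution || indexer's softmax distribution)."""
    p = dense_attn_weights / dense_attn_weights.sum()   # teacher: dense attention
    q = np.exp(index_scores - index_scores.max())
    q /= q.sum()                                        # student: indexer softmax
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Once this loss is low, the indexer's top-k choices closely track where
# dense attention actually looks, so the switch to sparse attention is safe.
```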
The breakthrough of greater value to the industry, however, does not stop there. In V3.2, DeepSeek introduced the "Thinking in Tool-Use" paradigm for the first time, transforming the model's execution chain from "think → call tools → finish" into an interleaved loop of "think → call → keep thinking → call again." This mechanism aligns closely with the "Interleaved Thinking" direction proposed in the Agent field: it strengthens the logical continuity of tool calls and lets the model reuse intermediate reasoning states repeatedly within a single task.
This capability is especially crucial in real Agent scenarios. Real tasks are rarely one-shot; they require multiple rounds of information gathering, verification, and strategy refinement. If every tool call made the model "forget," it would have to start from scratch each time. V3.2 instead explicitly retains the reasoning trajectory as part of the context, so that after a tool returns new information, the model continues along its original line of thought. This reduces redundant token generation and sharply lowers the risk of logical breaks caused by state drift.
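A minimal sketch of what such an interleaved loop looks like from the outside. The model and tool interfaces (model_step, run_tool) here are hypothetical stand-ins, not DeepSeek's API; the point is which parts of the context survive each tool call.

```python
# Interleaved thinking/tool-use loop: reasoning blocks stay in `context`
# across tool calls and are cleared only when a new user message arrives.

def agent_turn(user_message, model_step, run_tool, max_steps=16):
    context = [{"role": "user", "content": user_message}]  # new user input: fresh thinking state
    for _ in range(max_steps):
        step = model_step(context)          # returns reasoning text plus an optional tool call
        context.append({"role": "assistant",
                        "reasoning": step.reasoning,       # reasoning trajectory is retained
                        "tool_call": step.tool_call})
        if step.tool_call is None:          # model decided it has enough to answer
            return step.reasoning
        result = run_tool(step.tool_call)   # execute the call (search, code, etc.)
        context.append({"role": "tool", "content": result})
        # loop continues: the model resumes the SAME chain of thought with new evidence
    raise RuntimeError("step budget exhausted")
```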
In essence, DeepSeek's technical leap comes not from simply piling on FLOPs but from spending compute more intelligently: DSA makes compute allocation more efficient, and interleaved thinking makes tool calls more stable. Together, these two dimensions point to one goal: turning the model into a "sustainably thinking intelligent agent" rather than just a large language completer.
This also means that, with the scale dividend having peaked, model competition will gradually shift from "parameter count" back to "thinking organization" and "energy efficiency." V3.2 happens to be an early indicator of this shift.
Two: Betting on Agents Is Not Trend-Chasing but a Strategic Turn
Compared with the breakthroughs in model performance, the bigger change in DeepSeek-V3.2 lies in its strategic path: "Agent capability" is placed alongside "reasoning capability" and explicitly listed as a core indicator in the technical documentation. This is a directional adjustment few domestic open-source models have openly emphasized before. In DeepSeek's view, the Agent is not an accessory module for tool calls but the bridge between releasing model capability and industrial application, and potentially the vanguard of large-model platformization.
This judgment is no romantic technological fantasy detached from reality. Over the past year, the large-model industry has undergone an important shift: enterprises have begun to realize that the marginal value of a "smarter chatbot" is diminishing, and that the true core role capable of closing a commercial loop is the "action-capable" Agent. From automatically drafting reports and generating documents to batch-processing work orders and fixing code, companies are willing to pay for these "executable" intelligent agents, not for a sentence that merely sounds more human.
This also explains why, after the V3.2 training phase, DeepSeek invested heavily in building an Agent training system and its own large-scale task-generation pipeline. According to official disclosures, the team synthesized more than 1,800 agent environments and designed roughly 85,000 high-complexity task prompts around Agent tasks. These tasks were not manually annotated; they were generated automatically by environment builders and trajectory-scoring mechanisms, then optimized through reinforcement learning to form a closed training loop.
This approach breaks away from pre-training's traditional reliance on massive dialogue data. By comparison, Agent task trajectories are more structured, more verifiable, and scarcer; once the pipeline is in place, the training effect far exceeds conventional "dialogue completion." More importantly, the reinforcement-learning mechanism lets the model's capabilities keep improving through feedback loops rather than being limited to one-way iteration in the pre-training stage.
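A hedged sketch of that closed loop, with every interface (sample_task, rollout, scorer) a hypothetical placeholder rather than DeepSeek's actual pipeline:

```python
# Closed-loop data engine: synthesized environments produce tasks, the
# policy rolls out trajectories, an automatic scorer (no human annotators)
# decides what enters the training buffer.

def build_training_buffer(environments, policy, scorer, tasks_per_env=50, threshold=0.7):
    buffer = []
    for env in environments:                       # e.g. ~1,800 synthesized environments
        for _ in range(tasks_per_env):
            task = env.sample_task()               # auto-generated high-complexity prompt
            trajectory = policy.rollout(env, task) # multi-turn tool-use episode
            score = scorer(env, task, trajectory)  # environment feedback / rubric score
            if score >= threshold:
                buffer.append((task, trajectory, score))
    return buffer                                  # feeds the RL optimization loop
```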
DeepSeek adopted its own GRPO (Group Relative Policy Optimization) strategy in training and deeply adapted it for large-scale multi-turn task training. In this process, the model must optimize not only the soundness of single-turn outputs but also reasoning consistency and language stability across multi-turn tasks. To avoid the "catastrophic forgetting" problem of traditional RL, DeepSeek fused the reasoning reward, a language-consistency score, and a task-completion score into a multi-dimensional reward signal, ensuring the model preserves the integrity of the Agent execution chain throughout training.
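GRPO's core trick is to score a group of sampled responses to the same prompt and normalize each reward against the group. A short sketch, where the three reward components mirror those named above but the weights are illustrative assumptions, not DeepSeek's published values:

```python
import numpy as np

def composite_reward(task_score, reasoning_score, language_score,
                     w=(0.5, 0.3, 0.2)):
    # Multi-dimensional reward: task completion, reasoning quality,
    # language consistency. Weights here are illustrative only.
    return w[0] * task_score + w[1] * reasoning_score + w[2] * language_score

def grpo_advantages(group_rewards, eps=1e-6):
    """For G sampled responses to one prompt, each response's advantage is
    its reward normalized against the group's mean and std."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Usage: 4 sampled rollouts for one prompt, each scored on three axes.
rewards = [composite_reward(t, c, l) for t, c, l in
           [(1.0, 0.8, 0.9), (0.0, 0.6, 0.9), (1.0, 0.4, 0.7), (0.0, 0.2, 0.5)]]
adv = grpo_advantages(rewards)   # positive -> reinforced, negative -> suppressed
```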
To support this complex training mechanism, the model's own "state awareness" must be upgraded in step. V3.2 introduces a complete context-management strategy: the thinking state is reset only when the user sends a new message, while the reasoning trajectory is fully preserved across consecutive tool calls. The model can thus accumulate "thought residue" and, after a tool returns new information, continue reasoning without restarting its logic. This "state continuation mechanism" underwrites the continuity of multi-turn Agent behavior and enables the model to handle more complex, cross-phase task decomposition.
From a systems perspective, DeepSeek's understanding of the Agent has risen from "task-execution plugin" to a component of the "model operating system." The Agent is not an external add-on but part of the model's core operating structure. This shift in system perspective implies that future large-model platforms will tend toward a scheduling operating system: the model itself is the OS kernel, Agents are user-mode programs, and plugins become callable modules. Whoever controls the standard of the Agent layer may hold platform power in the AI era.
This is also why DeepSeek is trying to lead a unified paradigm of "interleaved thinking + tool use" and proposes "Thinking in Tool-Use" as a foundational design language. This is not merely a difference in technical detail but a manifestation of platform thinking.
For the industry, DeepSeek's shift marks a new watershed: Agent capability is no longer an optional add-on for engineering teams but a core branch of the model-development path. Whether a model has platform-level Agent capability has become a key indicator of its long-term competitiveness.
Three: Where Is the Limit of Open-Source Models? DeepSeek's "Post-Training Tactics" Try to Provide an Answer
Although V3.2 and Speciale have pulled off a reversal from catching up to competing on multiple benchmarks, DeepSeek also acknowledges in its technical report that the gap between open-source models and closed-source systems has widened along certain key dimensions. In knowledge breadth, extremely complex tasks, and token-generation efficiency in particular, the open-source camp remains constrained by resources, data, and budget.
Rather than hide these limitations, DeepSeek responds with a highly executable strategy: if you cannot compete on resources, compete on method and deepen the training process.
The core of this strategy is a distinctive three-layer post-training approach: expert distillation + multi-track reinforcement learning + integration of the tool-thinking mechanism.
First, expert distillation. While most models are still trained primarily on general data, DeepSeek built six types of expert models for V3.2, covering mathematics, programming, logical reasoning, general Agents, Agent coding, and Agent search. Each area has a dedicated model that sharpens its specific skills on its own datasets and generated trajectories. These experts are never deployed directly; they exist to generate high-quality training samples that are fed back into the main model.
The data produced by these "task-specialized models" is then used to train the general model. Technically, this amounts to having several "top students," each excelling in one subject, feed back into a single "well-rounded" model, avoiding the capability dilution of multi-task training while retaining structural connections across tasks.
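A compact sketch of this distillation recipe under stated assumptions: the expert and judge interfaces are hypothetical, and the keep-ratio is illustrative.

```python
# Expert distillation: six domain specialists generate trajectories, which
# are filtered and merged into one training corpus for the generalist.

DOMAINS = ["math", "coding", "logical_reasoning",
           "general_agent", "agent_coding", "agent_search"]

def distill_corpus(experts, prompts_by_domain, judge, keep=0.5):
    corpus = []
    for domain in DOMAINS:
        expert = experts[domain]                       # specialist trained on its own data
        samples = [(p, expert.generate(p)) for p in prompts_by_domain[domain]]
        samples.sort(key=lambda s: judge(domain, *s), reverse=True)
        corpus += samples[: int(len(samples) * keep)]  # keep only top-scoring trajectories
    return corpus   # fine-tunes the generalist, which is what actually ships
```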
The second layer is the expansion and upgrading of reinforcement learning. DeepSeek continues the GRPO strategy from V3.2-Exp and further upgrades the data and reward structure: the model must not only complete tasks but also optimize language quality, the logical soundness of the reasoning chain, and the naturalness of its tool calls. The computational investment in the post-training phase exceeds 10% of the pre-training budget, which is rare among open-source models.
More importantly, the reinforcement-learning process does not rely on human evaluation; it uses feedback from the task environment plus rubric-based automatic scoring. This design lets training move beyond human-alignment data into a closed loop of "structured tasks → automatic scoring → behavior optimization," producing model capabilities that are scarcer than Chat data yet more reusable.
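A minimal illustration of rubric-based automatic scoring; the rubric format and the specific checks are assumptions for illustration, not DeepSeek's published criteria.

```python
from types import SimpleNamespace

def rubric_score(trajectory, rubric):
    """rubric: list of (weight, check_fn); each check is machine-verifiable,
    so trajectories can be graded without human raters."""
    total = sum(w for w, _ in rubric)
    earned = sum(w for w, check in rubric if check(trajectory))
    return earned / total                      # normalized 0..1 reward for RL

# Illustrative rubric mixing task completion, efficiency, and language checks.
rubric = [
    (2.0, lambda t: t.final_answer == t.reference),   # did it solve the task?
    (1.0, lambda t: t.tool_calls <= t.budget),        # efficient tool use
    (1.0, lambda t: t.language_consistent),           # clean, consistent output
]
traj = SimpleNamespace(final_answer=42, reference=42, tool_calls=3, budget=5,
                       language_consistent=True)
print(rubric_score(traj, rubric))   # -> 1.0
```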
The third layer is the fusion of tool use with the chain of thought. Early in training, a model often cannot tell when to call a tool and when to keep thinking, which breaks the reasoning chain and interrupts its logic. To address this, DeepSeek designed a cold-start system prompt for V3.2 that naturally embeds examples of tool calls inside the thinking trajectory, so the model gradually learns to "think with tools" in multi-turn tasks rather than "think first, then call."
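A hypothetical cold-start prompt in this spirit; the tag names and the worked example are invented for illustration, not DeepSeek's actual prompt.

```python
# Cold-start system prompt that demonstrates tool calls embedded inside the
# thinking trajectory, so the model imitates "thinking with tools" early on.

COLD_START_PROMPT = """You may call tools WHILE reasoning, not only after it.
Example:
<think> The population figure might be outdated; I should verify. </think>
<tool_call> web_search(query="2024 population of Norway") </tool_call>
<tool_result> 5.55 million (2024 estimate) </tool_result>
<think> Confirmed. Now I can compute the per-capita figure and answer. </think>
Continue this pattern: interleave <think> and <tool_call> until confident,
then give the final answer."""
```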
In addition, the entire context state has been redesigned: tool calls do not interrupt the thinking content, and only new user input triggers a reset. This strategy significantly reduces token redundancy and avoids restarting each task from scratch.
These designs may look engineering-driven, but they all point to one essential question: under constraints on parameter count and training scale, how can open-source models raise the "intelligence density per token"?
DeepSeek's answer is to compress its resources into the key paths of the reasoning chain, so that each round of reasoning carries as much information as possible with minimal repetition. This is not a victory of scale but a victory of methodology.
Of course, even so, DeepSeek has not fully closed the knowledge gap between open-source and closed-source systems. The official report notes that V3.2's breadth of world knowledge still trails the latest closed-source models, and that while Speciale excels in complex competitions, its token consumption rises significantly, making it unsuited for everyday general-purpose use.
But if Gemini 3 Pro represents the closed-source camp's continued pursuit of "bigger, faster, stronger," then V3.2 and Speciale perhaps represent another path: "lighter, steadier, smarter." Amid the ongoing debate over the future of the Scaling Law, DeepSeek is trying to reshape the competitive order of open-source models with stronger reasoning organization, lower resource consumption, and more efficient training paradigms.
Original: toutiao.com/article/7579547800615109160/
Statement: This article represents the views of the author.