Nature, a top-tier journal, publishes another article praising the latest DeepSeek model from Zhejiang University's Liang Wenfeng

On December 5, 2025, Nature, one of the world's three top scientific journals, published an article praising the latest version of DeepSeek from Zhejiang University's Liang Wenfeng. This is the third time Nature has reported on DeepSeek, far more coverage than any other large model has received.

Liang Wenfeng has also published a formal, peer-reviewed paper in a top-tier journal.

Given his achievements, Liang Wenfeng should have been elected an academician ahead of those state-owned enterprise executives.

Some excerpts from the original text are translated as follows:

The Chinese artificial intelligence company DeepSeek has released a mathematical reasoning model that can identify and correct its own errors. The model surpassed the highest score achieved by any human participant in one of the world's most prestigious undergraduate mathematics competitions.

The model, DeepSeekMath-V2, scored 118 out of 120 on the 2024 William Lowell Putnam Mathematical Competition, surpassing the top human score of 90. The model also performed strongly on the 2025 International Mathematical Olympiad (IMO) and the 2024 Chinese Mathematical Olympiad. The results are detailed in a preprint posted on arXiv on November 27.

Reasoning Over Answers

According to the preprint's authors, early methods for training large language models in mathematical reasoning focused on the accuracy of the final answer. However, a correct answer does not guarantee correct reasoning: a correct final answer can simply be the product of luck, such as errors that happen to cancel out. Moreover, focusing solely on the final result is of little use for proving mathematical theorems, where the logical argument matters more than the answer itself.
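To make the distinction concrete, here is a minimal sketch in Python (a hypothetical example of ours, not from the preprint) of an outcome-only grader. It awards full credit to a solution whose reasoning is nonsense, as long as the final answer happens to match:

```python
# Hypothetical illustration (not from the preprint): an outcome-only
# grader compares final answers and never reads the reasoning steps.

def outcome_reward(solution: dict, reference_answer: str) -> float:
    """Return 1.0 iff the final answer matches; the steps are ignored."""
    return 1.0 if solution["final_answer"] == reference_answer else 0.0

# Two solutions to "What is 2 + 2 * 3?" (reference answer: 8).
sound = {
    "steps": ["2 * 3 = 6", "2 + 6 = 8"],
    "final_answer": "8",
}
flawed = {
    # Wrong precedence plus an arbitrary step that happen to cancel out.
    "steps": ["2 + 2 = 4", "4 * 3 = 12", "12 - 4 = 8"],
    "final_answer": "8",
}

# Both earn full reward, so training on this signal alone cannot
# distinguish sound reasoning from lucky, flawed reasoning.
print(outcome_reward(sound, "8"))   # 1.0
print(outcome_reward(flawed, "8"))  # 1.0
```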

DeepSeekMath-V2 introduces self-verification for mathematical reasoning for the first time. The model includes a verifier trained to assess mathematical proofs (arguments built from step-by-step reasoning), identify logical flaws, and assign scores based on a proof's rigor. A meta-verifier then checks whether the verifier's criticisms are themselves accurate, reducing the likelihood of hallucinations and increasing credibility. These components work alongside a proof generator, which produces solutions, evaluates its own work, and keeps refining its arguments until no further issues can be found.
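As a rough illustration of how these components might interact at inference time, here is a conceptual sketch of the generate-verify-refine loop. All function names and the toy stand-ins are our own placeholders, not DeepSeek's code; in the real system each role is played by a trained language model:

```python
from typing import Callable, List

Critique = str

def refine_proof(
    problem: str,
    generate: Callable[[str, List[Critique]], str],
    verify: Callable[[str], List[Critique]],
    meta_verify: Callable[[str, Critique], bool],
    max_rounds: int = 8,
) -> str:
    proof = generate(problem, [])                # generator drafts a solution
    for _ in range(max_rounds):
        critiques = verify(proof)                # verifier flags possible flaws
        # The meta-verifier keeps only critiques it judges accurate,
        # filtering out hallucinated flaws before they drive a revision.
        confirmed = [c for c in critiques if meta_verify(proof, c)]
        if not confirmed:                        # no issues remain: accept proof
            return proof
        proof = generate(problem, confirmed)     # revise against the critiques
    return proof

# Toy stand-ins so the sketch runs end to end.
def toy_generate(problem: str, feedback: List[Critique]) -> str:
    return "proof v2" if feedback else "proof v1"

def toy_verify(proof: str) -> List[Critique]:
    return ["step 3 does not follow"] if proof == "proof v1" else []

def toy_meta_verify(proof: str, critique: Critique) -> bool:
    return True  # accept every critique in this toy example

print(refine_proof("Putnam 2024 B4", toy_generate, toy_verify, toy_meta_verify))
# -> "proof v2"
```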

The design forms a feedback loop: the verifier improves the generator, and as the generator produces more challenging proofs, these proofs become new training data to enhance the verifier.
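A hedged sketch of that training-time loop, with placeholder logic rather than the preprint's actual recipe: proofs the current verifier cannot fault are precisely the most challenging cases, so they are collected as labeled data for training a stronger verifier; the retraining steps themselves are elided here.

```python
import random

def generate(problem: str) -> str:
    return f"candidate proof of {problem}"      # stand-in for the generator

def verify(proof: str) -> list[str]:
    # Stand-in for the verifier: misses a flaw about half the time.
    return [] if random.random() < 0.5 else ["gap in step 2"]

def collection_round(
    problems: list[str], dataset: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    for problem in problems:
        proof = generate(problem)
        if not verify(proof):                   # verifier found no flaw:
            dataset.append((problem, proof))    # a hard case worth training on
    return dataset

dataset: list[tuple[str, str]] = []
for _ in range(3):                              # several generator/verifier rounds
    dataset = collection_round(["P1", "P2", "P3"], dataset)
print(f"new verifier training examples: {len(dataset)}")
```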

By contrast, Gemini's Deep Think verifies mathematical reasoning using an external formal language called Lean, which requires significant expert involvement in the verification process. Xie noted that this method produces very few hallucinations but is computationally intensive and resource-heavy.
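For a sense of the contrast, this is what machine-checkable mathematics looks like in Lean (a trivial illustration of ours, not an excerpt from Deep Think's output). Once such a proof compiles, the Lean kernel has mechanically verified every step, which is why this approach barely hallucinates, while formalizing each statement demands substantial expert effort:

```lean
-- A trivial Lean 4 example (our own illustration, not from Deep Think).
-- If this compiles, the kernel has checked every inference step,
-- leaving no room for a hallucinated argument.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```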

Original article: toutiao.com/article/1850680795913216/

Statement: This article represents the author's own views.