Machine Intelligence News

Editor: Spicy Chicken, Zewan

Prequel to DeepSeek R2?

Even with the Labor Day holiday underway, DeepSeek has not stopped releasing news.

A few days ago, rumors were circulating that DeepSeek-R2 was about to be released. DeepSeek did have something new, but instead of R2, everyone got DeepSeek-Prover-V2, which is likewise open source.

Prover-V2 achieves state-of-the-art performance in formal theorem proving, reaching an 88.9% pass rate on the miniF2F test and scoring well on formalized AIME 24 and 25 problems.

On the evening of April 30th, the technical details of DeepSeek-Prover-V2 were published on the machine-learning platform Hugging Face.

This time, the DeepSeek team released two versions of the DeepSeek-Prover-V2 model with parameter scales of 7B and 671B respectively.

Among them, DeepSeek-Prover-V2-671B was trained on top of DeepSeek-V3-Base, while DeepSeek-Prover-V2-7B was built on DeepSeek-Prover-V1.5-Base and supports a context length of up to 32K tokens.

  • DeepSeek-Prover-V2-7B link: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-7B
  • DeepSeek-Prover-V2-671B link: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B

To summarize DeepSeek-Prover-V2 in one sentence: it is an open-source large language model built specifically for formal theorem proving in Lean 4, the proof-assistant language sometimes described as a "programming language for mathematics."
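For readers unfamiliar with Lean 4, here is a minimal example (our own illustration, not one of the model's outputs) of what a formal theorem and its machine-checkable proof look like:

```lean
-- A tiny Lean 4 theorem: for any natural number n, 1 + n = n + 1.
-- The proof term is checked mechanically by Lean's kernel.
theorem one_add (n : Nat) : 1 + n = n + 1 :=
  Nat.add_comm 1 n
```

Every proof the model generates must pass this kind of kernel check, which is what makes Lean verification a reliable reward signal.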

Its initial data was collected through a recursive theorem-proving pipeline driven by DeepSeek-V3. In the cold-start phase, DeepSeek-V3 was prompted to decompose complex problems into a series of solvable subgoals; the proof of each solved subgoal was then merged with DeepSeek-V3's step-by-step chain-of-thought reasoning to construct an initial training dataset for reinforcement learning.

The ingenuity of this strategy lies in integrating informal and formal mathematical reasoning into a single model: it can think flexibly like a human and argue rigorously like a machine, unifying the two modes of mathematical reasoning.

So how exactly does it work? DeepSeek also released the technical report for DeepSeek-Prover-V2, let's see what it says:

Technical Overview

Cold Start Inference Data Generation via Recursive Proof Search

To build the cold-start dataset, the DeepSeek team designed a simple and efficient recursive theorem-proving pipeline, using DeepSeek-V3 as a unified tool responsible both for decomposing subgoals and for formalizing proof steps. Concretely, DeepSeek-V3 was guided to break theorems down into high-level proof sketches while simultaneously formalizing those reasoning steps in Lean 4, ultimately producing a series of clearly structured, logically rigorous subgoals.

Overview of the Cold Start Data Collection Process for DeepSeek-Prover-V2.

Reducing computational overhead has always been a strength of the DeepSeek team, and this time was no exception: a smaller 7B model performed the proof search for each subgoal, cutting the computational burden. Once every subgoal of a complex problem had been resolved, the complete formal step-by-step proof was paired with the chain of thought generated by DeepSeek-V3 to form the cold-start reasoning data.

An example of converting decomposed subgoals into a series of lemma statements.
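The figure itself is not reproduced here, but the idea can be sketched with a hypothetical example (our own, not from the paper): the top-level theorem is decomposed into named `have` subgoals, each initially left as `sorry` for the smaller prover to discharge.

```lean
-- Hypothetical decomposition sketch (not taken from the paper).
-- Each `sorry` marks a subgoal handed off to the 7B prover; once all
-- are filled in, the steps compose into a complete verified proof.
theorem sketch (a b : Nat) (h : a = b) : a + a = b + b := by
  have h1 : a + a = a + b := by sorry  -- subgoal 1
  have h2 : a + b = b + b := by sorry  -- subgoal 2
  exact h1.trans h2                    -- subgoals chain into the full proof
```

Each `have` statement can also be lifted out as a standalone lemma, which is what turns one hard problem into several smaller training problems.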

Reinforcement Learning Based on Synthetic Cold Start Data

The DeepSeek team then selected challenging theorems that the 7B proof model could not solve end-to-end, but whose decomposed subgoals it could all handle.

By combining the proofs of all the subgoals, a complete formal proof of the original problem can be assembled. This formal proof was then attached to the chain of thought generated by DeepSeek-V3, which records the corresponding lemma decomposition, yielding a training dataset that tightly couples informal reasoning with the subsequent formalization.

After fine-tuning the proof model on this synthetic cold-start data, the team introduced a reinforcement learning stage to further strengthen its ability to turn informal reasoning into formal proofs. Following the standard objective for reasoning models, binary feedback ("correct" or "incorrect") served as the primary reward signal.

The resulting model, DeepSeek-Prover-V2-671B, achieved state-of-the-art performance in neural theorem proving, reaching a pass rate of 88.9% on the miniF2F-test and solving 49 of the 658 problems in the PutnamBench dataset. All proofs generated by DeepSeek-Prover-V2 on the miniF2F dataset have been compiled into a ZIP file available for download.

Download Link: https://github.com/deepseek-ai/DeepSeek-Prover-V2/blob/main/minif2f-solutions.zip

Training Details, Experimental Results

DeepSeek-Prover-V2 underwent two-stage training, establishing two complementary proof generation modes:

1. Efficient non-CoT (non-chain-of-thought) mode: This mode is optimized for quickly generating formal Lean proof code, focusing on producing concise proofs without explicit intermediate reasoning steps.

2. High-precision CoT (chain-of-thought) mode: This mode systematically lays out intermediate reasoning steps, emphasizing transparency and logical progression, before constructing the final formal proof.

Consistent with DeepSeek-Prover-V1.5, the two generation modes are controlled by two different prompts. In the first stage, expert iteration was adopted within a curriculum learning framework to train a non-CoT proof model, while proofs for difficult problems were synthesized through subgoal-based recursive proving. The non-CoT mode was chosen to accelerate iterative training and data collection.

Building on this, the second stage utilized cold start CoT data, generating them by combining DeepSeek-V3's complex mathematical reasoning patterns with synthesized formal proofs. The CoT mode was further enhanced during a subsequent reinforcement learning stage, following the standard training workflow typically used for reasoning models.

The non-CoT mode training process for DeepSeek-Prover-V2 follows the expert iteration paradigm, widely adopted in developing formal theorem provers. In each training iteration, the current best proof strategy is used to generate proof attempts for problems that were unresolved in the previous iteration. These successful attempts, after being verified by the Lean proof assistant, are incorporated into the SFT dataset to train an improved model. This iterative cycle ensures that the model learns from the initial demonstration dataset and refines its own successful reasoning trajectories, gradually improving its ability to solve harder problems. Overall, the training process is largely consistent with that of DeepSeek-Prover-V1, with only two modifications made to the distribution of training problems.
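The expert-iteration loop described above can be sketched in a few lines. This is a schematic of the paradigm only: `generate`, `verify`, and `finetune` are hypothetical stand-ins for the prover model, the Lean proof assistant's checker, and the SFT step, not DeepSeek's actual API.

```python
def expert_iteration(generate, verify, finetune, problems, rounds=3):
    """Schematic expert-iteration loop for a formal prover.

    Each round, the current best model attempts the still-unsolved
    problems; only Lean-verified proofs enter the SFT dataset, which
    is then used to train an improved model for the next round.
    """
    sft_data = []                      # verified (problem, proof) pairs
    unsolved = set(problems)
    for _ in range(rounds):
        solved = set()
        for p in sorted(unsolved):
            attempt = generate(p)      # current model proposes a proof
            if verify(p, attempt):     # keep only verified attempts
                sft_data.append((p, attempt))
                solved.add(p)
        unsolved -= solved
        finetune(sft_data)             # refine the model on its own verified proofs
    return sft_data, unsolved
```

The key property is that the verifier, not the model, decides what counts as training data, so the dataset can only accumulate correct proofs.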

First, Prover-V2 introduces additional problems from autoformalization and various open-source datasets, broadening the coverage of training problems. Second, it expands the dataset with problems generated through subgoal decomposition, targeting the harder instances in the valid split of the miniF2F benchmark.

Researchers conducted supervised fine-tuning on DeepSeek-V3-Base-671B using a constant learning rate of 5e-6 and a 16,384-token context window. The training corpus consists of two complementary sources: 1) non-CoT data collected through expert iteration, which produces Lean code without intermediate reasoning steps; 2) the cold-start CoT data described in Section 2.2, which distills DeepSeek-V3's advanced mathematical reasoning into structured proof paths. The non-CoT component emphasizes formal verification skills within the Lean theorem-prover ecosystem, while the CoT examples explicitly model the cognitive process of turning mathematical intuition into formal proof structure.

Prover-V2 adopts the GRPO reinforcement learning algorithm. Unlike PPO, GRPO eliminates the need for a separate critic model: it samples a group of candidate proofs for each theorem prompt and optimizes the policy based on their relative rewards. Training uses binary rewards: a generated Lean proof receives a reward of 1 if it verifies, and 0 otherwise. To ensure effective learning, the researchers selected training prompts carefully, keeping only problems that are challenging for the supervised fine-tuned model yet still solvable. Each iteration samples 256 distinct problems, with 32 candidate proofs per theorem and a maximum sequence length of 32,768 tokens.
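The core of GRPO's critic-free design is that each candidate's advantage is computed relative to its own group. A minimal sketch of that step, assuming the standard mean/standard-deviation normalization (details of DeepSeek's exact implementation are not given in this article):

```python
def group_relative_advantages(rewards):
    """Group-relative advantages as used in GRPO-style training.

    Each sampled proof's advantage is its reward minus the group mean,
    divided by the group's (population) standard deviation. With binary
    Lean-verification rewards, verified proofs get a positive advantage
    and failed ones a negative advantage, with no learned critic.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:          # all candidates agree: no learning signal
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```

For example, if half of a group's proofs verify, the verified ones receive advantage +1 and the rest −1; the rarer a correct proof is within its group, the larger its positive advantage.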

Finally, there is the model distillation. Researchers expanded the maximum context length of DeepSeek-Prover-V1.5-Base-7B from 4096 tokens to 32768, and fine-tuned this extended context model using rollout data collected during the reinforcement learning phase of DeepSeek-Prover-V2-671B. In addition to the CoT reasoning mode, researchers also integrated non-CoT proof data collected during the expert iteration process to implement a cost-effective proof option capable of generating concise formal outputs with a smaller model size. Furthermore, the 7B model also adopted the same reinforcement learning phase as the 671B model to improve performance.

Researchers conducted a systematic evaluation of DeepSeek-Prover-V2 on a range of formal theorem-proving benchmarks, covering high-school competition problems and undergraduate-level mathematics. Experiments show that the 671B version achieves unprecedented accuracy while remaining more efficient than other advanced models in the field.

Comparison with the most advanced models on the miniF2F test dataset.

Problems solved by DeepSeek-Prover-V2-671B on the miniF2F benchmark.

ProofNet - Test and PutnamBench experimental results.

ProverBench: Formal Benchmark Dataset for AIME and Textbook Questions

This time, DeepSeek also released ProverBench, a benchmark dataset of 325 problems. Fifteen of them come from number-theory and algebra problems in the two most recent AIME competitions (AIME 24 and 25), formalized into Lean and carrying genuine high-school-competition difficulty. The remaining 310 are carefully selected from textbook examples and tutorials, covering diverse content with a solid pedagogical grounding.

ProverBench link: https://huggingface.co/datasets/deepseek-ai/DeepSeek-ProverBench

This dataset aims to support comprehensive evaluations of models across high school competition questions and undergraduate-level math problems.
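To give a flavor of what "formalized" means here, a toy statement in the style of a competition number-theory problem (our own illustration, not taken from ProverBench):

```lean
-- Toy example: 2^10 leaves remainder 2 when divided by 7.
-- `decide` asks Lean to evaluate this decidable proposition directly.
theorem toy_mod : (2 ^ 10) % 7 = 2 := by decide
```

Formalization turns an informal competition question into a precise Lean statement like this, so that any proposed proof can be checked automatically.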

Composition of the ProverBench dataset.

Online Comments: Simply Amazing

Judging by the buzz around the new model, everyone is eagerly waiting for DeepSeek to change the world again, and many netizens voiced great appreciation for the newly open-sourced work.

Students of competition mathematics were impressed too (anyone who has worked these problems knows how deep the tricks run).

Netizens who tested it themselves found the results genuinely impressive, reporting that it outperformed o4-mini and Grok-3.

Someone on social media noted that the method of decomposing and processing complex problems resembles the techniques taught to junior engineers, and the approach DeepSeek-Prover-V2 takes in handling mathematical problems should work equally well for coding issues.

Still, enthusiasm for DeepSeek-R2 seems even greater. So, little blue whale, when will R2 be released?

For more details, please see the original article at the link below.

Original article: https://www.toutiao.com/article/7499296225187971622/

Disclaimer: The views expressed in this article are solely those of the author.