As the capabilities of large reasoning models continue to advance, they are attracting broad attention from both academia and industry. Recently, a team led by Dr. Chen Kun, a researcher at the Institute of Theoretical Physics, Chinese Academy of Sciences, together with collaborators, has shed light on how large reasoning models such as DeepSeek-R1 learn autonomously. The "critical learning" theoretical framework they propose shows great promise in fundamental scientific research, notably in tackling core calculations in quantum field theory, opening a new path for the integration of artificial intelligence and scientific research.

An Inspiration from Physical Phase Transitions

In January this year, the release of DeepSeek's reasoning model, DeepSeek-R1, drew global attention. Dr. Chen Kun's team keenly recognized the scientific value behind the model's spontaneous formation of a reasoning thought pattern. As a team long focused on many-electron field theory, they set out to analyze this process within the theoretical framework of statistical physics.

The research team posted their work, titled "Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond," on the preprint server arXiv.

While analyzing the model's reasoning patterns, the researchers proposed a hypothesis: a simple, universal physical model might underlie DeepSeek's behavior. Observing what emerges spontaneously when the model learns a single problem, they found the hallmarks of critical phenomena, analogous to the phase transition between water and steam. For example, on a test of adding 7-digit numbers in base 7, the untrained Qwen2.5-7B model initially could not solve the problem, but after single-sample reinforcement learning its accuracy improved markedly, with the learning curve showing phase-transition behavior after a certain number of training steps. Further study revealed that near the transition point, even though the model had not yet reached peak accuracy on the training sample, it generalized best to other multi-digit addition problems. At this point the model strikes its optimal balance, able to explore flexibly while still extracting the underlying rules; overtraining makes its thinking rigid.
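To make the task concrete, here is a minimal Python sketch of a generator for base-7 addition problems of the kind described above; the exact prompt wording and digit count used in the study are assumptions for illustration.

```python
import random

def to_base7(n: int) -> str:
    """Render a non-negative integer in base 7."""
    if n == 0:
        return "0"
    digits = []
    while n:
        digits.append(str(n % 7))
        n //= 7
    return "".join(reversed(digits))

def make_problem(num_digits: int = 7, seed=None):
    """Generate one base-7 addition problem and its answer string.
    Operands are drawn uniformly among num_digits-digit base-7 numbers."""
    rng = random.Random(seed)
    lo, hi = 7 ** (num_digits - 1), 7 ** num_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    question = f"In base 7, {to_base7(a)} + {to_base7(b)} = ?"
    return question, to_base7(a + b)

question, answer = make_problem(seed=42)
print(question)  # one fixed 7-digit base-7 addition problem
print(answer)    # its base-7 answer, against which a reward can be checked
```

A single problem generated this way is all the training data the single-sample setup requires; the natural reward is simply whether the model's final answer matches.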

Figure | The reasoning process when the large model answers the question "12 + 98 = ?" (Source: Chen Kun)

Building on this, the team constructed a distinctive theoretical framework. In a large model's reasoning process, uncertain token positions are defined as "decision points," while deterministic token sequences are abstracted into "concepts." The links between decision points form a "concept network" (CoNet), which models the decision space of the model's thinking. In this framework, the abstract reasoning in a large model's long chain of thought corresponds to a random walk on the concept network: the model starts from the problem, explores paths through the network, and arrives at the answer. DeepSeek's GRPO reinforcement learning algorithm and its variants regulate the path probabilities, placing the network in an intermediate state. When the model is trained on a single sample, this intermediate state sits near a continuous phase-transition point and exhibits critical behavior, such as power-law-distributed thinking-path lengths, which gives the model both efficiency and flexibility and lays the physical foundation for a "critical thinking mode." Thus the Learning at Criticality (LaC) theoretical framework was born.
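The defining ingredient of GRPO is its group-relative baseline: for each question, several answers are sampled, and each answer's reward is normalized against the group's mean and standard deviation before the policy is updated. Below is a minimal Python sketch of that advantage step (the binary reward and group size are illustrative placeholders, not details from the paper):

```python
from statistics import mean, stdev

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each sampled
    answer's reward by the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one problem, reward 1 if correct, 0 if not.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```

In full GRPO these advantages enter a PPO-style clipped objective with a KL penalty toward a reference model; the sketch shows only the group baseline that distinguishes the algorithm.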

Figure | Left: random walk in the large model's thinking process; Right: a minimal policy model (Source: Chen Kun)
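As a toy caricature of this random-walk picture (not the paper's actual CoNet construction), one can model reasoning as a one-dimensional walk toward the answer and inspect the distribution of path lengths. At the critical, unbiased point, first-passage lengths follow the classic heavy-tailed t^(-3/2) power law, echoing the power-law thinking-path lengths described above:

```python
import random
from collections import Counter

def path_length(p_forward: float = 0.5, distance: int = 5,
                max_steps: int = 10_000) -> int:
    """Steps for a 1-D walk over 'concepts' to first reach the answer,
    sitting `distance` steps away. p_forward = 0.5 is the critical point."""
    pos, steps = 0, 0
    while pos < distance and steps < max_steps:
        pos += 1 if random.random() < p_forward else -1
        steps += 1
    return steps

# At criticality the length distribution is heavy-tailed (power law);
# with a forward bias (p_forward > 0.5) lengths concentrate sharply.
lengths = [path_length(0.5) for _ in range(2000)]
buckets = Counter(min(n, 1000) // 100 for n in lengths)
for b in sorted(buckets):
    label = f"{b*100}-{b*100+99}" if b < 10 else ">=1000"
    print(f"{label:>8}: {buckets[b]}")
```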

Breakthrough Results in Core Calculations of Quantum Field Theory

Quantum field theory is a cornerstone theoretical discipline of physics, and its core calculations, such as evaluating high-order Feynman diagrams, are extremely complex. Traditionally, it takes doctoral students six months to a year to master analytic two-loop Feynman diagram calculations in frontier field-theory problems, while three-loop calculations are all but impossible by hand. Historically, the analytic computation of three-loop scattering diagrams in quantum electrodynamics took the community decades.

Dr. Chen Kun's team chose to validate LaC on a canonical theoretical-physics problem: computing high-order Feynman diagrams at different loop orders. Using the 8-billion-parameter Qwen3-8B model, they trained it in stages to solve Matsubara frequency summation problems. To everyone's surprise, after being trained at the critical point on low-order diagram examples, the model successfully derived solutions for unseen higher-order diagrams, outperforming baseline models with two orders of magnitude more parameters. The data show that LaC-trained models reached accuracies of 97.5% and 56.9% on 1-loop and 2-loop diagrams, respectively, and generalized to 3-loop and 4-loop problems, whereas the untrained baseline models performed poorly. This result indicates that LaC enables models to learn complex long-chain reasoning under data scarcity, bringing new hope for core computational problems in quantum field theory.
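For context on what such a calculation involves, a textbook one-loop Matsubara summation (the particle-hole bubble), with fermionic frequencies ω_n and a bosonic frequency ν_m, is shown below; it is standard many-body theory, given here only as an illustration rather than as an example taken from the paper:

$$
T \sum_{i\omega_n} \frac{1}{(i\omega_n - \epsilon_1)(i\omega_n + i\nu_m - \epsilon_2)}
= \frac{n_F(\epsilon_1) - n_F(\epsilon_2)}{i\nu_m + \epsilon_1 - \epsilon_2},
\qquad
n_F(\epsilon) = \frac{1}{e^{\epsilon/T} + 1}.
$$

Higher-loop diagrams nest many such frequency sums, and it is this kind of symbolic evaluation that the model learns to carry out and extend to unseen loop orders.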

Figure | Reinforcement learning dynamics in simplified models (Source: Chen Kun)

Breaking Traditional AI Limitations and Reshaping Research Paradigms

Traditional AI methods rely on large volumes of diverse data to maintain generalization, whereas LaC emulates the way a human expert focuses deeply on a single complex problem in a specialized field. This approach breaks through three major limitations of traditional AI in fundamental scientific research.

First, it addresses learning efficiency under data scarcity. In fundamental research, data are often hard to obtain; by tuning the large model's parameters to the critical state, LaC lets the model reach optimal generalization from a very small amount of training data. Second, it overcomes obstacles to acquiring highly specialized knowledge. In chemical research, for example, the deep knowledge accumulated in particular laboratories often lies beyond the scope of general-purpose large models; LaC allows a model to focus on a field's core problems and master deep professional knowledge. Finally, it achieves deep, specialized learning in small-sample settings, which matters greatly for data-scarce fundamental research.

Looking ahead, the LaC theory holds promise for further refinement: beyond optimizing AI reasoning, it offers theoretical tools for understanding how complex reasoning abilities emerge in large models. In scientific research, it may give rise to new paradigms, helping AI move from auxiliary tool to an "intelligent agent" that autonomously explores scientific problems, truly realizing AI for fundamental science, driving further breakthroughs across the basic sciences, and ushering in a deeper integration of artificial intelligence and scientific research.

References:

1. Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond. arXiv:2506.03703. https://arxiv.org/abs/2506.03703

Original article: https://www.toutiao.com/article/7517931670524232226/
