【By Observer Net, Wang Yi】DeepSeek amazed the world at the beginning of this year with its cost-effective, high-performance, open-source models. Recently, its team published a paper in the British journal "Nature", disclosing for the first time that training the DeepSeek-R1 model cost only 294,000 US dollars, on top of roughly 6 million US dollars spent to build the underlying base large language model. These figures are far below those disclosed by its American counterparts, and even those were only rough estimates.
The British news agency Reuters pointed out on September 18 that DeepSeek's costs are far lower than the figures previously disclosed by its American competitors, and that the disclosure may once again spark discussion of China's position in the global artificial intelligence (AI) field. The Indian news website "Devdiscourse" likewise noted on the 19th that DeepSeek's first release of cost data has prompted American companies to question their own strategies.
American media outlets such as CNN and Bloomberg reacted with surprise on the 19th to DeepSeek's 294,000 US dollar training cost. The U.S. business channel CNBC commented that, considering how much OpenAI has spent, DeepSeek's cost is simply "astonishing": its model overturned the assumption that only countries with the most advanced and fastest chips can dominate the AI competition, and it has now backed that point with hard numbers.
On the 17th, "Nature" published a research paper on the DeepSeek-R1 reasoning model co-authored by the DeepSeek team, with Liang Wenfeng as corresponding author. Compared with the initial paper released in January this year when DeepSeek-R1 launched, the new paper reveals more details about how the model was trained, and its publication makes DeepSeek-R1 the world's first mainstream large language model to undergo peer review.
The latest paper disclosed that the DeepSeek-R1 model used 512 NVIDIA H800 chips, and the training cost was only 294,000 US dollars.
Reuters noted that the earlier paper from January this year did not include this information. The training cost of a large language model refers mainly to the enormous expense of running clusters of high-performance chips to process massive amounts of text and code. In 2023, OpenAI CEO Sam Altman mentioned that training a foundation model cost "far more than" 100 million US dollars, but his company has never disclosed specific figures.

The cover of the current issue of "Nature"
The paper also responds to earlier, unfounded criticism from American officials. To curb the development of Chinese AI, the U.S. government banned NVIDIA from exporting its advanced H100 and A100 chips to China in 2022. After DeepSeek released its large AI model, U.S. officials refused to believe that a Chinese company could train such a high-performance model using the "cut-down" H800 chips NVIDIA designed for the Chinese market.
Accordingly, in June this year, U.S. officials accused DeepSeek of illegally obtaining "large quantities" of H100 chips after the U.S. export controls took effect and using them to train its large models. NVIDIA later responded that DeepSeek used legally purchased H800 chips, not H100 chips.
In the supplementary materials to the "Nature" paper, DeepSeek acknowledged for the first time that it does own A100 chips, saying they were used "to prepare for experiments with a smaller model" during the early development phase; after that stage, R1 was trained for a total of 80 hours on a cluster of 512 H800 chips.
DeepSeek also indirectly responded to accusations made in January this year by senior White House advisers and some figures in the U.S. AI industry, who claimed that DeepSeek had "illegally replicated" the functionality of OpenAI's products through "distillation" and said they had found "evidence". That supposed evidence was never made public.
The core idea of distillation is to have a large, complex pre-trained AI model act as a "teacher" for a smaller "student" model: the student learns from the teacher's outputs and can approach its performance at a much lower computational cost. Many experts say distillation is common practice in the AI industry, but that directly copying the output structure or parameters of a closed-source proprietary model could constitute infringement.
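For readers unfamiliar with the technique, the sketch below illustrates the general teacher-student setup described above. It is a minimal, self-contained PyTorch example using toy networks and synthetic data, not DeepSeek's or OpenAI's actual code; the model sizes, temperature, and loss weighting are illustrative assumptions.

```python
# Minimal knowledge-distillation sketch (toy models, synthetic data).
# The "teacher" is a larger network whose softened output distribution the
# smaller "student" learns to imitate, alongside the usual hard-label loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

x = torch.randn(128, 32)              # synthetic inputs
y = torch.randint(0, 10, (128,))      # synthetic hard labels
T, alpha = 2.0, 0.5                   # temperature and loss-mixing weight

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    with torch.no_grad():
        teacher_logits = teacher(x)   # teacher stays frozen during distillation
    student_logits = student(x)

    # Soft loss: match the teacher's softened distribution (scaled by T^2).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: ordinary cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, y)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The student here has far fewer parameters than the teacher, which is the source of the cost savings the technique is prized for.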
DeepSeek has consistently defended distillation, arguing that the method not only improves model performance but also sharply reduces training and operating costs, thereby widening access to AI technology. In January this year, the company said it had used Meta's open-source Llama model to build a simplified version of its own model.
In the paper published on September 17, DeepSeek stated that the training data for its V3 model came from web crawls that contained "a large number of answers generated by OpenAI models, which may lead the base model to indirectly gain knowledge from other powerful models". DeepSeek stressed, however, that this was not deliberate but an incidental result of crawling the web.
Lewis Tunstall, a machine learning engineer at Hugging Face who reviewed the paper, considered DeepSeek's explanation reasonable. Other laboratories have since reproduced R1's results using similar methods, indicating that AI models can reach very high reasoning capability without any supposed secret data from OpenAI.
The technology analysis website "Tech Space 2.0" observed that DeepSeek's data strategy was to pre-train on the largest possible amount of freely available data, fine-tune cleverly on self-generated data, and spend money only on compute. This frugal strategy, it said, is a template that other companies are now studying closely.
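The "self-generated data" part of that strategy is commonly implemented in the industry as a generate-and-filter loop: the model produces many candidate answers, a checker keeps only the good ones, and the survivors become fine-tuning data. The sketch below is a schematic illustration of that general pattern, not DeepSeek's pipeline; `model_generate` and `is_correct` are hypothetical placeholders standing in for a real model call and a real verifier.

```python
# Schematic generate-and-filter loop for building a fine-tuning set from a
# model's own outputs. `model_generate` and `is_correct` are hypothetical
# placeholders; in practice they would call a real model and a real checker.
import json
import random

def model_generate(prompt: str, n_samples: int = 4) -> list[str]:
    # Placeholder: sample several candidate answers from the current model.
    return [f"candidate answer {i} for: {prompt}" for i in range(n_samples)]

def is_correct(prompt: str, answer: str) -> bool:
    # Placeholder: verify the answer (e.g. against a reference or a unit test).
    return random.random() > 0.5

prompts = ["What is 17 * 24?", "Prove that sqrt(2) is irrational."]
finetune_set = []

for prompt in prompts:
    for answer in model_generate(prompt):
        if is_correct(prompt, answer):          # keep only verified outputs
            finetune_set.append({"prompt": prompt, "completion": answer})

# The surviving (prompt, completion) pairs become supervised fine-tuning data.
with open("self_generated_sft.jsonl", "w") as f:
    for example in finetune_set:
        f.write(json.dumps(example) + "\n")
```

The appeal of this pattern is exactly the frugality the analysts describe: the only new cost is the compute spent generating and checking candidates, not licensing or labeling external data.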
The website pointed out that DeepSeek-R1 stands out among comparable products because it achieves state-of-the-art results at extremely low cost. While OpenAI's GPT-4 and Google's Gemini model still lead in some respects and enjoy strong corporate backing, R1 has "democratized high-end AI" in a way never seen before: it is open, relatively cheap to reproduce, and built around efficiency. Meta's Llama 2 and models from the French tech startup Mistral AI also follow an open philosophy, but R1 has taken that philosophy to a new level by reaching top-tier performance.
"Tech Space 2.0" concluded: "These comparisons emphasize a key point: AI competition is no longer just about who has the most graphics processing units (GPUs), but also about who can achieve more goals with fewer resources. From this perspective, DeepSeek has changed the rules of the game."
This article is exclusive to Observer Net. Reproduction without permission is prohibited.
Original: https://www.toutiao.com/article/7551758729360736808/
Statement: The article represents the views of the author alone.