
Ten major large model companies in China and the United States have, on average, released a new foundational large model every 8.5 days, accelerating the global large model race.
The major players in foundational large models are concentrated in China and the United States. Since the start of this year, core players in both countries have accelerated the release pace of new-generation large models, each generation stronger than the last. In the large model race, competition over base model capability remains the one certainty.
At 4 a.m. on April 29, Alibaba released the Qwen3 series models, which are the strongest in Alibaba's large model series. It once again narrows the capability gap with top U.S. basic large models. After being released on GitHub, the Qwen3 series models gained 17,000 stars within four hours, setting a new record for the popularity of open-source large models.
The Qwen3 series includes two mixture-of-experts (MoE) models and six dense models, covering parameter scales of 0.6 billion, 1.7 billion, 4 billion, 8 billion, 14 billion, 32 billion, 30 billion, and 235 billion.
Among them, Qwen3-235B-A22B is the most powerful model in the Qwen3 series, second only to OpenAI's o3, currently regarded as the world's strongest large model. According to Qwen3's technical documentation, its benchmark scores are on par with those of DeepSeek-R1, OpenAI's o1 and o3-mini, and Google's Gemini-2.5-Pro.
Effective ways to enhance model performance include preparing more computing power or data. Although the technical documentation of Qwen3 does not disclose the scale of computing power used during training, it reveals the amount of data used.
Qwen3's technical documentation shows that the amount of training data has increased significantly over the previous generation. Qwen2.5 was pre-trained on 18 trillion tokens (the basic units of text a model processes; a token is roughly a word fragment or a Chinese character), while Qwen3 used nearly double that amount, approximately 36 trillion tokens, covering 119 languages and dialects.
There has been a consensus in the global large model industry since the second half of 2024 that the "Scaling Law" (the model's performance is determined by computing power, model size, and data scale) is slowing down. Simply stacking computing power can no longer significantly improve model performance. However, none of the major companies have abandoned training basic models as a result; they continue to explore new methods to enhance model performance.
From January to late April this year, the major Chinese large model companies (Alibaba, Tencent, ByteDance, Baidu, and DeepSeek) and the major U.S. large model companies (OpenAI, Anthropic, an AI startup backed by Amazon, xAI, founded by Elon Musk, Google, and Meta) all released new foundational large models.
In the first 119 days of 2025, ten major large model companies in China and the United States released or updated 14 basic large models. On average, a new basic large model is updated every 8.5 days. The arms race in large models is still accelerating.

From 2023 to 2024, the performance of global large models improved dramatically. From the second half of 2024, the pace slowed somewhat, but the intensity of competition did not decrease, and the contest over foundational model capability remained fierce.
Because model capabilities remain the core factor determining customer scale.
The head of algorithms at a technology company told us in January this year that large models are a long-distance race, with significant upgrades every three to six months. Long-term iteration and sustained performance leadership are crucial: once a model's capabilities fall behind, competitors will seize its customers. This is why almost every large technology company is still training its next-generation model; whether it takes ten thousand or a hundred thousand compute cards, the training never stops.
DeepSeek, a Chinese large model start-up, is the catalyst speeding up the race, forcing big tech companies to regain their sense of urgency. In February, a leading figure in China's large model industry described it this way: "The big tech companies were running at a leisurely pace. Now a dark horse has suddenly appeared on the track, and the whole landscape has changed." His team quickly studied and drew inspiration from DeepSeek-R1 and launched its self-developed reasoning model at the end of February. He admitted the model was "pushed out," with a total training cycle of less than two weeks and without repeated testing before going online.
In this model competition, China is currently closely following American companies. A market research report by Artificial Analysis published in late January this year showed that American large models still lead in performance, but China is catching up and narrowing the gap. Among the world's top 21 models, six belong to Chinese companies, including two from Alibaba.
In 2025, besides competing on performance, another direction of competition is reducing computing power costs.
Xu Dong, general manager of Alibaba Cloud's Tongyi Large Model business, said on April 9 that one of the main trends in China's large model development in 2025 is still improving accuracy while reducing computing costs, and that much engineering work remains to be done. What Chinese customers want most are models that are capable, fast, reliable, and cheap.
For example, the latest Qwen3 family released by Alibaba comprises eight versions. The model parameters (which can be understood as model size; generally, the more parameters, the stronger the performance and the higher the precision) span 0.6 billion, 1.7 billion, 4 billion, 8 billion, 14 billion, 32 billion, 30 billion, and 235 billion. This broad coverage means the models can serve different businesses and needs: smaller models save computing costs while meeting basic requirements, while larger models suit users pursuing maximum performance.
The deployment cost of Qwen3 models has been significantly reduced. The flagship Qwen3-235B-A22B, despite its massive 235 billion parameters, uses a mixture-of-experts (MoE) architecture that routes each query to different expert sub-models, so only 22 billion parameters are activated per call. Its computing power requirement is correspondingly much lower.
Alibaba claims that using Qwen3-235B-A22B roughly requires four NVIDIA H20 AI chips. It consumes only 25%-35% of the computing power of DeepSeek-R1's flagship 671B version, reducing model deployment costs by 60%.
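The routing idea behind MoE can be sketched in a few lines: a small router scores all experts for each token, keeps only the top-k, and mixes their outputs, so only a fraction of the total parameters do any work per call. The sizes below are toy values for illustration, not Qwen3's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer; hypothetical sizes, not Qwen3's real config.
num_experts = 8      # total expert networks in the layer
top_k = 2            # experts activated per token
d_model = 16         # hidden dimension

# Each "expert" is reduced to a single weight matrix here.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts))

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                 # indices of the top-k experts
    w = np.exp(logits[chosen])
    w = w / w.sum()                                      # softmax over the chosen experts
    out = sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))
    return out, chosen

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)

# Only top_k of num_experts expert matrices were touched for this token,
# so the active parameter count is a fraction of the total.
active_frac = top_k / num_experts
print(f"experts used: {sorted(chosen.tolist())}, active fraction: {active_frac:.2f}")
```

This is why a 235B-parameter MoE model can run on far less hardware than a dense model of the same size: the per-call cost tracks the activated parameters (22 billion for Qwen3-235B-A22B), not the total.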

Why must computing power costs be reduced? The logic is simple.
Firstly, in 2025, the focus of large model competition shifts from training to inference, making low-cost, high-performance models more important.
Secondly, AI applications are exploding, and the long-established internet application landscape may see new opportunities.
For large companies, there are more opportunities now than two years ago when large models first emerged, but the competition has become more difficult.
Two years ago, they just needed to prepare ten thousand computing cards, train a model with a trillion parameters, and then watch others "compete" in applications. Now, they need to buy one hundred thousand computing cards, train good and inexpensive models, and simultaneously explore AI applications in both B2B (business-to-business) and B2C (business-to-consumer) directions.
However, as model performance keeps improving and costs keep falling, the business model for large models is gradually taking shape.
A strategic planner at a technology company revealed that in 2024, model-call revenue at China's various model vendors generally reached only tens or hundreds of millions of yuan, a negligible sum. More impressive, however, is the growth in computing power consumed by model calls and in public cloud income (covering computing, storage, networking, and databases). This is why vendors like Alibaba Cloud are willing to stimulate market demand by giving models away as open source.
According to IDC data, the daily average token usage of large models in China reached 952.2 billion in December 2024, up from just 96.3 billion in June 2024, a nearly tenfold increase in six months.
The growth rate of model calls for large companies like Alibaba and ByteDance is faster. The Financial Times learned that the daily average Token usage growth of Alibaba and ByteDance over the past year has exceeded 100 times. Alibaba expects that the growth in model calls for Tongyi models in 2025 will increase by several dozen times. This means that the scale of revenue from this part of Alibaba and ByteDance will far exceed that of 2024.
A strategic planner at a leading Chinese technology company offered an example: Doubao, ByteDance's large model, has a daily average token usage of 12 trillion. If the call price stays unchanged at an average of 0.8 yuan per million tokens, monthly income would be about 288 million yuan, and annual income potentially around 3 billion yuan. This is, however, a static and rough calculation: as model usage grows dozens of times over, model prices may also fall by as much as tenfold.
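The analyst's back-of-envelope estimate can be reproduced directly from the two figures quoted above (12 trillion tokens per day, 0.8 yuan per million tokens); note that a full-year extrapolation lands closer to 3.5 billion yuan, which the quoted "around 3 billion" rounds down.

```python
# Reproduce the static revenue estimate quoted in the article.
daily_tokens = 12e12          # 12 trillion tokens per day (article's figure)
price_per_million = 0.8       # yuan per million tokens (article's figure)

daily_revenue = daily_tokens / 1e6 * price_per_million   # yuan per day
monthly_revenue = daily_revenue * 30                     # ~288 million yuan
annual_revenue = daily_revenue * 365                     # ~3.5 billion yuan

print(f"daily:   {daily_revenue / 1e6:.1f} million yuan")
print(f"monthly: {monthly_revenue / 1e6:.0f} million yuan")
print(f"annual:  {annual_revenue / 1e9:.2f} billion yuan")
```

The calculation also makes the article's caveat concrete: revenue scales linearly in both usage and price, so a tenfold price drop would cancel a tenfold usage gain.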
Another strategic planner at a leading Chinese technology company believes the fiercest cloud computing competition at present is between Alibaba and ByteDance. ByteDance's cloud service, Volcano Engine, is all-in on artificial intelligence and does not hesitate to wage price wars to seize Alibaba Cloud's market share. Because it is spending without regard to cost, Volcano Engine is currently losing money.
A salesperson at a leading cloud computing company told The Financial Times in January this year that Volcano Engine has even offered 20%-30% discounts to poach clients from Alibaba Cloud. A document obtained by The Financial Times shows that Volcano Engine expects its 2025 revenue to exceed 20 billion yuan, implying growth of well over 60%.
However, Alibaba Cloud's advantage lies in its larger revenue and profit scale, giving it enough capital reserves to handle competition and enabling it to enter a positive cycle. In 2024, Alibaba Cloud's revenue was 113.5 billion yuan, growing by 7.3%; EBITA (Alibaba Cloud typically uses EBITA profit as its profitability indicator, excluding non-cash factors such as equity incentives and amortization of intangible assets) profit was 9.6 billion yuan, with a profit margin of 8.4%.
Alibaba Cloud is also one of the biggest beneficiaries of large models. Driven by large models, Alibaba Cloud's revenue growth and profit levels continued to rise in 2024. In the fourth quarter of 2024, Alibaba Cloud's revenue was 31.74 billion yuan, with growth rebounding to double digits, reaching 13.1%; EBITA profit was 3.14 billion yuan, with a profit margin of 9.9%, reaching historical highs.
Editor-in-Chief | Zuo Zhuo
Original source: https://www.toutiao.com/article/7498560009903587880/