【By Chen Jing, Observer Columnist】

The US-China chip battle has taken an unexpected turn.

On July 15, 2025, NVIDIA CEO Jensen Huang (Huang Renxun) told a CCTV journalist: "The US government has approved our export license, so we can start shipping. We will therefore begin selling H20 to the Chinese market. I am very excited and look forward to shipping soon. This is truly great news."

On the same day, an AMD spokesperson said the US Department of Commerce had informed the company that its license applications for the MI308 would move into review. Like the H20, the MI308 is an AI chip AMD designed specifically for the Chinese market.

Back in early April, the Trump administration abruptly halted H20 sales, causing NVIDIA serious trouble. Huang Renxun took the unusual step of wading into politics: he traveled the world negotiating and visited China three times in 2025 to stabilize relations with the Chinese government and customers.

At the same time, Huang Renxun changed his style and became actively involved in American politics, trying to persuade the Trump administration to change its policy: he accompanied Trump on a visit to the Middle East, testified before Congress, and built good relations with the White House.

On July 23, at an AI summit in Washington, Huang Renxun answered a question from the host: "The United States has a unique advantage that no other country can match, and that is President Trump." He went on to explain that Trump recognized the importance of AI and energy as soon as he took office and introduced a series of policies, including tax cuts, deregulation, and an AI action plan, creating an "irreproducible institutional dividend" for American companies in global competition. Trump responded on the spot: "You did a great job!" For a technologist, that is an impressive level of political skill.

Another key factor is China's own technological capability. On July 15, US Treasury Secretary Scott Bessent said on Bloomberg Television that China had already developed chips with performance comparable to the H20, so there was no problem with NVIDIA selling H20 chips. This was the first time the US government publicly acknowledged that China can produce substitutes for this class of AI chip.

A more direct explanation came from David Sacks, the White House "AI czar." On December 5, 2024, Trump announced that Sacks would fill the newly created post of "White House Commissioner for Artificial Intelligence and Cryptocurrency Affairs." On July 16, 2025, Sacks told Bloomberg that resuming sales of NVIDIA H20 chips to mainland China was not a concession but a precise way to contain Huawei. Sacks said:

"There are good reasons for this: you don't want to hand the entire Chinese market to Huawei. Even with a downgraded chip, NVIDIA can capture a large share of the Chinese market, squeezing Huawei's room."

The political maneuvering around the H20 exemption is easy to see, but it is not the key. What persuaded Trump was the GPU technology and market backdrop pointed out by Bessent and Sacks, which deserves careful explanation.

Domestic GPU Market and Technical Progress

Readers unfamiliar with chip industry data may find it hard to imagine how large China's domestic GPU market is.

The figure above shows foreign institutions' estimates of Huawei's Ascend GPU revenue, derived from estimates of China's advanced chip production capacity. The 910B, 910C, and 910X are Ascend GPU models with progressively higher performance. "kwpm" means "thousand wafers per month," and "Die per wafer" is the number of dies on each wafer: the larger each chip, the fewer dies fit. The figure shows that the 910C and 910X are roughly twice the area of the 910B, and even the 910B yields only 78 dies on a 12-inch wafer (300 mm in diameter, about 700 square centimeters in area), which shows how large GPU chips are.

"Yield rate" is the fraction of dies on a wafer that work. Because 7nm chips made with DUV lithography require complex multiple-patterning steps, foreign media estimate the initial yield at only 15%, rising gradually to 50% and eventually as high as 70%. The initial prices of the 910B, 910C, and 910X are estimated at 50,000, 110,000, and 140,000 yuan respectively, and are expected to fall as supply comes online.
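The estimate multiplies a few quantities together. Below is a minimal sketch of that arithmetic under stated assumptions: gross dies per wafer uses the common first-order approximation DPW ≈ π(d/2)²/S − πd/√(2S), where d is wafer diameter and S is die area (an industry rule of thumb, not necessarily the method the foreign institutions used); revenue assumes every good die is sold as one chip at a fixed price; and the 680 mm² die area and 1 kwpm capacity are hypothetical inputs chosen for illustration, not published 910B specifications or the chart's capacity figures.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of whole dies on a round wafer.

    Wafer area / die area, minus an edge-loss term for partial dies along the
    circumference. Ignores scribe lines and edge exclusion zones.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


def annual_revenue_yuan(kwpm: float, dies_per_wafer: int,
                        yield_rate: float, unit_price_yuan: float) -> float:
    """Annual revenue if every good die is sold as one chip at a fixed price."""
    wafers_per_year = kwpm * 1000 * 12  # kwpm = thousand wafers per month
    good_dies = wafers_per_year * dies_per_wafer * yield_rate
    return good_dies * unit_price_yuan


# 12-inch (300 mm) wafer: pi * (15 cm)^2 is about 707 cm^2.
print(round(math.pi * 15 ** 2))  # 707

# HYPOTHETICAL die area of 680 mm^2, chosen only because it gives ~78 dies.
dies = gross_dies_per_wafer(300, 680)
print(dies)  # 78
# Doubling the die area more than halves the count, because edge loss grows.
print(gross_dies_per_wafer(300, 1360))  # 33

# Illustrative inputs (not the chart's capacity figures): 1 kwpm, 50,000 yuan/chip.
for y in (0.15, 0.50, 0.70):
    revenue = annual_revenue_yuan(1, dies, y, 50_000)
    print(f"yield {y:.0%}: {revenue / 1e9:.2f} billion yuan")
```

For the same wafer capacity, the yield ramp alone changes revenue by a factor of 70/15, about 4.7, which is why the yield assumption matters so much in estimates of this kind.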

The estimate is rough, but its results are startling. It puts Huawei's GPU chip revenue at 42.947 billion yuan in 2024, rising 240% to 146.107 billion in 2025, another 45% to 212.023 billion in 2026, and 286.567 billion in 2027. For comparison, Huawei's total 2024 revenue was 862.1 billion yuan, with a net profit of 62.6 billion. If its high-margin GPU business can exceed 200 billion yuan in revenue, that will greatly strengthen Huawei's R&D capacity. The figures are certainly imprecise, but they illustrate the scale and profitability of China's computing GPU market.
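Growth of "240%" means the 2025 figure is about 3.4 times the 2024 figure, not 2.4 times. A quick check that the quoted percentages agree with the absolute figures (all values are taken from the estimate above, in billions of yuan; the 2027 growth rate is derived, not quoted):

```python
# Estimated Huawei Ascend GPU revenue, billion yuan (figures quoted above).
forecast = {2024: 42.947, 2025: 146.107, 2026: 212.023, 2027: 286.567}

years = sorted(forecast)
for prev, cur in zip(years, years[1:]):
    growth = forecast[cur] / forecast[prev] - 1  # year-over-year growth
    print(f"{cur}: {growth:+.1%}")
# 2025: +240.2%
# 2026: +45.1%
# 2027: +35.2%

# 2024 GPU estimate as a share of Huawei's total 2024 revenue (862.1 billion yuan).
print(f"{forecast[2024] / 862.1:.1%}")  # 5.0%
```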

At a press conference in Taipei on May 21, Huang Renxun said, "NVIDIA's market share in China has dropped from 95% to 50%." In fact, in 2024 the Chinese market accounted for only 12.5% of NVIDIA's global revenue, about 17 billion dollars, a modest share because GPUs more advanced than the H20 could not be sold there. The restrictions on the Chinese market did not slow NVIDIA's rapid global revenue growth. Huang Renxun's worry, however, was that the Chinese computing GPU market would grow substantially after 2025, and the share NVIDIA gave up would become a "growth accelerator" for its Chinese competitors.

Internationally, NVIDIA's computing GPUs have almost no competition. In 2024 its data center GPU revenue was about 115 billion dollars, for a market share of 93% to 94%. The only notable competitor, AMD, holds about 4%, and all other competitors, including Intel, account for less than 3% combined.

The author explained this extremely rare situation in the early-2024 article "No one can 'kill' a 1.5 trillion dollar NVIDIA." The GPU, originally a supporting component, has used its general-purpose computing capability to overtake the CPU and become the dominant source of chip computing power. NVIDIA serves its customers with exceptional thoroughness, actively optimizing for neural networks, scientific computing, game development, cloud computing, AIGC, large language models, and many other fields, which has made CUDA the de facto "GPU operating system" with no serious competitor.

Developing software for computing GPUs is extremely complicated. "10,000-card interconnection," which spans both software and hardware development, is daunting and requires optimization on many fronts. A slight defect in any of them can stall or crash a client's development process. When such problems prove hard to fix, the result is that "nobody wants the chips even for free," which is the awkward position AMD faces.

Only NVIDIA excels across the board: GPU chip design, GPU system architecture, HBM memory management, NVLink/NVSwitch high-speed data transfer, InfiniBand networking between GPU servers, and CUDA software that extracts maximum hardware performance. Together these let clients easily use NVIDIA GPUs for cutting-edge large-model research and applications.

NVIDIA's data center GPU products have a gross margin of 73% and a market share above 90%, both rooted in these extremely difficult technologies. Competitors may beat NVIDIA on certain metrics in specific applications, as AMD often claims, but they run into many problems in practical use. The field is still advancing at a "super Moore's Law" pace: NVIDIA keeps launching new GPU architectures such as Blackwell and Rubin, and more than ten thousand employees optimize the CUDA stack for professional applications such as robotics and lithography. Its software and hardware advantages are very solid. Even though AMD and other chip companies are highly capable at chip design, as Huang Renxun has said, NVIDIA has transformed from a chip design company into a "software company" and is no longer on the same level.

What has shocked the outside world is that, in the highly complex field of computing GPUs and under the harshest U.S. restrictions, China has developed an initial ability to compete with NVIDIA in both software and hardware. The recent landmark achievement is Huawei's CloudMatrix 384 super node, built on the Ascend 910C, which matches the performance of NVIDIA's GB200 NVL72 system head-on.

On April 16, 2025, the semiconductor and AI research firm SemiAnalysis released a detailed report on Huawei's CloudMatrix and the 910C. It shows that although China's advanced chip manufacturing is restricted, China has closed the GPU performance gap with system-level solutions. The very complexity that keeps many Western companies from catching up with NVIDIA is an opportunity for Chinese companies, because complexity offers many routes for optimization.

The figure shows the GB200 NVL72 cabinet, slightly taller than an adult and not especially large. In single-chip performance, the 910C lags the GB200. Measured in dense BF16 computing power (sparse figures are higher but less stable), the GB200 reaches 2,500 TFLOPS (TFLOPS stands for "trillion floating-point operations per second"), while the 910C reaches 780 TFLOPS, roughly one-third as much.

The Blackwell GPU in GB200 is manufactured on TSMC's 4NP process, a 4nm-class node. Each Blackwell die has 104 billion transistors, and the B200 joins two such dies in advanced packaging for a total of 208 billion transistors. Add the gap in HBM, and the 910C's single-card performance is clearly below the GB200's. Besides the differences in computing power and transfer speed, the 910C's 7nm process also means higher chip power consumption than Blackwell's more advanced process.

However, Huawei's CloudMatrix 384 super node, made up of 384 910C chips (hereafter CM384), delivers 300 PFLOPS of BF16 computing power (1 PFLOPS equals 1,000 TFLOPS), about 1.7 times the 180 PFLOPS of the GB200 NVL72. The obvious cost is that system power consumption is about 4.1 times that of the NVL72, or 2.5 times per TFLOPS. CM384 also has 3.6 times the total memory capacity and 2.1 times the total scale-up bandwidth.
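The system totals follow from the per-chip figures quoted earlier. The sketch below multiplies chip counts by dense BF16 throughput and divides the quoted 4.1× system power ratio by the resulting compute ratio. It assumes the 2,500 TFLOPS figure applies to each of the 72 Blackwell GPUs in an NVL72, which is what makes them add up to 180 PFLOPS.

```python
CM384_CHIPS, TFLOPS_910C = 384, 780    # Ascend 910C, dense BF16 (quoted)
NVL72_GPUS, TFLOPS_GB200 = 72, 2_500   # assumed per Blackwell GPU, dense BF16
SYSTEM_POWER_RATIO = 4.1               # CM384 power / NVL72 power (quoted)

cm384_pflops = CM384_CHIPS * TFLOPS_910C / 1000   # 299.52, i.e. ~300 PFLOPS
nvl72_pflops = NVL72_GPUS * TFLOPS_GB200 / 1000   # 180.0

compute_ratio = cm384_pflops / nvl72_pflops                   # ~1.66, quoted as ~1.7x
power_per_tflops_ratio = SYSTEM_POWER_RATIO / compute_ratio   # ~2.46, quoted as ~2.5x
chip_ratio = CM384_CHIPS / NVL72_GPUS                         # ~5.3x as many chips

print(f"CM384 {cm384_pflops:.0f} PFLOPS vs NVL72 {nvl72_pflops:.0f} PFLOPS")
print(f"compute x{compute_ratio:.2f}, power per TFLOPS x{power_per_tflops_ratio:.2f}, "
      f"chips x{chip_ratio:.2f}")
```

Rounding explains the small differences from the text: 384 × 780 TFLOPS is 299.52 PFLOPS, the compute ratio is about 1.66, and 4.1 / 1.66 gives about 2.46 per TFLOPS.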

Physically, CM384 is far larger than the GB200 NVL72 cabinet. It has 16 racks: 12 for computing and 4 for networking. A rough visual comparison suggests it takes about 16 times the volume and floor space of an NVL72 cabinet. In other words, it uses more than five times as many chips and over ten times the volume and floor space to make up for the gap in single-chip performance.

Still, achieving higher total computing power is a major accomplishment, and it required some "big moves" in the CM384 architecture. Interestingly, once such a solution is found, the process gap between the 7nm 910C and NVIDIA's more advanced chips matters less at the level of the whole GPU computing system. For example, optical interconnection, an area where Huawei's data communications expertise is strong, has played an important role.

SemiAnalysis believes Huawei's engineering advantage lies at the system level rather than only the chip level, including innovations in network topology, optical interconnection, and the software stack. Overall, the drawbacks of Huawei's solution are high cost and roughly four times the energy consumption. Although its total computing power is high, its overall performance metrics are clearly inferior to NVIDIA's NVL72. Under normal circumstances, the market would not accept such a solution.

But the current situation is far from normal. The biggest anomaly is that NVIDIA cannot sell its AI computing products to China, so a domestic system that works has real value. Another is that NVIDIA's GPU products carry extremely high profit margins, so the higher cost and energy consumption of Huawei's competing products are not a problem.

Significance of H20 Exemption

I once visited an H20 server production line. The main cost is an 8×H20 module (NVIDIA sells the eight cards as a set), plus the motherboard, CPUs, NVLink interconnect, and 4×400 GbE network cards, which together make a complete server. Its total BF16 computing power is 1,184 TFLOPS, not much more than a single 910C, and it is not suitable for training base models.

The H20 is in strong demand in China, largely because DeepSeek has driven deployment of inference servers and vertical-model training. A single H20 card costs around 100,000 yuan and an eight-card server between 1.1 and 1.3 million yuan, with high profit margins. Chinese large-model R&D and applications still lean toward the CUDA ecosystem, and H20-based inference applications are relatively mature.

The H20's distinguishing feature is that its computing power is only about 15% of the H100's, yet it comes with 96GB or 141GB of HBM and an HBM bandwidth of 4.0TB/s, close to the H100. Large-model inference involves less dense matrix computation than training and more frequent data transfer between GPUs, and the H20 handles data transfer well. Several leading internet companies placed large orders, booking 16 billion dollars' worth of H20 in 2024.
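One standard way to see why memory bandwidth can matter more than peak FLOPS in inference is the roofline model: attainable throughput = min(peak compute, memory bandwidth × arithmetic intensity). This is a swapped-in textbook model, not the author's analysis. The H20 figures use 148 TFLOPS (1,184 / 8 from the server figure above) and 4.0 TB/s; the H100 figures (989 TFLOPS dense BF16, 3.35 TB/s for the SXM version) are assumptions taken from NVIDIA's public specifications. The intensities are also assumptions: small-batch decoding with BF16 weights does about 2 FLOPs per 2-byte weight, roughly 1 FLOP per byte, while large training matrix multiplications reach hundreds.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    TB/s multiplied by FLOP/byte gives TFLOP/s, so both terms share units.
    """
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


# Assumed specs: (dense BF16 TFLOPS, HBM TB/s). H100 values are assumptions.
gpus = {"H20": (148.0, 4.0), "H100": (989.0, 3.35)}

# ~1 FLOP/byte: small-batch decode (assumed); ~300: large training GEMMs (assumed).
for intensity in (1.0, 300.0):
    for name, (peak, bw) in gpus.items():
        t = attainable_tflops(peak, bw, intensity)
        print(f"{name} @ {intensity:g} FLOP/B: {t:.1f} TFLOPS")
```

At low intensity both chips are memory-bound and the H20's bandwidth keeps it on par with the H100; the H100's large compute advantage only shows once a workload becomes compute-bound, as in training.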

The H20's performance specifications are not very strong, and domestic GPUs are able to approach them. Besides Huawei Ascend, companies such as Moore Threads, Biren Technology, MetaX, and Iluvatar CoreX are developing computing GPUs. These leading GPU makers have all begun IPO or backdoor-listing procedures, pushing forward on the STAR Market (Sci-Tech Innovation Board) and the Hong Kong Stock Exchange at the same time, with a markedly accelerated listing schedule. Other companies also have GPU products with steadily improving performance.

Because of U.S. sanctions, Chinese enterprises are still exploring other platforms, even though they find NVIDIA's CUDA ecosystem convenient.

iFlytek's Spark large model has been developed jointly with Huawei on a fully independent software and hardware route. All Spark training and inference runs on "Feixing One," a fully domestic 10,000-card cluster built on Huawei's 910B/910C; the second phase, "Feixing Two," is expected to be delivered by the end of 2025 and will support continued training of trillion-parameter models.

Because of this unusual technical route, the Spark team has had to put great effort into adaptation, and Huawei has sent many engineers to maintain the system and develop GPU operators, which is very arduous work. After years of effort, Spark's performance has now caught up, and it has worked with major state-owned enterprises such as CNPC and CNOOC on vertical models in various fields, giving it special strategic value. The ecosystem will continue to expand.

The listed company Cambricon's MLU (Machine Learning Unit) cloud AI chips are essentially GPUs, and their deployment record is currently second only to the Ascend series. Cambricon's Siyuan 590 can support thousand-card-scale training of mainstream open-source large models such as DeepSeek-R1 671B, Llama-3, and Qwen-3, with effective computing density reaching 80% of the A100. Advanced production capacity for the Siyuan 690 has been reserved at Semiconductor Manufacturing International Corporation (SMIC). Cambricon's "hardware-software-ecosystem" system has begun to take shape, and it works with mainstream domestic large-model developers, aiming to build a domestic computing power matrix covering training and inference in 2025-2026. These technical prospects have sustained Cambricon's market value of about 280 billion yuan for more than half a year, so it is not a typical speculative play; the company also turned profitable in the first quarter of 2025.

Domestic large models have recently shown strong momentum. After its breakthrough in early 2025, DeepSeek open-sourced several key technologies, greatly accelerating the industry. Alibaba's Qwen series, Moonshot AI's Kimi K2, and other domestic large models now rank well on open-source leaderboards and draw global attention. ByteDance and Tencent have invested heavily in large models, with rich application scenarios and rapidly growing strength.

Compared with half a year ago, the domestic AI software and hardware ecosystem has changed completely, with major gains in strength and numerous breakthroughs. An ecosystem has emerged and technical confidence has risen, so U.S. AI restrictions are no longer as worrying.

All of this was unimaginable before, and the driving force behind it is undoubtedly U.S. chip sanctions. Domestic chip substitution and ecosystem building have entered a boom period. Because GPU technology is complex and its application scenarios diverse, many system-level solutions are available and reliance on EUV lithography is less critical, which is how CM384 can match the total computing power of the NVL72. Several domestic GPUs each have their own strengths; they are a focus of investment for IT and internet companies and one of the most active areas of chip innovation.

The situation around U.S. sanctions is now delicate. The global industry has already drawn its conclusions and is willing to get along with China. Even the U.S. government understands that China has weathered the chip sanctions and that huge demand for domestic GPU chips has emerged. Meanwhile, domestic large models have sharply narrowed the gap with the U.S., and several companies can at least keep pace technologically. Judged by the goal of restricting China's chip and AI development, U.S. sanctions have failed, and China has made significant progress that can no longer be held back.

Looking back on the period since the U.S. launched its trade war against China in 2018, it can be concluded that U.S. technology sanctions have had a positive effect on China's technological development. The chip industry has developed at a miraculous pace, to the point that China now has its own ecosystem for the most advanced GPU chips.

The H20 exemption shows that the U.S. acknowledges China's achievements in independent AI software and hardware and recognizes the negative effects of sanctions. Washington has adjusted its goal: instead of insisting on a complete blockade of China, it now aims to "capture the global AI market."

In essence, this is a recognition that China is a competitor and the market must be fought for. Huang Renxun's business activities in China, the Middle East, and elsewhere are very helpful to the U.S. government's global AI strategy, so his influence has grown, and the H20 exemption is understandable.

The shift in the U.S. government's attitude shows that the U.S.-China chip struggle has entered a new stage. Washington is now willing to view the issue through a somewhat more normal market logic: rather than blindly blocking sales and foolishly giving up a valuable market, it hopes to use the H20 to win market share from Chinese companies.

As for whether the H20 exemption will hurt the market prospects of domestic chips, the technical analysis above shows that products such as Huawei's CloudMatrix 384, which can match NVIDIA's most advanced servers, far exceed the H20's specifications and do not compete at the same level. The H20 can supplement China's computing power for large-model inference and vertical-model training, which helps large-model applications spread, so the exemption is not a bad thing.

In the broader market for AI training and inference, demand for domestic GPUs is bound to grow rapidly as the self-reliant ecosystem matures. If the U.S. tries to disrupt the Chinese market ecosystem by exempting even more powerful GPUs, the direction of the U.S.-China technology contest will become even clearer.

This article is exclusive to Observer. Its content is purely the author's personal opinion and does not represent the platform's views; unauthorized reproduction will be pursued legally.

Original: https://www.toutiao.com/article/7531998516546208302/
