By | Sun Yongjie
After rounds of market rumors and emotional reversals, the U.S. government eventually imposed export controls on NVIDIA's H20 chips, prompting NVIDIA CEO Jensen Huang to visit China again after three months and express his hope of continuing cooperation with China. The move has caused a significant shock in the industry. With H20 chips restricted in the Chinese market, the real test of domestic AI chip substitution is officially underway.
NVIDIA H20 Restricted: A Major Test and Opportunity for Domestic Manufacturers
Regarding the export controls on H20 chips, NVIDIA recently disclosed in an 8-K filing that on April 9 the U.S. government informed the company that a license was required to export H20 chips to China, and on April 14 it was told these requirements would remain in effect indefinitely. By placing the H20 on the "non-civilian high-performance computing risk list," the U.S. has extended AI chip regulation from high-end products (such as the A100 and H100) to customized mid-range products. It should be noted that the H20 is the main chip NVIDIA legally sells in China; it was launched after the latest round of U.S. export restrictions took effect in October 2023.

Almost simultaneously, the U.S. Commerce Department announced new export licensing requirements for AMD's MI308 and similar AI chips bound for China. Intel appears to have received no exemption either; reportedly, the company also needs an export license to sell its Gaudi chips in China.
In response, Huatai Securities pointed out that while the restriction of H20 sales may have been anticipated by the market, the new rules may close memory-based performance loopholes. Wanlian Securities, for its part, believes the U.S. government's license requirement for the H20 signals tightening trade restrictions: H20 sales in the Chinese market may face significant limits, NVIDIA will likely lose some of its Chinese market share, and domestic AI chip manufacturers are positioned to take it over. The firm further noted that tariff negotiations remain uncertain and global trade frictions may intensify, accelerating semiconductor localization and bringing development opportunities for domestic computing power.
In our view, with NVIDIA's H20, AMD's MI308 and similar AI chips, and Intel's Gaudi chips all restricted in the Chinese market, the real test of domestic AI chip substitution has truly arrived: domestic manufacturers now have unprecedented market space in which to prove the performance, reliability, ecosystem compatibility, and supply chain stability of their products.
The Rise of Local Players: Hidden Concerns behind Huawei Ascend's Leadership
When it comes to replacing NVIDIA GPUs, Huawei's Ascend series is undoubtedly the most prominent and furthest-along local alternative. The latest generation, represented by the Ascend 910C, is becoming the core of China's indigenous AI infrastructure.
More importantly, Huawei has extended its chip capabilities to the system level, aggregating computing power through systems like CloudMatrix (for example, the widely reported CM384 system, composed of 384 Ascend 910C chips with a full all-to-all interconnect topology). Its supernodes have reached scale and inference performance comparable to NVIDIA's NVL72 supernodes, which is closely tied to the Ascend 910C chip at the heart of the system.
According to analysis from multiple sources, including Huawei Central, TrendForce News, and Reddit, the Ascend 910C combines two Ascend 910B dies using co-packaging or chiplet technology. This pairing significantly raises the 910C's capability to 800 TFLOPS of FP16 compute and 3.2 TB/s of memory bandwidth, nearly 80% of the performance of NVIDIA's H100.
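As a rough sanity check of the "nearly 80%" figure, the cited 910C numbers can be compared against publicly reported H100 SXM specifications. The H100 baseline figures below are my assumptions (approximations from public spec sheets, not official vendor data):

```python
# Sanity check of the "nearly 80% of H100" claim. The H100 figures are
# assumed baselines from public reporting, not official vendor data.
ASCEND_910C_FP16_TFLOPS = 800    # figure cited in the article
ASCEND_910C_BW_TBS = 3.2         # figure cited in the article
H100_FP16_TFLOPS = 989           # assumed dense FP16 throughput, H100 SXM
H100_BW_TBS = 3.35               # assumed HBM3 bandwidth, H100 SXM

compute_ratio = ASCEND_910C_FP16_TFLOPS / H100_FP16_TFLOPS
bw_ratio = ASCEND_910C_BW_TBS / H100_BW_TBS

print(f"FP16 compute vs H100: {compute_ratio:.0%}")   # ~81%
print(f"Memory bandwidth vs H100: {bw_ratio:.0%}")    # ~96%
```

Under these assumed baselines, the compute figure indeed lands at roughly 80% of the H100, while the memory bandwidth gap is much smaller.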

Every advantage has its downside: while this design approach boosts performance in the short term, it also brings significant drawbacks.
Firstly, from a technical perspective, the design can lead to increased power consumption and interconnect bottlenecks.
Higher power consumption means greater cooling demands, increasing the cost and complexity of cooling systems (stronger fans, heat sinks, or liquid cooling). And in scenarios with strict energy-efficiency requirements, such as data centers, high power consumption significantly increases operating costs.
According to the semiconductor and AI research firm SemiAnalysis, the CM384 system consumes far more power than NVIDIA's GB200 NVL72: 3.9 times the total power, 2.3 times the power per FLOP, 1.8 times per TB/s of memory bandwidth, and 1.1 times per TB of HBM capacity (a ratio of X here means the power required per unit of performance or capacity is X times that of the GB200 NVL72 baseline, i.e., energy efficiency is X times worse). Part of the reason may lie in the combined-die design of the Ascend 910C itself.
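The two headline ratios are related by simple arithmetic: power per FLOP equals total power divided by total compute, so the cited figures also imply how much raw compute the CM384 fields relative to the NVL72. A minimal sketch:

```python
# How the per-unit power ratios relate: power-per-FLOP ratio equals the
# total power ratio divided by the compute ratio, so the two figures
# cited above let us back out the implied compute ratio.
total_power_ratio = 3.9      # CM384 total power vs GB200 NVL72
power_per_flop_ratio = 2.3   # power per FLOP vs GB200 NVL72

implied_compute_ratio = total_power_ratio / power_per_flop_ratio
print(f"Implied CM384 compute vs NVL72: {implied_compute_ratio:.2f}x")  # ~1.70x
```

In other words, the CM384 buys its system-level competitiveness with roughly 1.7 times the compute at 3.9 times the power, which is exactly the scale-over-efficiency trade-off discussed here.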

Don't underestimate the increase in power consumption. In actual deployments, the base investment per AI GPU server is roughly $400,000, with power and cooling infrastructure accounting for more than one-third of construction costs. According to IDC surveys, 80% of data center decision-makers consider energy consumption and cooling key constraints. With Huawei's CM384 system consuming 3.9 times the power of the GB200 NVL72, its long-term operating costs are bound to rise significantly, and balancing scale expansion against energy efficiency presents a huge challenge.
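To make the operating-cost impact concrete, here is a hypothetical illustration. The baseline rack power and electricity price below are my own assumptions, not figures from the article; only the 3.9x multiplier comes from the text:

```python
# Hypothetical illustration of the operating-cost impact. The baseline
# rack power and electricity price are assumptions for illustration;
# only the 3.9x power multiplier is taken from the article.
BASELINE_RACK_KW = 120.0     # hypothetical NVL72-class rack power
POWER_MULTIPLIER = 3.9       # CM384 vs NVL72 power ratio from the article
PRICE_PER_KWH = 0.08         # hypothetical flat electricity rate, USD
HOURS_PER_YEAR = 8760

def annual_energy_cost(rack_kw: float) -> float:
    """Yearly electricity cost for a rack running continuously."""
    return rack_kw * HOURS_PER_YEAR * PRICE_PER_KWH

baseline = annual_energy_cost(BASELINE_RACK_KW)
cm384_like = annual_energy_cost(BASELINE_RACK_KW * POWER_MULTIPLIER)
print(f"Baseline rack:   ${baseline:,.0f}/year")    # $84,096/year
print(f"3.9x power rack: ${cm384_like:,.0f}/year")  # $327,974/year
```

Even at these modest assumed prices, a 3.9x power draw turns into hundreds of thousands of dollars in extra electricity per rack per year, before any additional cooling capex.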
As for interconnect bottlenecks: although the 910C aims to address the severe interconnect issues of the 910B, the two-die design may still be limited in interconnect bandwidth. Research from Huawei Central indicates the 910C's die-to-die bandwidth is only 1/10 to 1/20 that of NVIDIA's H100. Such bottlenecks can hurt the efficiency of large-scale AI training: performance does not scale linearly with the number of dies, and two dies typically cannot reach twice the performance of a single equivalent die, especially in bandwidth-hungry scenarios such as training large language models (LLMs). Data transfer between dies also introduces extra latency and power consumption.
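The sublinear scaling can be illustrated with a toy model. This is my own simplification, not a model from the cited research: a fraction `f` of the work must cross the die-to-die link, which runs at relative bandwidth `b` compared with on-die transfers:

```python
# Toy model (my own simplification, not from the cited research) of why
# a two-die package scales sublinearly: a fraction f of the work must
# cross the die-to-die link, which runs at relative bandwidth b
# compared with on-die transfers (b = 1.0 means as fast as on-die).
def dual_die_speedup(f: float, b: float) -> float:
    """Speedup of two dies over one single equivalent die."""
    # An ideal split halves the time; cross-die traffic adds f / b
    # extra time because the slower link stretches that fraction.
    return 1.0 / (0.5 + f / b)

print(round(dual_die_speedup(0.00, 0.1), 2))  # 2.0 -- no cross-die traffic
print(round(dual_die_speedup(0.05, 0.1), 2))  # 1.0 -- a 1/10-bandwidth link erases the gain
```

At a link running at one-tenth of on-die bandwidth, even 5% of traffic crossing the dies is enough, in this model, to wipe out the benefit of the second die, which is why bandwidth-hungry LLM training workloads feel the bottleneck most.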
Beyond these technical issues, on the ecosystem and market side it is well known that Huawei's MindSpore AI framework, part of the same Ascend computing stack as the Ascend chips, is still maturing and cannot yet compare with NVIDIA's CUDA platform.
For instance, analysis from Unite.AI points out that MindSpore's relatively low maturity and adoption may limit developer uptake, especially for long-term AI training workloads. The 910C may therefore lag NVIDIA in software support and developer ecosystem, reducing its efficiency in practical applications.
Finally, and more crucially, according to teardowns, analysis, and reports from SemiAnalysis, TechInsights, WCCFTech, and others, although part of Ascend 910C production is handled by Semiconductor Manufacturing International Corporation (SMIC), most of it is still produced on TSMC's 7nm process, owing to yield problems: Huawei's Ascend chip yield is claimed to be only 32%, and although some reports say the 910C's yield has improved to nearly 40%, that remains below the industry benchmark of 60%.
The reason is that domestic foundries such as SMIC, although they have mastered 7nm-class technology, still lag TSMC in yield, stability, mass-production capability, and the supporting equipment and materials ecosystem. This is especially challenging for large, technically complex AI chips like the Ascend 910C, where SMIC still struggles to achieve large-scale, high-yield production.
Therefore, even with domestic manufacturing options available, Huawei still prefers TSMC, which is more mature and stable in both technology and capacity. This highlights China's predicament under the "chokepoint" on advanced process nodes: having to rely on third-party channels to obtain wafers.
In addition, critical components of the Ascend 910C, such as HBM, come mainly from Samsung's supply chain. According to SemiAnalysis, the HBM is shipped primarily through CoAsia Electronics, Samsung's exclusive distributor in Greater China, to the ASIC design-services company Faraday, which commissions SPIL to package it with low-melting-point solder for easy later extraction; the packages are then transported to China, where the HBM is recovered by de-soldering. As is obvious, such a circumvention-based supply chain carries legal uncertainty and is highly unstable, posing extremely high risk, and this is the biggest concern.
Diversification among Domestic Manufacturers: Reducing Risk, Ensuring Stability, Promoting Autonomy
As discussed above, although Huawei's Ascend 910C leads in domestic adoption and substitution, there are significant concerns in chip performance, ecosystem, and, critically, the supply chain model, whether for objective reasons or self-inflicted ones. Other relevant domestic manufacturers must therefore also take part in the substitution test.
In fact, in the AI chip field, apart from Huawei, major tech companies such as Alibaba, Baidu, and Tencent have already developed their own AI chips; among dedicated chip makers there are listed companies like Cambricon, Jingjia Micro, and Hygon Information Technology, as well as emerging enterprises with both technical depth and innovative vitality, such as Xintong Technology, Hanbo Semiconductor, Muxi Integrated Circuit, Tianshu Zhixin, and Horizon Robotics.
Among these, tech giants such as Alibaba (with Pingtouge's Hanguang chip), Baidu (Kunlun chip), Tencent, and SenseTime develop AI chips for internal scenarios driven by their vast business needs. These chips primarily serve their own cloud platforms or businesses and are not sold directly to the broader market, yet they represent top-tier application-driven chip design capability in China and form an important part of the national AI computing power system.
As for the listed company Hygon Information Technology, its Hygon DCU product line, based on a GPGPU architecture, has a self-developed software stack fully compatible with the CUDA ecosystem and with mainstream commercial and AI software, and can be widely applied in big data processing, artificial intelligence, and commercial computing. It has already been used in domestic supercomputers and AI training, and can absorb some of the demand left by the H20 restrictions. Baidu, Alibaba, and Tencent have certified Hygon's DCU products and launched joint solutions, creating a full-stack, domestically made AI hardware and software infrastructure. Leading domestic AI companies such as iFlytek, SenseTime, and Yitu have also ported and run many models on the Hygon DCU platform.
Cambricon, for example, a leading domestic AI chip company, offers its Siyuan series chips, which can partially replace NVIDIA products in cloud and edge computing; in particular, products built on its fifth-generation intelligent processor microarchitecture meet the needs of scenarios such as cloud training.
Besides these veteran companies, a batch of domestic GPU startups founded since 2019 have emerged as unicorns in AI chip design, such as Biren Technology, Moore Threads, and Enflame Technology.
Moore Threads, for instance, unlike Huawei Ascend, aims to build a broader general-purpose GPU ecosystem. To that end it built a unified software platform called MUSA (Moore Threads Unified System Architecture). Moore Threads recently released MUSA SDK 4.0.1, whose biggest breakthrough is "full-chain integration" from chip design to software stack, enabling full migration of NVIDIA CUDA code without changing developer habits while reportedly improving speed by over 15%.
As for Biren Technology, another AI chip design unicorn, it launched the BR100 GPGPU chip on a 7nm process as early as 2022. The chip's peak computing power exceeded three times that of international vendors' flagship products at the time, setting a domestic record for interconnect bandwidth.
From the above, it is clear that beyond Huawei Ascend there are many capable domestic players in the AI chip field, some with the potential to replace NVIDIA GPUs. Given the concerns around Huawei Ascend described earlier, only with the active participation of these enterprises can a diversified landscape form, reducing risk, ensuring stability, and promoting autonomy throughout the substitution process.
Conclusion: The recent restrictions on NVIDIA's H20 and other chips in the Chinese market highlight the importance of domestic alternatives. But as discussed above, the path to substituting China's AI chips, and ultimately achieving autonomy, cannot rely on individual enterprises alone, nor depend long-term on supply chain models filled with uncertainty. It lies in diversification: supporting the collaborative development of varied domestic AI chip companies such as Huawei, Hygon Information Technology, and Moore Threads. That is how to build a truly robust, comprehensive, and resilient full-industry-chain autonomous ecosystem, and it is the right way to accelerate the self-reliance of China's AI chips.
Original article: https://www.toutiao.com/article/7495192813454328320/
Disclaimer: This article represents the author's personal views.