(By Guancha Observer Network/Lv Dong)
"The chip issue is actually not something to worry about. By using methods such as superposition and clustering, the computational results can be comparable to the most advanced levels." Recently, Ren Zhengfei, President of Huawei, made a public statement that has strengthened confidence in China's AI development across society.
As is well known, China has formed an industrial landscape of "hundreds of models and thousands of forms," with several large models among the global leaders. However the AI industry develops, computing power remains the fundamental driver of model evolution. With external technological blockades tightening, can domestic computing power carry the load? Or can it only handle inference, not complex training? Many people are genuinely uncertain about this.
"Our single-chip technology still lags behind by one generation compared to the U.S. We use mathematics to compensate for physics, non-Moore's Law to complement Moore's Law, and group computation to make up for single chips, achieving practical results." Ren Zhengfei's public statement has given the domestic AI industry a "reassuring pill."
With these questions about domestic computing power in mind, the Guancha Observer Network spoke with technical experts from Huawei's 2012 Laboratories. We learned that Ascend computing power not only leads globally in inference performance but also maintains high system stability, efficiently training world-class trillion-parameter large models. In particular, the CloudMatrix 384 super-node, which compensates for single-point weaknesses with system design rather than simply stacking chips, rivals NVIDIA on core metrics and has become a solid foundation for China's AI development.

Why isn't the chip issue something to worry about?
Under external sanctions, the chip issue has hung over China's industrial sector like a "Sword of Damocles." As Sino-American AI competition intensifies, the U.S. is not only doing its utmost to curb China's advanced chip manufacturing capabilities but also continuously tightening export controls on NVIDIA's high-end chips, attempting to lock China's AI industry inside a "computing power cage."
However, crisis brings opportunity. The industry trend is that since the emergence of Transformer-based large neural networks and the arrival of trillion-parameter models, demand for computing power and memory capacity has grown exponentially. A single GPU, or even a single server, can no longer keep up, making cluster computing the prevailing direction. This gives China an opening to solve its computing power dilemma with a systems approach.
Speaking of "systems theory," older generations in China will be familiar with it. The core idea of Qian Xuesen's "On Systems Engineering" is to treat the research subject as a whole rather than as isolated points, optimizing the whole to compensate for single-point deficiencies.
Huawei's breakthrough under sanctions drew inspiration from systems engineering. Technical experts told the Guancha Observer Network that computing infrastructure is a complex system, and Huawei connected all its parts: it organized an internal computing power task force, pooling research forces from Huawei Cloud, models, the computing base, chips, hardware engineering, and basic software, working across departments in deep collaboration. Such an environment and mechanism produce synergy and integrated systems engineering, effectively harnessing and jointly innovating a wide mix of technologies.
Based on systems engineering, Huawei built a giant computing infrastructure, the CloudMatrix 384 super-node, whose fully peer-to-peer high-speed interconnect architecture lets 384 Ascend cards work together like a single computer. Its total computing power reaches 300 PFLOPS, 1.7 times that of NVIDIA's flagship NVL72; its total network interconnect bandwidth is 269 TB/s, 107% higher than the NVL72's; and its total memory bandwidth is 1229 TB/s, 113% higher. More importantly, it can scale out into Atlas 900 SuperCluster clusters of tens of thousands of cards, supporting even larger-scale model evolution in the future.
"Super nodes are complex systems; single-card technical indicators do not represent system efficiency. Our 'systems engineering' aims for overall system optimization rather than maximizing individual components. Solving these ultra-complex system issues requires understanding theories such as systems theory, control theory, information theory, and computational mathematics; simultaneously, we model and simulate the computing system mathematically, trying to utilize every part without waste and ensure perfect coordination and efficient collaboration among all components." Huawei technical experts stated.
As is well known, chip manufacturing follows "Moore's Law," but that presupposes access to advanced equipment and materials. Under the sanctions blockade, Huawei achieved the effect of "non-Moore's-law approaches compensating for Moore's Law" through complex-system optimization, offsetting the limitations of its single chips.
Huawei technical experts pointed out that each single-chip process generation, from 7 nanometers to 5, 3, and 2 nanometers, yields a performance gain of less than 20%, for a cumulative improvement of around 50%. Huawei instead raised chip utilization through an efficient super-node system. "For MoE large model training, our earlier MFU (model FLOPs utilization) was 30%, in line with the industry. Our latest published figure has risen to 41%, and in the laboratory it exceeds 45%. From 30% to 45%, utilization has effectively improved by 50%. Without changing any hardware, through systems engineering optimization and efficient resource scheduling, we have to some extent compensated for the shortfall in chip process technology."
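The utilization arithmetic quoted above can be checked directly. A minimal sketch in Python, using only the percentages cited in the text:

```python
# Relative improvement in MFU (model FLOPs utilization), using the
# figures quoted in the text: baseline 30%, published 41%, lab 45%.
def relative_gain(baseline: float, improved: float) -> float:
    """Return the relative improvement as a fraction of the baseline."""
    return (improved - baseline) / baseline

print(f"published: {relative_gain(0.30, 0.41):.0%}")  # ~37% relative gain
print(f"lab:       {relative_gain(0.30, 0.45):.0%}")  # 50% relative gain
```

The "effectively increased by 50%" claim is a relative gain over the 30% baseline, not 50 percentage points of absolute utilization.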
"Westerners keep patching, while we redefine the architecture."
In recent years, as Huawei has been under sanctions, the computing power industry has also been undergoing transformation.
With the continued operation of large-model scaling laws, the AI industry has generated massive demand for computing power. But traditional computing clusters have hit bottlenecks: piling on more cards does not yield linear gains, and instead runs into the "memory wall," the "scale wall," and the "communication wall." Within a cluster, computing cards and servers do not work in isolation; they must compute while "talking" to each other, and if communication cannot keep up, computing power sits idle.
Over the past eight years, single-card compute has grown 40-fold, but intra-node bus bandwidth has grown only 9-fold and inter-node network bandwidth only 4-fold. That makes cluster communication the biggest challenge in large model training and inference today. Unless communication efficiency improves, simply stacking 384 Ascend cards will not necessarily beat 72 NVIDIA cards: the communication overhead between cards and servers eats the gains from added compute, and effective computing power falls rather than rises.
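Why lagging interconnect growth erodes the 40-fold compute gain can be sketched with a simple non-overlapped step-time model. The 40x and 9x factors come from the text; the 20% baseline communication share is purely an assumption for illustration:

```python
# Illustrative model of why interconnect growth lags erode compute gains.
# Figures from the text: 40x compute and 9x bus bandwidth over eight years.
# Assumes a simple non-overlapped training step (compute + communication)
# and an assumed 20% baseline communication share -- purely hypothetical.

def effective_speedup(compute_gain: float, comm_gain: float,
                      baseline_comm_share: float) -> float:
    compute = 1.0 - baseline_comm_share   # baseline compute fraction
    comm = baseline_comm_share            # baseline communication fraction
    new_step = compute / compute_gain + comm / comm_gain
    return 1.0 / new_step                 # baseline step time is 1.0

# With 40x faster chips but only 9x faster buses, the realized speedup
# falls well short of 40x once communication is accounted for.
print(f"{effective_speedup(40, 9, 0.2):.1f}x")
```

Under these assumed numbers the communication term dominates the new step time, which is exactly the "communication wall" the article describes.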
As a pioneer of the computing power industry, NVIDIA recognized this early. Jensen Huang's approach was to bring NVLink, originally from consumer graphics cards, into computing clusters, in effect building a "super-wide lane" dedicated to GPU-to-GPU traffic, and to integrate multiple GPUs, CPUs, high-bandwidth memory, and NVLink/NVSwitch into the NVIDIA NVL72 super-node.
However, the problem is that NVLink can only be used for communication between NVIDIA's own GPUs; non-GPU heterogeneous hardware within nodes, such as NPUs and FPGAs, cannot use this "super-wide lane" and must still rely on the less efficient PCIe protocol for CPU relays. Additionally, inter-node communication occurs via Ethernet/InfiniBand protocols, which can also become bandwidth bottlenecks in massive computations.
In contrast, Huawei's CloudMatrix 384 super-node restructures the computing architecture, breaking entirely with the traditional CPU-centric von Neumann design and proposing an innovative "fully peer-to-peer architecture." Using 3168 optical fibers and 6912 400G optical modules, it builds a high-speed interconnect bus that extends from inside the server to the whole rack and even across racks, ultimately interconnecting and pooling CPUs, NPUs, DPUs, storage, and memory. This removes many intermediate relay stages, achieving true point-to-point interconnection and raising both compute density and interconnect bandwidth.
"The West inherits and develops incrementally; Chairman Ren vividly compares it to a patchwork quilt, where after the cloth tears, patches are constantly added. Different protocols require conversion for interoperability, and much of the payload is lost. We have redefined an equal architecture, unifying all communication protocols so that no conversion is needed for interoperability, increasing the effective payload." Huawei technical experts told the Guancha Observer Network.
"Iron needs to be strong itself, fully meeting domestic needs"
How effective is Huawei's Ascend computing power in actual applications? In particular, with NVIDIA's high-end computing power blocked, can Ascend give China's AI development a backbone and real confidence? This is what both industry insiders and outside observers genuinely want to know.
Anyone following the industry will have noticed that since mid-May Huawei has released a series of technical reports. From them it is clear that Ascend computing power not only supports industrial-grade inference, enabling Day-0 migration and one-click deployment, but also efficiently trains models of different sizes, such as 72B and 718B. Huawei also disclosed technical reports on the PanGu Ultra MoE model architecture and training methods, revealing many technical details and fully showcasing Ascend's leap in ultra-large MoE training performance.

Compared with inference, large model training places far higher demands on computing infrastructure. Despite the one-generation gap in single chips, systems engineering remains Huawei's key to breakthroughs. During ultra-large MoE model training, for example, facing system congestion and resource mismatches, Huawei pushed the scheduling of compute, memory, and communication to the extreme, combining Ascend-specific affinity design with mathematical algorithm innovations to achieve super parallelism and gains in the "dynamic metrics." On a ten-thousand-card Ascend cluster, training achieved 41% compute utilization, 98% cluster availability, and 95% linearity: real computing power that users can feel in actual use.
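To get a feel for what those three figures mean together, here is a rough back-of-envelope in Python. Treating utilization, availability, and linearity as independent multiplicative factors is a simplification of my own (they measure different things and the article does not combine them this way), but it conveys why each percentage point matters at cluster scale:

```python
# Rough back-of-envelope: what fraction of a cluster's peak FLOPs actually
# reaches the model? Multiplying the three quoted metrics is an assumed
# simplification, not the article's methodology.

mfu = 0.41           # model FLOPs utilization during training
availability = 0.98  # fraction of time the cluster is usable
linearity = 0.95     # scaling efficiency versus ideal linear scale-out

delivered = mfu * availability * linearity
print(f"effective fraction of peak: {delivered:.1%}")
```

Even with each metric individually high, the compounded figure is noticeably lower than any single one, which is why vendors chase overall-system optimization rather than any one number.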
Frankly, large computing clusters ultimately compete on comprehensive capability, not single-chip capability. Take cooling: densely packed chips generate enormous heat, and if it cannot be dissipated, the system fails. In optical communication, fiber offers high bandwidth and speed but consumes considerable power and is relatively fragile; even minor faults can disconnect the system.
"Striving for overall optimality, system engineering is the goal pursued by every vendor, depending on whether it can be achieved. A super node architecture must be fully interconnected, unconverged, with large bandwidth and low latency, and supported by software systems managing resources to achieve super parallelism and efficient scheduling. To keep the system running smoothly requires significant dynamic power supply, efficient cooling, and other hardware engineering capabilities." Huawei technical experts stated.
Huawei has accumulated decades of experience in the electronic information field, particularly in hardware engineering and foundational software. It not only leads the industry in optical communications but also excels in cooling engineering, giving it the capability to build complex systems. About ten years ago, Huawei set up overseas research institutes dedicated to thermal theory and thermal engineering; among its 86 laboratories is a thermodynamics lab. In both liquid and air cooling, Huawei has reached the industry's leading level, providing a solid guarantee for reliable large-scale training.
In its cloud computing centers, Huawei Cloud has equipped super nodes with an all-round "professional doctor," the Ascend Cloud Brain, and built a constant-temperature "training base": liquid cold-plate cooling brings coolant into direct contact with heat-generating components, improving cooling efficiency by 50% over traditional air cooling. Combined with the iCooling intelligent temperature control system, which adjusts its strategy every five minutes, the data center stays in optimal condition regardless of outside temperature. As a result, its power usage effectiveness (PUE) has been reduced to 1.12, saving 70% more energy than the industry average.
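PUE has a standard definition: total facility energy divided by the energy consumed by the IT equipment itself, so 1.0 is the theoretical ideal. A small sketch of what a PUE of 1.12 implies (the kWh figure below is purely illustrative, not from the article):

```python
# PUE (power usage effectiveness) = total facility energy / IT equipment energy.
# A PUE of 1.12 (the figure quoted above) means only 12% overhead -- cooling,
# power delivery, lighting -- on top of the IT load itself.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Standard PUE ratio; 1.0 would mean zero facility overhead."""
    return total_facility_kwh / it_equipment_kwh

it_load = 2000.0           # hypothetical IT energy over some period, kWh
overhead = 0.12 * it_load  # cooling, power conversion, etc.
print(f"PUE = {pue(it_load + overhead, it_load):.2f}")  # PUE = 1.12
```

For comparison, air-cooled data centers commonly sit well above this, which is where the energy-saving claim gets its leverage.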
"Competitive strength must increase, and ultimately, iron must be strong. We will fully meet the needs of domestic clients and not let them down." Huawei technical experts told the Guancha Observer Network, "Our philosophy is to improve competitiveness through tangible technological advancements, ensuring customers can use our products effectively. This is our direction of effort. The toughest times have passed, and this disclosure has made everyone feel our openness and progress, enhancing customer confidence in us."
"Openness will make us progress further."
In the industry, Huawei is one of the few companies that both builds computing infrastructure and develops foundational large models. This lets the model team and the computing base team collaborate deeply: model training surfaces challenges and computing power issues, driving improvements to the computing base, which in turn better supports training and inference. This mechanism of mutual pull and support makes it easier to uncover deeper problems.
Huawei is also open. Technical experts candidly told the Guancha Observer Network that Ascend computing power supports "hundreds of models and thousands of forms," including openly supporting major domestic models such as Qwen and DeepSeek. Huawei discloses all the innovative technologies and solutions developed during PanGu training, including the relevant code and technical documentation, and sends experts on site to help customers use Ascend well.
"Our C-end application of large models mainly enhances the competitiveness of Huawei phones; our B-end industries, such as mining, steel, electricity, transportation, energy, healthcare, finance, ports, etc., are our main battlegrounds, with no conflicting interests with clients, so clients are not worried." Huawei technical experts frankly stated that in terms of industry intelligence applications, Huawei focuses on national "key industries" related to national welfare and livelihood, and will fully meet the needs of various industries to support China's computing power demands in the AI era.
Beyond openness to customers, Huawei also opens its underlying capabilities deeply to developers and universities. The Ascend heterogeneous computing architecture CANN, for example, follows a deep-openness strategy and supported mainstream open frameworks such as PyTorch and TensorFlow from the start. Today more than 6,000 developers contribute actively, innovating continuously across the operating system, operator algorithms, graph optimization, and acceleration libraries, and Huawei has worked with over 30 partners in internet, telecommunications, finance, and other sectors to develop more than 260 high-performance operators.
"Huawei invests heavily in basic research annually, and our basic research is very open. Besides our own research, we also vigorously fund universities and collaborate on joint research and technology cooperation. 'Absorb cosmic energy through a cup of coffee, and bond world wisdom with a bucket of glue,' continuously accumulating organizational capabilities to achieve 'deep roots.' Our scientists and experts also immerse themselves in business campaigns, applying theories and technologies to business to creatively solve practical business problems, enhancing product competitiveness and creating commercial value to achieve 'lush foliage.' At Huawei, we organically combine research and innovation to realize 'deep roots and lush foliage.'" Huawei technical experts stated.
"Openness will drive us to progress further." Ren Zhengfei's view is not only applicable to countries but also to enterprises. It is precisely due to adherence to openness and innovation that Huawei continues to achieve technological breakthroughs. When single-point technologies are restricted, Huawei regains advantages through systems engineering. In an increasingly complex international environment, the outstanding performance of the Ascend computing power platform in training and inference not only provides the industry with an alternative to NVIDIA but also gives China's AI industry a "reassuring pill."
Original source: https://www.toutiao.com/article/7518189993643246143/
Disclaimer: This article solely represents the author's views.