(By Observer Net, Lv Dong)
On July 26, Observer Net learned at the World Artificial Intelligence Conference (WAIC 2025) that Huawei exhibited the Ascend 384 super node for the first time, where it was named a "treasure of the hall." At the show, Huawei also gave a comprehensive demonstration of the innovations in its Ascend computing foundation, its training and inference solutions, its open-source, open software and hardware ecosystem, and its extensive practice across industries such as the Internet, telecom operators, finance, energy, and education and research.

Photo source: Observer Net
At present, the explosive growth in computing demand for large-model training and inference is straining traditional computing architectures: resource utilization is low and failures are frequent, making it difficult to support the next generation of AI technology and placing higher demands on the system-engineering capabilities of computing clusters.
In May this year, at the Kunpeng Ascend Developer Conference, Huawei launched the Ascend 384 super node, which consists of 12 computing cabinets and 4 bus cabinets. This "computing nuclear bomb" interconnects 384 NPU cards over a high-speed bus, the largest scale in the industry. According to the on-site introduction, the Ascend super node has three major advantages: ultra-large bandwidth, ultra-low latency, and super performance, and it spans multiple training and inference products.
Compared with industry super node solutions such as NVIDIA's NVL72, the key innovation of the Ascend super node is that it completely breaks with the traditional CPU-centered von Neumann design, the so-called "master-slave architecture," and instead proposes a "fully peer-to-peer architecture." Building on a breakthrough high-speed interconnection bus, the bus is extended from inside the server to the whole cabinet and even across cabinets, ultimately connecting and pooling resources such as CPUs, NPUs, DPUs, storage, and memory. Removing these intermediate links enables true point-to-point interconnection, and with it greater computing density and interconnection bandwidth.
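The practical effect of the peer-to-peer design is easiest to see as a hop count: in a CPU-centered layout, a device-to-device transfer is typically staged through the host, whereas on a pooled bus every resource can address every other resource directly. The sketch below is an illustrative toy model of that difference, not Huawei's actual interconnect software; the device names and hop costs are assumptions.

# Toy comparison of data-movement hops: CPU-centered ("master-slave")
# vs. fully peer-to-peer pooled interconnect. Illustrative only; the
# device names and hop costs are assumptions, not Ascend internals.

def hops_master_slave(src: str, dst: str) -> int:
    """Every device-to-device transfer is staged through the host CPU."""
    if "cpu" in (src, dst):
        return 1          # direct: device <-> host
    return 2              # device -> host CPU -> device

def hops_peer_to_peer(src: str, dst: str) -> int:
    """All pooled resources sit on one high-speed bus: one hop for any pair."""
    return 1

pairs = [("npu0", "npu1"), ("npu0", "memory_pool"), ("dpu0", "npu1")]
for src, dst in pairs:
    print(f"{src} -> {dst}: master-slave {hops_master_slave(src, dst)} hop(s), "
          f"peer-to-peer {hops_peer_to_peer(src, dst)} hop(s)")

In this simplified view, cutting the host out of the path is what the article means by "removing intermediate links": fewer hops per transfer translates into lower latency and higher usable interconnection bandwidth.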
Domestically, Huawei is currently the only vendor able to build super nodes from fully domestic chips that surpass NVIDIA's NVL72. The Ascend 384 super node forms a single super "AI server" through the new MatrixLink high-speed network with fully peer-to-peer interconnection. Its total computing capacity reaches 300 PFLOPS, 1.7 times that of NVIDIA's NVL72; total network interconnection bandwidth reaches 269 TB/s, a 107% increase over NVL72; total memory bandwidth reaches 1,229 TB/s, a 113% increase over NVL72; and single-card inference throughput rises to 2,300 tokens/s.
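As a sanity check on these ratios, the NVL72 baselines they imply can be back-computed from the figures quoted above. The short sketch below does only that arithmetic with the article's own numbers; the derived baselines are implied values, not NVIDIA's published specifications.

# Back-compute the implied NVIDIA NVL72 baselines from the Ascend 384
# super node figures quoted above (illustrative arithmetic only).

ascend_compute_pflops = 300       # total compute, "1.7x NVL72"
ascend_interconnect_tbps = 269    # total interconnect bandwidth, "+107% vs NVL72"
ascend_memory_bw_tbps = 1229      # total memory bandwidth, "+113% vs NVL72"

implied_nvl72_compute = ascend_compute_pflops / 1.7           # ~176 PFLOPS
implied_nvl72_interconnect = ascend_interconnect_tbps / 2.07  # ~130 TB/s
implied_nvl72_memory_bw = ascend_memory_bw_tbps / 2.13        # ~577 TB/s

print(f"Implied NVL72 compute:          {implied_nvl72_compute:.0f} PFLOPS")
print(f"Implied NVL72 interconnect:     {implied_nvl72_interconnect:.0f} TB/s")
print(f"Implied NVL72 memory bandwidth: {implied_nvl72_memory_bw:.0f} TB/s")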
More importantly, with an optimized load-balancing network design, the Ascend super node can be further expanded into a super node cluster, the Atlas 900 SuperCluster, containing tens of thousands of cards, supporting the evolution of even larger models in the future.
Performance test data obtained by Observer Net shows that on the Ascend super node cluster, trillion-parameter dense models such as LLaMA3 deliver more than 2.5 times the performance of traditional clusters; on multimodal and MoE models with heavier communication demands, such as Qwen and DeepSeek, the improvement reaches up to 3 times, or 1.2 times the level of other industry clusters, leading the industry.
"From 7nm to 5nm, 3nm, and 2nm chip process technologies, each generation's performance improvement does not exceed 20%, and overall performance improves by about 50%. Huawei has improved the computing power utilization of chips through efficient super node systems. Without changing the hardware, through system engineering optimization and efficient resource scheduling, it partially compensates for the shortcomings of chip process technology," said a Huawei expert to Observer Net.

Ascend 384 super node architecture
Since 2019, Ascend has focused not only on root technologies such as chips but also on continuously expanding the industry ecosystem, providing user-friendly software, tools, and platforms so that AI technology can be deeply integrated with industry scenarios. To date, more than 80 large models in the industry have been adapted to and developed on Ascend. Among foundation models, there is accumulated work in multiple technical directions, including Sunflower Cognitive, DeepSeek, Qwen, Pengcheng, and LLaMA. At the same time, Ascend has jointly incubated more than 6,000 industry solutions with over 2,700 industry partners, enabling hundreds of models and thousands of scenarios and accelerating industry intelligence.
Observer Net learned that at this WAIC exhibition, Huawei's booth covers more than 800 square meters. In addition to showcasing Ascend's software and hardware capabilities, training and inference solutions, and open-source, open software and hardware ecosystem, Huawei is working with partners to present 11 industry solution practices in sectors such as the Internet, telecom operators, finance, government, healthcare, oil and gas, and transportation, which attendees can visit and discuss on-site.
This article is exclusive to Observer Net. Unauthorized reproduction is prohibited.
Original: https://www.toutiao.com/article/7531271384647926306/
Statement: This article represents the personal views of the author.