At the Huawei Developer Conference 2025 (HDC 2025) on June 20, Zhang Ping'an, Executive Director of Huawei and CEO of Huawei Cloud, announced the full launch of the new generation of Ascend AI cloud services built on the CloudMatrix384 super node, providing powerful computing power for large-model applications.

According to the presentation, as the computing demands of large-model training and inference grow explosively, traditional computing architectures are struggling to support generational leaps in AI technology. Huawei Cloud's new generation of Ascend AI cloud services is built on the CloudMatrix384 super node, which for the first time interconnects 384 Ascend NPUs and 192 Kunpeng CPUs through MatrixLink, a new high-speed network, forming a single super "AI server." As a result, single-card inference throughput leaps to 2,300 tokens/s.

The super-node architecture is better suited to serving Mixture-of-Experts (MoE) large models: with "one card per expert," a single super node can run parallel inference for 384 experts simultaneously, greatly improving efficiency. The super node also supports "one card per compute task," flexibly allocating resources to raise task parallelism and cut waiting time, boosting model FLOPs utilization (MFU) by more than 50%.
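
The "one card per expert" layout is, in effect, expert parallelism: each expert's weights live on a dedicated NPU, and a router scatters each token to the cards hosting its selected experts. The article gives no implementation details, so the following is a minimal NumPy sketch of the routing idea only; the dimensions, top-k choice, and function names are illustrative assumptions, not Huawei's API.

```python
# Minimal sketch of "one card per expert" expert parallelism.
# All names and sizes here are illustrative assumptions; this is
# not Huawei's MatrixLink/CANN API.

import numpy as np

NUM_EXPERTS = 384          # one expert per NPU on a CloudMatrix384 node
TOP_K = 2                  # experts consulted per token (a typical MoE choice)

rng = np.random.default_rng(0)

def route(token_embeddings: np.ndarray, gate_weights: np.ndarray) -> np.ndarray:
    """Pick the top-k experts for each token from the gating scores."""
    scores = token_embeddings @ gate_weights            # [tokens, experts]
    return np.argsort(scores, axis=1)[:, -TOP_K:]       # expert ids per token

def dispatch(expert_ids: np.ndarray) -> dict:
    """Group token indices by target expert, i.e. by the NPU hosting it."""
    per_card = {e: [] for e in range(NUM_EXPERTS)}
    for tok, experts in enumerate(expert_ids):
        for e in experts:
            per_card[int(e)].append(tok)
    return per_card

tokens = rng.standard_normal((16, 64))                  # 16 tokens, dim 64
gate = rng.standard_normal((64, NUM_EXPERTS))
per_card = dispatch(route(tokens, gate))
busy = sum(1 for toks in per_card.values() if toks)
print(f"tokens fan out to {busy} of {NUM_EXPERTS} expert cards")
```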

For training tasks at the trillion-to-tens-of-trillions parameter scale, up to 432 super nodes can be cascaded in cloud data centers into a super-large cluster of as many as 160,000 cards. Super nodes also support unified deployment of training and inference compute, such as "inference by day, training by night," flexibly shifting capacity between the two to help customers maximize resource utilization.
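
"Inference by day, training by night" amounts to time-slicing a single accelerator pool between two workload classes. As a rough illustration of such a policy (the hour window and card split below are invented for the sketch, not figures from the article):

```python
# Toy time-sliced allocator for a shared accelerator pool, illustrating the
# "inference by day, training by night" pattern described in the article.
# The 8:00-22:00 window and the card counts are illustrative assumptions.

from dataclasses import dataclass

TOTAL_CARDS = 384  # one CloudMatrix384 super node

@dataclass
class Allocation:
    inference_cards: int
    training_cards: int

def allocate(hour: int) -> Allocation:
    """Favor inference during business hours, training overnight."""
    if 8 <= hour < 22:                      # daytime: serve user traffic
        return Allocation(inference_cards=320, training_cards=64)
    return Allocation(inference_cards=64, training_cards=320)  # night: train

for hour in (9, 23):
    a = allocate(hour)
    print(f"{hour:02d}:00 -> inference={a.inference_cards}, training={a.training_cards}")
```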

Zhang Ping'an said that Sina News has deepened its cooperation with Huawei Cloud, building a unified inference platform for its "Intelligent Xiaolang" smart service system on the CloudMatrix384 Ascend AI cloud service, with Ascend AI computing power underneath. Inference delivery efficiency has risen by more than 50% and model rollout speed has doubled; through coordinated software-hardware optimization, NPU utilization has increased by more than 40%.

SiliconFlow is using the CloudMatrix384 super node to efficiently serve DeepSeek V3 and R1 inference to millions of users. Mianbi Intelligence has used the CloudMatrix384 super node to raise the inference performance of its Xiaogangpao ("Little Steel Cannon") models by 2.7 times.

In scientific research, the Institute of Computing Technology of the Chinese Academy of Sciences has built its own model-training framework on the CloudMatrix384 super node, quickly constructing the Academy's AI-for-Science research large model and ending its dependence on foreign high-performance AI computing platforms.

In the Internet sector, 360's Nano AI Search, which provides AI-powered search to users and has heavy demand for AI computing power, has begun testing the CloudMatrix384 super node.

Currently, the Ascend AI cloud service has provided powerful AI computing power to more than 1,300 customers.

At the conference, Zhang Ping'an also announced the release of Pangu Models 5.5, a comprehensive upgrade of five foundation models covering natural language processing (NLP), computer vision (CV), prediction, multimodal, and scientific computing. He emphasized that the Pangu models are trained full-stack, hardware and software, on Ascend Cloud, demonstrating that world-class large models can be developed on the Ascend architecture.

On the NLP side, the new 718B deep-thinking model is an MoE model with 256 experts, with markedly stronger knowledge reasoning, tool invocation, and mathematics, reaching industry-leading capability. Through techniques such as communication-computation overlap (masking), global dynamic load balancing, and the Mixture of Grouped Experts (MoGE) algorithm, the Pangu models achieve an Ascend-friendly training and inference system that leads the industry in training MFU and single-card inference throughput. The team also proposed model-friendly vocabulary design, a sandwich architecture, and an EP-Group load-balancing loss, reaching competitiveness on par with top-tier models in the industry.
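
The article names an "EP-Group load-balancing loss" without defining it. For background, most MoE systems add an auxiliary loss that penalizes uneven expert utilization; the sketch below implements the widely used Switch-Transformer-style formulation as a point of reference, and should not be read as Huawei's exact EP-Group variant.

```python
# Generic MoE auxiliary load-balancing loss (Switch-Transformer style),
# shown only as background for the load-balancing loss named in the article;
# Huawei's EP-Group variant is not publicly specified here.

import numpy as np

def load_balancing_loss(gate_probs: np.ndarray, expert_ids: np.ndarray,
                        num_experts: int) -> float:
    """loss = N * sum_e( fraction_of_tokens_e * mean_gate_prob_e ).

    Minimized (value ~1.0) when tokens and gate probability mass
    are spread evenly across all experts.
    """
    tokens = gate_probs.shape[0]
    # f_e: fraction of tokens whose top-1 expert is e
    f = np.bincount(expert_ids, minlength=num_experts) / tokens
    # p_e: mean gate probability assigned to expert e
    p = gate_probs.mean(axis=0)
    return float(num_experts * np.sum(f * p))

rng = np.random.default_rng(1)
logits = rng.standard_normal((1024, 256))   # 256 experts, as in the 718B model
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
top1 = probs.argmax(axis=1)
print(f"aux loss: {load_balancing_loss(probs, top1, 256):.3f}")  # ~1.0 when balanced
```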

This article is an exclusive piece from Guancha Zazhi and cannot be reprinted without permission.

Original Source: https://www.toutiao.com/article/7517946281827680803/
