Source: Global Times
【Global Times Technology Comprehensive Report】On September 8, the Institute of Automation, Chinese Academy of Sciences announced that a team led by Li Guoqi and Xu Bo, building on their series of papers on endogenous complexity theory, has worked with Muxi (MetaX) to develop the brain-like spiking large model "Shunxi 1.0" (SpikingBrain-1.0). The model completed full-process training and inference on a domestic thousand-card GPU computing platform, achieved an order-of-magnitude improvement in efficiency and speed for ultra-long-sequence inference, and demonstrated the feasibility of building a domestically independent and controllable ecosystem around a new non-Transformer large-model architecture. The research team has open-sourced the SpikingBrain-1.0-7B model, opened a test website for SpikingBrain-1.0-76B, and simultaneously released industry-validated Chinese and English technical reports on the brain-like spiking large model SpikingBrain-1.0.

The Institute of Automation, Chinese Academy of Sciences stated: "Current large models based on the Transformer architecture improve model intelligence by scaling up network size, computing resources, and data volume, driven by scaling laws. However, the model's basic computational unit is a simple point-neuron model, an approach to general intelligence that we call 'external complexity-based'. The Transformer architecture's inherent drawbacks are that training cost grows quadratically with sequence length and inference memory grows linearly with sequence length; these form the main resource-consumption bottleneck and limit its ability to process ultra-long sequences."
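To make the bottleneck concrete, the following is a minimal illustrative sketch (not from the report; sizes and names are arbitrary) of standard softmax attention. The n × n score matrix it materializes is what makes training cost grow quadratically with sequence length, while during autoregressive inference the cached keys and values grow linearly with the number of tokens already generated.

```python
# Illustrative sketch of standard softmax attention (not from the report).
# The (n, n) score matrix is the source of the quadratic cost in sequence length.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes an (n, n) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                      # O(n^2 * d) time, O(n^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                           # another O(n^2 * d) step

n, d = 4096, 64                                                  # hypothetical sizes
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)   # (4096, 64); the score matrix alone held 4096 * 4096 floats
```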
Drawing inspiration from the complex internal mechanisms of brain neurons, the development team proposed an "endogenous complexity-based" large-model architecture and used it to build the brain-like spiking large model "Shunxi 1.0" (SpikingBrain-1.0). Theoretically, the team established a connection between the endogenous dynamics of spiking neurons and linear attention models, showing that the existing linear attention mechanism is a special, simplified form of dendritic computation and thereby pointing to a feasible new path for continually increasing model complexity and performance. The team then built and open-sourced new brain-like foundation models based on spiking neurons, with linear complexity (SpikingBrain-1.0-7B) and hybrid linear complexity (SpikingBrain-1.0-76B, with 12B activated parameters), and developed an efficient training and inference framework, a Triton operator library, model-parallel strategies, and cluster communication primitives for domestic GPU (MetaX Xiyun C550) clusters.
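For readers unfamiliar with linear attention, the sketch below (our own illustration; the feature map, normalization, and variable names are assumptions, not the SpikingBrain-1.0 implementation) shows how a causal linear-attention recurrence replaces the growing key-value cache with a fixed-size state. This is the class of mechanism the report relates to the endogenous dynamics of spiking neurons.

```python
# Minimal sketch of causal linear attention as a recurrence (illustrative only).
# A fixed-size (d, d) state replaces the token-proportional KV cache.
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Causal linear attention with a positive feature map phi."""
    d = Q.shape[-1]
    S = np.zeros((d, d))            # running sum of phi(k) v^T  -- constant memory
    z = np.zeros(d)                 # running sum of phi(k), used for normalization
    outputs = []
    for q, k, v in zip(Q, K, V):    # one O(d^2) update per token -> O(n) overall
        S += np.outer(phi(k), v)
        z += phi(k)
        outputs.append(phi(q) @ S / (phi(q) @ z + 1e-6))
    return np.stack(outputs)

n, d = 4096, 64
out = linear_attention(*(np.random.randn(n, d) for _ in range(3)))
print(out.shape)                    # (4096, 64), with no n x n matrix ever formed
```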
According to the Institute of Automation, Chinese Academy of Sciences, SpikingBrain-1.0 achieves breakthroughs in several core areas: highly efficient training from a very small data volume, an order-of-magnitude improvement in inference efficiency, the construction of a domestically independent and controllable brain-like large-model ecosystem, and a multi-scale sparsity mechanism based on dynamic-threshold spiking.
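As rough intuition for "dynamic-threshold spiking", the sketch below (an assumption-laden illustration, not the team's actual mechanism) shows a leaky integrate-and-fire neuron whose firing threshold adapts after each spike, producing the kind of sparse, event-driven activity that such mechanisms exploit.

```python
# Hedged sketch of a leaky integrate-and-fire neuron with an adaptive (dynamic)
# threshold. Constants and update rules are illustrative assumptions.
import numpy as np

def lif_dynamic_threshold(inputs, tau_v=0.9, tau_th=0.95, th0=1.0, th_jump=0.5):
    v, th = 0.0, th0
    spikes = []
    for x in inputs:
        v = tau_v * v + x                 # leaky integration of the membrane potential
        if v >= th:                       # fire when the potential crosses the threshold
            spikes.append(1)
            v = 0.0                       # reset the potential
            th += th_jump                 # raise the threshold -> sparser future spikes
        else:
            spikes.append(0)
        th = th0 + tau_th * (th - th0)    # threshold relaxes back toward its baseline
    return np.array(spikes)

s = lif_dynamic_threshold(np.random.rand(1000))
print("spike rate:", s.mean())            # typically well below 1, i.e. sparse activity
```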
The Institute of Automation, Chinese Academy of Sciences stated that this is the first time China has proposed a large-scale brain-like linear foundation-model architecture and built a training and inference framework for a brain-like spiking large model on a domestic GPU computing cluster. The proposed model resolves the performance degradation that large-scale brain-like models suffer under spike-driven constraints. Its ultra-long-sequence processing capability offers significant potential efficiency advantages in scenarios such as legal and medical document analysis, complex multi-agent simulation, high-energy particle physics experiments, DNA sequence analysis, and molecular dynamics trajectory modeling. The released model provides a new technological route for next-generation artificial intelligence and is expected to inspire next-generation low-power neuromorphic computing theory and chip design. (Sihan)
Original article: https://www.toutiao.com/article/7547650951507657251/