On November 24, Alibaba Cloud and AI Singapore (AISG), which runs Singapore's national AI program, jointly announced a major development: the latest version of Singapore's national large language model, Sea-Lion v4, will no longer follow its previous U.S. technology path and will instead be built entirely on Alibaba's open-source Tongyi Qianwen model, Qwen3-32B.

This is the latest win for Chinese open-source models in the global market, following Silicon Valley investor Chamath Palihapitiya's announcement that he was replacing OpenAI with Kimi as his productivity tool, the integration of Zhipu's models by the American coding platforms Vercel and Windsurf, and Airbnb CEO Brian Chesky's remark that Alibaba's Qwen is easier to work with than American models. Endorsement by Singapore's national AI program signals that Chinese open-source large models are now able to replace, and in places surpass, the Silicon Valley giants in "sovereign AI" and multilingual adaptation.
In December 2023, Singapore launched a SGD 70 million (about USD 52 million) initiative to build research and engineering capabilities for multimodal large language models (LLMs), including the development of Sea-Lion (Southeast Asian Languages in One Network).
However, this market of more than 600 million people, whose digital economy is heading toward the trillion-dollar mark, has long been a "blind spot" for Western AI.
This "blind spot" first manifested in the extreme lack of data. Before the emergence of Sea-Lion, mainstream models like Meta Llama 2 had only an astonishing 0.5% of Southeast Asian language content.
That English-centric training logic left Sea-Lion, which was originally trained on Meta's open-source Llama 2, almost a "Southeast Asian illiterate." In early tests, the model listed Venezuela, a South American country, as an ASEAN member state. Hallucinations like this, born of missing regional knowledge, exposed the fatal shortcomings of Western general-purpose models in localized applications.
Even more troubling for local developers were the language and cultural barriers. "Code-switching" is pervasive in Southeast Asia, where English blends with local languages into hybrids such as Singlish in Singapore and Manglish in Malaysia. Faced with this mixed register, standard American AI models were often at a loss, unable to parse the subtle distinctions and cultural references it carries.
Although Llama performed well among the open-source models of its day, its "English-centric" nature was hard to shake: because its vocabulary is dominated by English subwords, non-Latin-script languages such as Thai and Burmese fragment into far more tokens than equivalent English text, driving up both cost and latency.
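One way to see the inefficiency concretely, sketched below under the assumption that you have been granted access to the gated Llama 2 tokenizer on Hugging Face, is to count how many tokens the same short sentence costs in English versus Thai; exact counts vary, but the gap is typically severalfold.

```python
# Minimal sketch: compare token counts for the same sentence in English
# and Thai under the Llama 2 tokenizer. Assumes access to the gated
# meta-llama repository on Hugging Face; numbers are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

samples = {
    "English": "The weather is nice today.",
    "Thai": "วันนี้อากาศดี",  # same meaning, written without word spaces
}

for label, text in samples.items():
    ids = tokenizer.encode(text, add_special_tokens=False)
    print(f"{label}: {len(ids)} tokens for {len(text)} characters")

# A vocabulary built mostly from English merges falls back to tiny
# byte-level pieces for Thai, so the same content costs several times
# more tokens, inflating context usage and inference latency.
```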
AISG gradually realized that using Silicon Valley's open-source models for development was not the best solution for Southeast Asian countries. They needed to find a foundation that truly understood multilingualism and the Asian context.
The final choice for v4 pointed to China: AISG selected Alibaba's Qwen3-32B as the base model for the new Sea-Lion.
Unlike Western models, the Qwen3 base model was pre-trained on 36 trillion tokens covering 119 languages and dialects. This "native multilingual capability" means Qwen not only "understands" Indonesian and Malay but grasps their grammatical structures at a fundamental level, greatly lowering the bar for AISG's subsequent training.
To handle the distinctive writing conventions of Southeast Asian languages, Qwen-Sea-Lion-v4 abandoned the SentencePiece tokenizer common among Western models in favor of a byte-pair-encoding (BPE) tokenizer. Because languages such as Thai and Burmese are typically written without spaces between words, byte-level BPE segments non-Latin text more accurately, improving translation quality and markedly speeding up inference.
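The article does not disclose the new tokenizer's actual configuration, so the following is only a minimal sketch of the general idea: a tiny byte-level BPE tokenizer trained on a handful of unspaced Thai sentences, demonstrating that BPE learns useful subword merges without any whitespace cues. The corpus and vocabulary size are invented for the demo.

```python
# Illustrative only: train a tiny byte-level BPE tokenizer on unspaced
# Thai text to show that BPE needs no word boundaries to learn segments.
# The corpus and vocab size are invented and do not reflect the actual
# Qwen-Sea-Lion-v4 tokenizer.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

corpus = [
    "วันนี้อากาศดีมาก",   # "The weather is very nice today."
    "วันนี้ฉันไปตลาด",    # "Today I went to the market."
    "อากาศวันนี้ร้อน",    # "The weather today is hot."
]

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(vocab_size=300, show_progress=False)
tokenizer.train_from_iterator(corpus, trainer)

# Frequent byte sequences, e.g. the bytes spelling "วันนี้" ("today"),
# merge into single tokens even though the text contains no spaces.
# The printout shows them in byte-level notation rather than Thai script.
print(tokenizer.encode("วันนี้อากาศดี").tokens)
```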
Beyond benchmark numbers, the practicalities of commercial deployment were also key to Alibaba's win. Southeast Asia is home to vast numbers of small and medium-sized enterprises that cannot afford expensive H100 GPU clusters, so Qwen-Sea-Lion-v4 has been optimized to run smoothly on a consumer laptop with 32 GB of memory.
That means an ordinary Indonesian developer can deploy this national-level model locally on a high-end personal machine. "Industrial-grade capability at a consumer-grade threshold" is a precise answer to the region's shortage of computing resources.
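The article does not say how such local deployments are done; one common route, an assumption here rather than a confirmed detail, is 4-bit quantization served through llama-cpp-python, sketched below with a hypothetical GGUF file name.

```python
# Hedged sketch: run a 4-bit quantized 32B model locally via
# llama-cpp-python. The GGUF file name below is hypothetical; substitute
# whatever quantized build of Qwen-Sea-Lion-v4 is actually published.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-sea-lion-v4-32b-q4_k_m.gguf",  # hypothetical path
    n_ctx=4096,     # context window; larger values cost more memory
    n_threads=8,    # match the laptop's physical CPU cores
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Apa ibu kota Indonesia?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The arithmetic is what makes 32 GB a workable floor: at roughly 4 bits per weight, 32 billion parameters occupy about 16 GB, leaving headroom for the KV cache, activations, and the operating system.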
The collaboration is not a one-way technology transfer but a deep, bidirectional integration. Under the agreement, Alibaba provides the strong general reasoning foundation, while AISG contributes its precious, cleaned corpus of 100 billion Southeast Asian language tokens. The data is entirely free of copyright risk, and its 13% concentration of Southeast Asian content is 26 times that of Llama 2.
Through "advanced fine-tuning" technology, Alibaba injected these unique regional knowledge into Qwen, enabling it to accurately capture the cultural nuances of the region. The effect of this strong partnership was immediate — on the Sea-Helm evaluation list, Sea-Lionv4, equipped with Alibaba's "heart," quickly took the top position among similarly sized open-source models.
If the earlier embrace of Qwen, Kimi, and Zhipu by Silicon Valley figures rested on the dual advantages of performance and cost-effectiveness, then the Sea-Lion project's migration from AWS to Alibaba Cloud, and from Llama to Qwen, reflects a subtle shift in the global balance of AI power.
For a long time, global technology infrastructure was all but monopolized by the United States. In the era of large models, however, Chinese companies, with their deep understanding of multilingual environments and their superior cost-performance, are becoming the partners of choice for "Global South" countries building sovereign AI.
Original article: https://www.toutiao.com/article/7576611366639862307/
Statement: This article represents the views of its author.