[The Shaken Dominance of NVIDIA] Companies Develop Cost-Effective Chips to Challenge the GPU

"Like Windows in the PC era, it will lead the AI era as an OS (operating system)."

This is how NVIDIA's GPU (Graphics Processing Unit) has been described. GPUs, which can process large amounts of data simultaneously, have become essential in the AI era, carrying NVIDIA to the top of the AI market: in the GPU-based AI chip market, NVIDIA holds a roughly 90% share. Each GPU costs between $30,000 and $40,000 (approximately 40 million to 50 million Korean won) and is hard to obtain even at that price. As a result, NVIDIA has become one of the most profitable companies in the world.

Recently, however, major tech companies have begun developing custom chips (ASICs) or diversifying their semiconductor suppliers, signs of instability in NVIDIA's AI empire. ASICs are chips specialized for a particular purpose, offering advantages in power efficiency and cost over NVIDIA's general-purpose GPUs. The shift in AI development from the compute-hungry "training" phase to the relatively cheaper "inference" phase has also opened cracks in NVIDIA's near-monopoly, because inference, unlike training, benefits more from customized chips with high power efficiency.

Customized AI Chips That Can Replace NVIDIA

When Google released its AI model "Gemini 3," attention also turned to Google's custom chip, the TPU (Tensor Processing Unit). The TPU is a high-performance semiconductor that Google began developing about 10 years ago to support its own AI work. Google handles the basic design, while U.S. chip designer Broadcom and Taiwan's MediaTek are responsible for the physical design. The chip incorporates HBM (High Bandwidth Memory) from SK Hynix, Samsung Electronics, and Micron, and TSMC in Taiwan manufactures it into the final product. Because TPUs are purpose-built for AI, they deliver higher performance and lower power consumption than GPUs on certain tasks, reducing operating costs. AI startup Anthropic plans to use up to 1 million TPUs to develop its AI models, and Meta is also reportedly planning to deploy Google's TPUs in its own data centers.

OpenAI is working with Broadcom to produce its own chip by the end of next year, driven by the "Stargate" project, which involves investing $50 billion to build data centers and therefore requires a large number of chips. Meta is developing its own AI chip, "MTIA," for AI development and AI services. Amazon Web Services (AWS) operates an AI data center equipped with 500,000 "Trainium 2" chips, with major clients including Anthropic and Databricks. Chinese companies such as Alibaba and Baidu are also training AI models on self-developed chips to reduce their reliance on NVIDIA.

The AI Ecosystem May Be Changing

The move away from NVIDIA is largely driven by economics: custom chips are cheaper to buy and more energy-efficient to run. Morgan Stanley estimated that installing 24,000 of NVIDIA's latest Blackwell GPUs would cost $852 million (about 1.2 trillion Korean won), while a Google TPU installation of the same scale would cost only $99 million (about 145 billion Korean won). The emergence of cheaper chips could also ease recent concerns that excessive investment in AI infrastructure has inflated an AI bubble.
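The Morgan Stanley figures quoted above can be turned into a rough per-chip comparison. This is a back-of-the-envelope sketch using only the numbers in the article; the per-unit division and the cost ratio are this sketch's own arithmetic, not figures from the report.

```python
# Rough per-chip cost implied by the Morgan Stanley figures in the article.
# Counts and totals come from the article; per-unit math is illustrative.

gpu_count = 24_000            # NVIDIA Blackwell GPUs in the quoted cluster
gpu_total_usd = 852_000_000   # quoted installation cost for that cluster
tpu_total_usd = 99_000_000    # Google TPU cluster of "the same scale"

gpu_unit = gpu_total_usd / gpu_count      # cost per GPU
ratio = gpu_total_usd / tpu_total_usd     # GPU cluster vs. TPU cluster

print(f"Per-GPU cost:  ${gpu_unit:,.0f}")
print(f"GPU cluster costs about {ratio:.1f}x the TPU cluster")
```

The implied per-GPU cost of about $35,500 matches the $30,000 to $40,000 unit price cited earlier in the article, and the roughly 8.6x cluster-cost gap is what makes custom silicon attractive at data-center scale.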

The shift in AI from training to inference is also having an impact. In the initial stage of creating an AI model, the key task is "training" on massive amounts of data, which requires large numbers of high-performance NVIDIA GPUs. In the "inference" stage, where services are delivered with an already trained model, top-end GPUs are less necessary, which is why power-efficient, compact semiconductors such as the TPU and NPU (Neural Processing Unit) are gaining ground. Industry insiders say, "Although many companies currently use both NVIDIA GPUs and other companies' self-developed chips, the proportion of NVIDIA GPUs appears to be gradually decreasing." Still, the mainstream view is that NVIDIA GPUs remain superior in performance to these self-developed chips.

The AI ecosystem centered on NVIDIA is also expected to change. Until now, the established structure has been NVIDIA designing the chips and TSMC manufacturing them. Now, design firms that partner with large tech companies on custom chips are emerging as competitors.

CPU, GPU, TPU, NPU

The CPU (Central Processing Unit), the basic brain of a computer, is like a genius chef who can prepare all kinds of dishes, whether Korean, Japanese, or Chinese. It completes every task on its own, but working alone takes a long time.

In contrast, the GPU (Graphics Processing Unit) is like 1,000 efficient part-time workers: none as skilled as the chef, but together able to turn out specific dishes very quickly. The AI era demands processing large amounts of data through simple, repetitive calculations, which is why GPUs have drawn attention. But hiring 1,000 part-time workers is costly (in electricity) and requires a lot of space.
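The chef-versus-workers analogy can be sketched in a few lines of code. This is a toy pure-Python illustration of the idea, not how a GPU actually works: one function processes every item sequentially (the chef), while the other splits the data into chunks that could each be handled by an independent worker and then combines the partial results (map-reduce style, as a GPU does with thousands of hardware threads). The function names and the sum-of-squares task are invented for the example.

```python
# Toy illustration of serial vs. data-parallel processing.
# A real GPU runs thousands of hardware threads; here the "workers"
# are just independent chunks that could be computed in parallel.

def serial_sum_of_squares(data):
    # The "genius chef": one worker handles every item in order.
    total = 0
    for x in data:
        total += x * x
    return total

def chunked_sum_of_squares(data, workers=4):
    # The "part-time workers": split the data so each chunk can be
    # summed independently, then combine the partial results.
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers)]
    chunks[-1].extend(data[workers * size:])  # leftover items
    partials = [sum(x * x for x in chunk) for chunk in chunks]
    return sum(partials)

data = list(range(1, 1001))
assert serial_sum_of_squares(data) == chunked_sum_of_squares(data)
```

Both paths produce the same answer; the difference is that the chunked version's partial sums have no dependency on each other, which is exactly the property that lets simple, repetitive AI arithmetic spread across many cheap parallel units.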

The TPU (Tensor Processing Unit) is a high-performance semiconductor developed by Google for AI. Unlike CPUs and GPUs, it is a specialized machine that excels at making one particular dish (say, dumplings). It needs fewer workers than a GPU, but still requires a large factory.

The NPU (Neural Processing Unit) is a semiconductor that mimics the human brain. Small, lightweight, and highly power-efficient, it is well suited to smartphones and home appliances.

Source: Chosun Ilbo

Original: https://www.toutiao.com/article/7577628945047192127/
