Article by Cyber Auto

The conflict between OpenAI's CEO Sam Altman and Tesla's CEO Elon Musk has become a hot topic in Silicon Valley.

Both co-founded OpenAI, but after Altman steered the company toward commercial operations, Musk accused him of deviating from the original mission and sued him for breaching the founding agreement. Musk also founded xAI, a direct competitor to OpenAI.

Altman fought back, releasing emails showing that Musk had tried to take control of OpenAI and, after being rebuffed, repeatedly obstructed the company.

Altman may have another "turning the tables" move in mind, which is to develop autonomous driving and compete with Tesla's FSD.

Recently, Altman appeared on his brother Jack Altman's podcast, and may have let something slip during the conversation.

He said:

I think we have some new technology that could just do self-driving for standard cars way better than any current approach has worked.

When he says "way better than any current approach," that naturally includes Musk's FSD.

However, Altman did not elaborate on this technology or its timeline. He simply said:

If our AI techniques can like really go drive a car that's still pretty cool.

According to DealBook, The New York Times's business newsletter, this technology is still in its early stages and commercialization is far off.

According to its analysis, the technology involves OpenAI's Sora video-generation model and its robotics team, but an OpenAI spokesperson did not comment.

Previously, OpenAI had not directly explored the autonomous driving business, but had invested in some companies in the fields of autonomous driving and automotive intelligence.

Autonomous driving is, after all, one of the most promising application scenarios for AI, with a broad future and enormous attention. If OpenAI truly has a secret weapon, it will not give up such a huge market.

Looking at this, Altman and Musk are likely to argue even more fiercely in the future.

What Secret Weapon?

Would a single remark from Altman convince people that OpenAI truly has a "way better" autonomous driving technology?

After all, autonomous driving already has many strong players, including Alphabet's Waymo, Tesla, Mobileye, Qualcomm, Bosch, and a large number of Chinese companies that have been at it for years.

What could be OpenAI's technical approach?

Everyone might recall that in early 2024, OpenAI released Sora - a text-to-video generation model.

Sora-generated video

Sora can quickly create high-fidelity videos up to a minute long based on user input text, and can also generate videos from existing static images.

Sora's videos immediately amazed the world: the model can understand the physical properties of, and relationships between, elements in complex scenes, and grasp how objects exist and move in the physical world, producing videos that are hard to distinguish from reality.

Almost immediately after Sora's release, the autonomous and intelligent driving industries began discussing using it for simulation and training: generating synthetic video data, especially for extreme scenarios (corner cases), to make up for real-world data that is scarce or expensive to collect.

However, professionals quickly pointed out that Sora's output does not fully obey physical principles and can miss driving dynamics such as braking or turning, so it cannot serve directly as video training data for intelligent-driving models.

Nevertheless, many researchers and practitioners later argued that simulations which do obey physical principles could still supply training data, or be used for reinforcement-style training of models.
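The pipeline implied above — generate synthetic corner-case clips, filter out the physically implausible ones, and mix the survivors into a real dataset — can be sketched in miniature. This is a hypothetical illustration, not any company's actual pipeline: the clip format, the acceleration bound, and the function names `physics_consistent` and `build_training_mix` are all invented for the example.

```python
import random

def physics_consistent(clip):
    """Toy consistency check: reject clips whose implied acceleration
    exceeds a plausible bound (real filters would be far richer)."""
    speeds = clip["speeds_mps"]
    dt = clip["dt_s"]
    return all(abs(b - a) / dt < 8.0 for a, b in zip(speeds, speeds[1:]))

def build_training_mix(real_clips, synthetic_clips, synth_ratio=0.2, seed=0):
    """Keep only physics-consistent synthetic clips, then cap their
    share of the final training set at `synth_ratio`."""
    usable = [c for c in synthetic_clips if physics_consistent(c)]
    cap = int(len(real_clips) * synth_ratio / (1 - synth_ratio))
    rng = random.Random(seed)
    rng.shuffle(usable)
    return real_clips + usable[:cap]

# Hypothetical clips: speed traces sampled at 10 Hz.
real = [{"speeds_mps": [10, 10.1, 10.2], "dt_s": 0.1} for _ in range(8)]
synth = [
    {"speeds_mps": [10, 9.5, 9.0], "dt_s": 0.1},    # plausible braking
    {"speeds_mps": [10, 25.0, 40.0], "dt_s": 0.1},  # impossible jump: filtered
]
mix = build_training_mix(real, synth)
print(len(mix))  # → 9 (8 real + 1 surviving synthetic clip)
```

The cap is the point of the sketch: even filtered synthetic data is kept to a bounded fraction of the mix, reflecting the experts' warning above about over-relying on simulated data.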

Recently, the autonomous driving industry has been keen on building "world models" as the base models for driving systems. OpenAI itself initially framed Sora as a world simulator capable of generating video.

Companies such as NIO and XPeng are developing world models; the logic is that the AI system builds a mental map of how the world works, similar to human understanding, and then drives the vehicle on that foundation.
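The "mental map, then drive" logic can be made concrete with a toy rollout planner: an internal model predicts what happens next, candidate actions are simulated through it, and the best one is chosen. This is a minimal sketch under stated assumptions — hand-coded 1D kinematics stand in for a learned neural world model, and the names `world_model` and `plan_stop` are hypothetical.

```python
def world_model(state, accel, dt=0.5):
    """Toy learned-dynamics stand-in: predict the next (position, speed)
    under constant acceleration. A real world model would be a neural
    net trained on driving data, not closed-form kinematics."""
    pos, speed = state
    new_speed = max(0.0, speed + accel * dt)
    new_pos = pos + (speed + new_speed) / 2 * dt
    return (new_pos, new_speed)

def plan_stop(state, stop_line, horizon=10, candidates=(-4.0, -2.0, -1.0, 0.0)):
    """Roll each candidate braking profile through the world model and
    pick the one that stops closest to the line without crossing it."""
    best_accel, best_cost = None, float("inf")
    for a in candidates:
        s = state
        for _ in range(horizon):
            s = world_model(s, a)
        pos, speed = s
        if pos > stop_line:               # overshoots the line: disqualify
            continue
        cost = (stop_line - pos) + speed  # prefer stopping short and slow
        if cost < best_cost:
            best_accel, best_cost = a, cost
    return best_accel

# Car at 0 m doing 10 m/s, stop line 30 m ahead: gentle braking at
# -1 m/s^2 overshoots, harsh -4 m/s^2 stops far short, -2 m/s^2 wins.
print(plan_stop((0.0, 10.0), 30.0))  # → -2.0
```

The key point is that the planner never touches the real road while deciding: all the trial-and-error happens inside the model, which is exactly the efficiency argument world-model proponents make.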

Sora's concept has some alignment with the goal of world models to simulate the real world.

Additionally, the mainstream recipe in autonomous driving development today is "big data, big models, big compute." For OpenAI, big compute is no problem and big models are achievable, but it has no driving data: that is water without a source. If the data can be generated through simulation, the logic holds together, though many experts believe relying on simulated data carries significant risks.

OpenAI's Automotive Intelligence Business

OpenAI itself has never developed autonomous driving or smart cabin systems, but through its investments, it has touched upon related areas.

In 2023, OpenAI invested $5 million in Ghost Autonomy. The autonomous driving company also received computing support from Microsoft and had previously tried to apply AI language models to autonomous driving, but it shut down in 2024.

Ghost Autonomy's autonomous driving vehicle

On June 10, 2025, OpenAI partnered with the automotive intelligence company Applied Intuition. The collaboration focused on integrating the latest AI technological developments into modern vehicles, transforming them into smart companions.

The official announcement stated that by introducing large language model-driven voice assistants and agents into vehicles, the next generation of cars will become productivity tools with deep personalization experiences.

The statement also mentioned that one of the core goals of the collaboration is to achieve seamless integration between mobile devices and private vehicle intelligent systems. In addition, Applied Intuition will deploy ChatGPT across multiple departments to help employees improve work efficiency, optimize strategy planning, and more effectively achieve company objectives.

From these descriptions, the collaboration seems to be more focused on the human-machine interaction aspects of the smart cabin rather than direct application in autonomous driving.

From Language Models to Multimodal Models and World Models

Industry views once suggested that relying on the rapid progress of large language models, spatial intelligence such as autonomous driving could soon be realized. However, today's AI experts believe that relying solely on language models is insufficient.

Although OpenAI shocked the world with large language models (LLMs) and still centers on them, it has gradually expanded into multimodal models and world models.

Altman has also stated that world models need to possess the ability to "understand physical causal relationships and predict event developments," which, when combined with the reasoning capabilities of LLMs, could potentially drive breakthroughs in AGI (Artificial General Intelligence).

OpenAI is not alone in this view: Fei-Fei Li, often called the "godmother of AI," and Yann LeCun, Meta's chief AI scientist, hold similar opinions.

LeCun said that although current AI has shown amazing capabilities in multiple fields, it still lacks four core characteristics of human intelligence: understanding the physical world, persistent memory, logical reasoning, and hierarchical planning.

Without these capabilities, AI cannot drive a car.

The solution, as everyone points out, is world models.

Speaking about Meta's open-source world model V-JEPA 2, LeCun said that with a world model, AI no longer needs millions of training trials to acquire a new capability; the world model tells the AI directly how the world works, which can greatly improve efficiency.

It sounds a bit like the "secret weapon" that Altman hasn't revealed regarding autonomous driving.

In practice, NVIDIA, the "shovel seller" of the AI era, has already offered a new "shovel."

At CES 2025, NVIDIA CEO Jensen Huang said, "The ChatGPT moment for robots is coming. Like large language models, the world foundation model (World Model) is crucial for advancing the development of robots and autonomous vehicles."

NVIDIA's Cosmos world foundation model is specifically built for high-quality generation in physical interactions, industrial environments, and driving environments, capable of generating realistic videos and creating synthetic training data, helping robots and cars better understand the physical world.

Diagram of NVIDIA's Cosmos world model

In other words, compared with Sora, the Cosmos world foundation model is "real" in the sense of physically faithful simulation.

OpenAI has very likely started down this path as well, expanding its AI portfolio toward spatial intelligence.

In fact, OpenAI had a robotics team long ago, but it was disbanded in 2021. The team was re-established in 2024 and further expanded in 2025, with hiring for many hardware-robotics positions.

Additionally, OpenAI has established a partnership with the robot startup Figure, providing AI model support for its humanoid robot.

The foundation models behind humanoid robots are very close to those needed for autonomous driving. If OpenAI makes breakthroughs while exploring world models, applying them to autonomous driving would be a natural step; the autonomous driving market is, after all, undoubtedly worth trillions.

Even if Altman does not succeed in autonomous driving, checking Musk in the field he is proudest of would be satisfying revenge.

Original article: https://www.toutiao.com/article/7524191748034216500/

Disclaimer: This article represents the views of the author.