(Yicai) July 2 -- When the internet’s data has been “drained dry,” automobiles will become a continuous source of physical world data for general artificial intelligence models, marking the critical transition to embodied intelligence, according to the co‑founder and chief scientist of Chinese AI startup SenseTime.
“In the mobile internet era dominated by smartphones, our large models relied only on internet data, but the value of this data has largely been exhausted, reaching a bottleneck,” Wang Xiaogang said in an interview with Yicai at SenseTime’s headquarters in Shanghai, identifying this as the core issue now facing AI development.
“The next stage requires generating new data through interaction with the physical world,” said Wang, who is also chief executive of SenseAuto, the company’s self-driving technology division. “Cars are an ideal platform because they are physical entities capable of generating new data through interactions with the real world, enabling new forms of intelligence.”
This concept underpins both SenseAuto’s commercial logic and Wang’s vision for the AI era. “We are the most automotive‑focused AI company,” said Wang, who wore a black shirt emblazoned with ‘SenseAuto’ and the Chinese characters for ‘jueying,’ meaning so fast that even its shadow cannot keep up.
“We aim to be a steadfast partner to automakers, like a thoroughbred horse galloping alongside them,” he said.
From SenseTime’s first collaboration with Japan’s Honda Motor in 2017 to today’s partnerships with more than 30 carmakers, covering over 130 vehicle models and more than 3.6 million vehicles delivered, Wang’s team has both witnessed and driven the evolution of Chinese AI companies in the smart car race.
Keeping an Edge
The smart driving sector is fiercely competitive, with players such as Horizon Robotics, DJI, Huawei Technologies, and Momenta Technology, alongside automakers such as BYD and Xpeng Motors, all vying for dominance. “We must always maintain our innovation and technical leadership,” Wang noted.
Take end‑to‑end autonomous driving as an example. SenseTime first proposed this concept as early as 2016. At that time, lidar -- short for light detection and ranging, a remote sensing method -- was prohibitively expensive, with most companies relying on rule-based systems. Wang’s team, in collaboration with Honda, proposed a vision-based end-to-end solution.
“We believed that cars should work like humans,” he said, “driving even without lane markings or high-precision maps. By inputting visual images and videos, the system directly outputs the vehicle’s future trajectory, exactly the idea behind today’s end-to-end autonomous driving solutions.” At the time, however, the absence of powerful architectures such as transformers and the limited computing power available in vehicles and the cloud posed challenges.
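To make the idea concrete, here is a minimal sketch of a vision-based end-to-end driver, written as a toy PyTorch model: camera frames go in, a trajectory of future waypoints comes out, with no hand-written driving rules in between. The architecture and layer sizes are illustrative assumptions, not SenseAuto’s actual design.

```python
# A minimal sketch of vision-based end-to-end driving: images in,
# a future trajectory out. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self, horizon: int = 30):
        super().__init__()
        self.horizon = horizon  # number of future waypoints to predict
        # Toy convolutional encoder standing in for a real perception backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Regression head mapping image features to (x, y) waypoints.
        self.head = nn.Linear(32, horizon * 2)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) camera images
        features = self.encoder(frames)
        # Output: (batch, horizon, 2) future positions in the ego frame.
        return self.head(features).view(-1, self.horizon, 2)

model = EndToEndDriver()
trajectory = model(torch.randn(1, 3, 224, 224))  # one dummy camera frame
print(trajectory.shape)  # torch.Size([1, 30, 2])
```

A production system would replace the toy encoder with a far larger backbone trained on curated fleet data.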
Five years later, SenseTime led the industry again by unveiling UniAD, the sector’s first integrated perception- and decision-making model for autonomous driving. UniAD earned the Best Paper Award at the 2023 Conference on Computer Vision and Pattern Recognition (CVPR), marking the first time that self-driving research had received the honor. UniAD is due to be supplied to Dongfeng Motor in the fourth quarter of this year.
US electric carmaker Tesla did not announce that its vehicles would use an end-to-end approach until September 2023.
The problem with end-to-end systems, Wang pointed out, is that they rely heavily on data and require a huge amount of high-quality data for training. Yet, only a small fraction -- possibly as little as 1 percent -- of the data collected from human driving is useful for model training, and filtering it remains highly subjective, he said.
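The curation step Wang describes can be sketched in a few lines: score every recorded clip for how informative it is, then keep only the top slice. The scoring heuristic below is an invented placeholder meant to illustrate the shape of the problem, not SenseAuto’s actual criterion.

```python
# Toy illustration of the data-curation problem: score each recorded
# driving clip and keep only the most informative sliver.
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    hard_braking: bool      # e.g. sudden deceleration events
    rare_scenario: bool     # e.g. construction zones, unusual weather
    label_quality: float    # 0.0 (noisy) to 1.0 (clean annotations)

def informativeness(clip: Clip) -> float:
    # Hypothetical heuristic: rare, eventful, well-labeled clips score highest.
    return (2.0 * clip.rare_scenario + 1.0 * clip.hard_braking) * clip.label_quality

def curate(clips: list[Clip], keep_ratio: float = 0.01) -> list[Clip]:
    # Keep roughly the top 1 percent, mirroring the fraction Wang cites.
    ranked = sorted(clips, key=informativeness, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

pool = [Clip("a", True, True, 0.9), Clip("b", False, False, 0.8),
        Clip("c", False, True, 0.5)]
print([c.clip_id for c in curate(pool, keep_ratio=0.34)])  # ['a']
```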
To address the paucity of data for extreme driving scenarios in end-to-end training, Wang’s team developed a world model, a type of large-scale generative model. Much as DeepSeek used reinforcement learning to overcome the data scarcity facing language models, generating knowledge that did not previously exist, SenseAuto applied its world model to mass-produced autonomous driving systems.
“The key to world models is precision and control,” Wang explained. “Input a vehicle’s 3D trajectory and it generates the corresponding video.” This breakthrough overcomes the safety and coverage limitations of traditional data collection, enabling the generation of training data for hazardous scenarios.
Last November, SenseAuto launched the Kaiwu world model, which was upgraded to a near real-time interactive 4D version this April.
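The input-output contract Wang describes, a 3D trajectory in and synthetic camera footage out, can be expressed as a simple interface. The stub below is a hypothetical shape for such a generator, with invented names and dimensions; it is not the actual Kaiwu API.

```python
# Hypothetical interface for trajectory-conditioned video generation:
# a 3D ego trajectory in, a synthetic camera video out.
import numpy as np

def generate_training_video(trajectory: np.ndarray,
                            frames: int = 100,
                            height: int = 256,
                            width: int = 256) -> np.ndarray:
    """trajectory: (T, 3) array of future ego positions (x, y, z).

    Returns a (frames, height, width, 3) uint8 video. A real world model
    would run a learned generative network here; this stub returns noise
    just to make the contract concrete.
    """
    assert trajectory.ndim == 2 and trajectory.shape[1] == 3
    rng = np.random.default_rng(0)
    return rng.integers(0, 256, size=(frames, height, width, 3), dtype=np.uint8)

# A dangerous cut-in maneuver that would be unsafe to stage on a real road
# can instead be specified as a trajectory and rendered synthetically.
cut_in = np.column_stack([np.linspace(0, 50, 100),   # forward motion
                          np.linspace(0, 3.5, 100),  # abrupt lane change
                          np.zeros(100)])
video = generate_training_video(cut_in)
```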
Still, industry opinion varies on the commercialization of cutting-edge technologies such as world models. Some experts argue that while the concepts are advanced, their transition from the laboratory to mass-produced vehicles has yet to be proven.
From Passive Tool to Proactive Partner
“Intelligence will define the second half of the electric vehicle race, with the competition hinging on differentiation,” Wang said, demonstrating a clear grasp of industry trends.
He believes smart driving tech is in a “golden harvest period,” evolving from rule-based systems to end-to-end solutions, from reliance on high-precision maps to mapless navigation, and from vision-language models to world model-based reinforcement learning. This rapid iteration of technical approaches offers a significant window of opportunity for tech companies.
Equally important is Wang’s insight into the nature of AI applications. “In the past, AI in cars was passively triggered, but in the future it will shift to ambient computing -- AI that is ubiquitous and human-centric, and that proactively delivers services,” he said.
This vision first materialized in SenseTime’s ‘A New Member for U’ concept, which sparked a chain reaction in the industry two months later, with Li Auto launching its ‘Silicon-Based Family’ and Geely Holding Group and others following suit.
This is underpinned by SenseTime’s unique ‘1+X’ model, where ‘1’ represents robust computing infrastructure and large-scale model capabilities, and the ‘X’ signifies deep applications across various verticals.
Wang predicts that “cloud infrastructure demands will continue to grow, while in-vehicle systems will simplify as models take over. However, data generation and testing analysis capabilities will be key -- and that’s SenseTime’s strength.”
Wang’s confidence stems from early investments. In 2018, when domestic cloud services could not yet provide large graphics processing unit clusters, SenseTime built an AI supercomputing facility in Shanghai’s Lingang Special Area, now ranked among China’s top three.
When ChatGPT ignited the large language model boom in late 2022, SenseTime’s infrastructure advantage became evident. The expertise in generative AI it has accumulated since then, such as in image and video generation, laid the foundation for today’s intelligent driving applications.
In March 2025, SenseAuto and automaker Guangzhou Automobile Group jointly launched the first assisted driving solution based on Horizon Robotics’ Journey 6M chip, with more Journey 6-based solutions set to roll out with Chery Automobile and other car manufacturers.
From Cars to Embodied Intelligence
“Cars are not the endgame but a critical stepping stone to a broader vision,” Wang said, outlining a grander ambition. “The path of AI development runs from the mobile internet to cars, and then to robots.”
This judgment is based on a deep understanding of data evolution. “In the mobile internet era, large models relied on internet data, but that value has been exhausted, hitting a ceiling. The next phase demands new data from physical world interactions.”
Cars are an ideal platform for this transition. As physical entities operating in the real world, they generate interaction data unattainable in the internet era. “With tens of millions of vehicles, this scale provides a strong foundation for us to step into the next era of embodied intelligence,” Wang said.
But the leap from cars to robots will be challenging due to varying levels of standardization. “Robots come in all shapes -- robotic dogs, wheeled bots, legged bots -- with no standardized camera configurations,” Wang noted. This demands greater versatility, generalization, and world model simulation capabilities.
SenseTime is already laying the groundwork for embodied intelligence. “We’re not just collaborating,” Wang noted, “we’re investing in upstream and downstream companies. Our work in automotive today is also preparation for entering robotic intelligence in the future. It’s all interconnected.”
In Wang’s view, SenseTime’s positioning as an AI platform company is key to its leadership through multiple waves of AI transformation. “Early on, many advised us to focus on a single industry for easier monetization, but we didn’t take that path,” he said.
This strategic choice reflects a deep understanding of AI’s evolution. “AI is driven by different industries at different times,” Wang said. SenseTime’s journey validates this judgment: from image intelligence to edge computing, to cars, and now toward embodied intelligence. “We’ve followed this trajectory, and it may well reflect the inherent law of AI’s development,” Wang concluded.
Editor: Tom Litting