(Yicai) Oct. 21 -- ShengShu Technology, a Chinese multimodal generative artificial intelligence startup, has launched a new version of its reference-to-video tool, improving consistency and creative control as it takes on global giants OpenAI’s Sora 2 and Google’s Veo 3.1.
The new version, Vidu Q2, allows creators to upload up to seven reference images of faces, scenes, or props and blend them into a single, unified video, ShengShu announced today. Its multi-entity consistency feature combines these elements with a text prompt while keeping each part distinct and true to the original, it added.
Vidu Q2 also supports generating transition animations by uploading only the first and last frames, a powerful feature for narrative control also available on Veo 3.1. In addition, ShengShu released the Vidu Q2 application programming interface, enabling businesses to integrate the new features into their workflows.
“Vidu Q2 marks a new chapter in AI video creation,” said Luo Yihang, chief executive of Beijing-based ShengShu. “We’re moving into a time where AI can mimic human looks and express emotions with cinematic flair. This launch goes beyond basic video creation; it’s about teaching AI to act and tell stories alongside creators.
“With each release, we blend technology and creativity more closely,” Luo noted. “Our goal isn’t to replace creativity but to expand it, making imagination visible and emotions limitless.”
ShengShu's Vidu Q2 generates content faster and at a lower price than Sora 2 and Veo 3.1, industry insiders pointed out.
Yicai tested the new features, prompting Vidu Q2 to create a video of a specific blade battery module moving on a conveyor in a Chinese electric vehicle factory while being scanned by a yellow Siasun industrial robot, with a background screen displaying "real-time yield: 99.92" in simplified Chinese.
Vidu Q2 successfully fused the battery, the robot arm, the Siasun logo, and the Chinese text into a single dynamic video, rendering the Chinese characters with high stability and accuracy and validating ShengShu's multi-entity consistency claim.
Veo 3.1, which supports up to three reference images, failed to display the Chinese text correctly when tested with the same prompts, while Sora 2 correctly rendered the text but changed the logo to that of Nissan Motor.
In another test, Yicai prompted Vidu Q2 to create a scene in a Shanghai meeting room where a chairman asks angrily in Chinese, “The battery caught fire, are you messing with me?” to which a US chief executive replies in English, “Not me, it is them.”
The tool used the reference images to generate the angry facial expression with accurate lip-sync in both languages, though the audio's emotional tone was flat and lagged behind Veo 3.1. The second test nevertheless showed that Vidu Q2's handling of multilingual dialogue and emotion is competitive with its global rivals.
Established in March 2023 by a team from Tsinghua University's Institute for AI Industry Research, ShengShu launched Vidu 1.0 in April last year and has attracted 30 million users across more than 200 countries and regions, who have generated more than 400 million videos. The tool can generate five- and eight-second videos at 1080p resolution from text prompts in Chinese or English and from images.
Editor: Martin Kadiev