COMPUTER INDUSTRY WEEKLY: ALIBABA OPEN-SOURCES QWEN2.5-OMNI LATE AT NIGHT; DEEPSEEK-V3 LAUNCHES A NEW VERSION
DATE: Apr 04, 2025

▌Computing power: Compute rental prices hold steady; Alibaba open-sources Qwen2.5-Omni late at night

In the early morning of March 27, Alibaba's Tongyi Qianwen (Qwen) team released Qwen2.5-Omni.

This is the new flagship multimodal model in the Qwen series, designed for comprehensive multimodal perception: it can seamlessly process a wide range of inputs, including text, images, audio, and video, while supporting streaming text generation and natural speech synthesis output.

The team proposed the Thinker-Talker architecture, an end-to-end multimodal model designed to perceive multiple modalities, including text, images, audio, and video, while generating text and natural speech responses in a streaming fashion. The team also proposed a new type of positional embedding, TMRoPE (Time-aligned Multimodal RoPE), to synchronize the timestamps of video inputs with audio. For real-time voice and video chat, the architecture supports fully real-time interaction, with chunked input and instant output. Speech generation is natural and robust: Qwen2.5-Omni outperforms many existing streaming and non-streaming alternatives, demonstrating superior robustness and naturalness.
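The core idea behind time-aligned positional embedding, as described above, is that tokens from different modalities are ordered by when they occur in real time rather than by which stream they arrived in. The following toy sketch illustrates that idea only; the function name, the frame timings, and the merging scheme are illustrative assumptions, not Qwen's actual TMRoPE implementation.

```python
# Toy illustration of time-aligned multimodal ordering: audio and video
# tokens are merged by timestamp, so shared position ids reflect real
# time rather than arrival order. Frame rates below are made up.

def time_aligned_positions(audio_times, video_times):
    """Merge two timestamped token streams and assign shared position ids.

    Returns a list of (position_id, modality, index_within_modality),
    so tokens close in real time get close position ids regardless of
    which modality they came from.
    """
    tagged = [(t, "audio", i) for i, t in enumerate(audio_times)]
    tagged += [(t, "video", i) for i, t in enumerate(video_times)]
    tagged.sort(key=lambda x: x[0])  # order all tokens by timestamp
    return [(pos, mod, idx) for pos, (t, mod, idx) in enumerate(tagged)]

# Example: audio frames every 40 ms, video frames every 100 ms.
audio = [0.00, 0.04, 0.08, 0.12]
video = [0.00, 0.10]
merged = time_aligned_positions(audio, video)
```

With these inputs, the video frame at 0.10 s lands between the audio frames at 0.08 s and 0.12 s, which is exactly the cross-modal synchronization the embedding is meant to encode.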

Robust multimodal performance: benchmarked against single-modal models of the same size, Qwen2.5-Omni demonstrates excellent performance across all modalities.

Qwen2.5-Omni surpasses the similarly sized Qwen2-Audio in audio capabilities and achieves performance comparable to Qwen2.5-VL-7B. It also excels at end-to-end voice instruction following: on benchmarks such as MMLU and GSM8K, its performance with voice input is comparable to that with text input.

Thinker is like a brain for Qwen2.5-Omni, processing and understanding input from text, audio, and video modalities, generating high-level representations and corresponding text. Talker is like a human mouth, receiving high-level representations and text generated by Thinker in a streaming manner, and smoothly outputting discrete speech tokens.

Thinker is a Transformer decoder equipped with encoders for audio and images to facilitate the extraction of information. In contrast, Talker is designed as a dual-track autoregressive Transformer decoder architecture.
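The Thinker-Talker split described above can be sketched as two chained streaming stages: one produces text chunks, the other consumes them and emits discrete speech tokens as they arrive. The classes and the fake character-to-token encoding below are illustrative assumptions, not the actual Qwen2.5-Omni code.

```python
# Minimal sketch of the Thinker-Talker streaming split: Thinker yields
# high-level text chunks; Talker consumes that stream and emits discrete
# speech tokens without waiting for the full text to finish.

class Thinker:
    """Stands in for the Transformer decoder that understands the input."""

    def generate(self, prompt):
        for word in prompt.split():  # stream the response chunk by chunk
            yield word


class Talker:
    """Stands in for the dual-track autoregressive decoder that speaks."""

    def speak(self, text_stream):
        for chunk in text_stream:  # consume Thinker's output as it arrives
            # Pretend each character maps to one discrete speech token.
            for ch in chunk:
                yield ord(ch) % 256


def run_pipeline(prompt):
    thinker, talker = Thinker(), Talker()
    return list(talker.speak(thinker.generate(prompt)))


tokens = run_pipeline("hello world")
```

Because both stages are generators, speech tokens start flowing as soon as the first text chunk is ready, which is the property that makes chunked input and instant output possible.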

▌AI applications: Gemini search visits up 6.85% QoQ; DeepSeek-V3 launches a new version

DeepSeek has released DeepSeek-V3-0324, a new version of DeepSeek-V3 with 685 billion parameters, a small increase over the previous version's 671 billion. One highlight is that it adopts the MIT open-source license (the previous version used a custom license), allowing users not only to freely modify and distribute the model but also to distill it and deploy it commercially.

DeepSeek-V3-0324 improved sharply across all metrics, beating Claude 3.5 Sonnet to become the strongest non-reasoning model currently available. In code capabilities it also competes with Claude 3.5 Sonnet. In addition, it scored 55% on Aider's multilingual benchmark, a significant improvement over its predecessor, ranking second only to Sonnet 3.7 among non-reasoning models; its performance is comparable to that of reasoning models such as R1 and o3-mini.

In the KCORES large-model arena, Claude-3.7-Sonnet-Thinking remains the undisputed leader, while DeepSeek-V3-0324 took third place with 328.3 points, behind Claude 3.5 Sonnet. In the mandelbrotset-meet-libai test, DeepSeek-V3-0324 changed little, scoring only 2 points below the initial version, though with a markedly higher completion rate. In the Mars-mission test, it rendered the planet correctly, ranking third among all models, and in the nine-planet test it mapped the complete solar system faithfully. In addition, DeepSeek-V3-0324 topped the non-reasoning models on the MisguidedAttention benchmark, even surpassing Claude Sonnet 3.7 (non-reasoning mode).

▌AI financing trends: The Amodei siblings again amaze the AI world, raising another 25 billion yuan

Recently, Anthropic announced the completion of a US$3.5 billion Series E financing round (about 25 billion yuan) at a post-money valuation of US$61.5 billion (about 445 billion yuan), less than 10 days after its previous US$1 billion round.

Dario Amodei and Daniela Amodei, siblings who left OpenAI in 2021 to start their own company, now lead an unprecedented fundraising race among the AI upstarts. In just the first three months of 2025, from OpenAI to xAI to Anthropic, and on to the new AI company founded by OpenAI's former CTO, these firms have been raising funds at a frantic pace.

With this financing, Anthropic will advance the development of its next-generation AI systems, expand its computing capacity, deepen its research in interpretability and alignment, and accelerate its international expansion.

Founded by Dario Amodei, one of OpenAI's earliest employees, and his sister Daniela Amodei, Anthropic has had a remarkable funding trajectory. Just two months ago it received a US$1 billion investment from Google, which at one point pushed its valuation to US$60 billion, on the condition that it use Google's cloud services; before that, Anthropic had signed similar terms with Amazon.

In 2025, only a quarter of the way through, Anthropic has already raised more than US$4.5 billion. Looking back, since its founding in February 2021, Anthropic has completed more than 10 financing rounds, and its valuation has soared to US$61.5 billion (about 440 billion yuan), making it one of the fastest-growing AI companies.

▌Investment Advice

The new version, DeepSeek-V3-0324, brings significantly improved code capabilities, stronger mathematical and logical reasoning, and further refinements to the model architecture and open-source ecosystem, highlighting the competitiveness of Chinese AI companies on both technology and cost. More importantly, its performance jump suggests the team may be paving the way for subsequent major releases. With the new DeepSeek-V3, AI applications should accelerate their penetration of vertical domains. We recommend paying attention to: the company whose clinical AI products have been implemented and verified (688246.SH); iFLYTEK (002230.SZ); Cambricon (688256.SH); Dingtong Technology (688668.SH), whose high-speed communication connector business may benefit significantly from the GB200 volume ramp; Emdoor Information (001314.SZ); precision-parts leader Maixinlin (688685.SH), which is accelerating the expansion of its computing-power business; Honglin Power (301439.SZ), whose new-energy business has grown; and the supplier to global motor giants such as Kollmorgen (301196.SZ); among others.

Risk Warning:

1) The iteration speed of underlying AI technology falls short of expectations. 2) Policy supervision and copyright risks. 3) The implementation of AI applications falls short of expectations. 4) Risk that recommended companies' performance falls short of expectations.

Follow Yicai Global
