Debang Securities released a research report arguing that the progression from DeepSeek-R1 to s1 demonstrates that 2025 will mark the starting point of large-model democratization, with AI applications and edge devices poised to strengthen across the board as AI costs fall and model capabilities improve at the same time. In addition, supported by distillation technology, the Jevons paradox may keep playing out: more phenomenally cost-effective small models are expected to emerge and be deployed on both devices and in applications, pushing the industry's focus from pre-training toward inference, and domestic computing power is expected to see its value re-rated as inference compute demand explodes.
The main views of Debang Securities are as follows:
A cost of only $50 with performance comparable to o1 and R1: model distillation technology takes the spotlight
According to TechCrunch, AI researchers at Stanford University and the University of Washington, including Fei-Fei Li, report in a new research paper that they successfully trained an AI reasoning model, s1, for less than $50 (covering cloud computing services only, excluding hardware investment such as servers and GPUs).
1) Technical path: The paper shows that a reasoning model can be distilled from a relatively small dataset via supervised fine-tuning (SFT), in which the AI model is explicitly instructed to mimic certain behaviors in the dataset. Specifically, the team built the "s1K" dataset of 1,000 carefully selected questions, each paired with a reasoning trace and answer distilled from Google's Gemini 2.0 Flash Thinking Experimental. The team then performed SFT on a pre-trained model, training for just 26 minutes on 16 NVIDIA H100 GPUs. In addition, to improve answer accuracy, the research team used a "budget forcing" technique that controls test-time compute: performance is tuned either by forcibly terminating the model's thinking process early, or by appending "Wait" multiple times during s1's inference to prolong its thinking (see the sketch below).
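To make the budget-forcing idea concrete, below is a minimal sketch of how such test-time control might look with a Hugging Face-style causal LM; the model path, token budgets, and the "Wait" continuation string are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of "budget forcing" at test time (assumed setup,
# not the s1 authors' actual code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/s1-like-model"  # placeholder model path
model = AutoModelForCausalLM.from_pretrained(MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def generate_with_budget(prompt: str,
                         min_think_tokens: int = 512,
                         max_think_tokens: int = 2048,
                         max_extensions: int = 2) -> str:
    """Keep the model thinking until a minimum token budget is spent by
    appending "Wait" whenever it stops early; max_think_tokens caps each
    generation pass so runaway reasoning is also cut off."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    extensions = 0
    while True:
        with torch.no_grad():
            out = model.generate(ids, max_new_tokens=max_think_tokens,
                                 do_sample=False)
        new_tokens = out.shape[1] - ids.shape[1]
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        # Stopped before the minimum budget: append "Wait" and resume,
        # which tends to make the model re-check its earlier reasoning.
        if new_tokens < min_think_tokens and extensions < max_extensions:
            ids = tokenizer(text + "\nWait,", return_tensors="pt").input_ids
            extensions += 1
        else:
            return text
```

The design choice here is that lengthening the visible chain of thought trades extra inference compute for accuracy, which is exactly the test-time scaling lever the report highlights.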
2) Test results: According to the research team, s1-32B outperformed o1-preview by 27% on competition math problems (MATH and AIME24), and its AIME24 performance was nearly on par with the Gemini 2.0 Thinking API, showing that the distillation process was effective.
Low cost, open source, and distillation will greatly lower the threshold for AI model development and are expected to accelerate the democratization of AI
According to Geek Park, as early as January 2025, DeepSeek released the official version of its reasoning model DeepSeek-R1 under the MIT license, open-sourcing the model weights and permitting users to train other models via model outputs, model distillation, and so on. Using DeepSeek-R1's outputs, six small models were distilled for the community, among which the 32B and 70B versions matched OpenAI o1-mini on many capabilities (a sketch of this output-based distillation follows).
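As a rough illustration of what distilling through a teacher's outputs involves, the sketch below collects a teacher model's reasoning traces and answers into an SFT dataset for a smaller student; `teacher_generate` stands in for an R1-style API call, and the record format is an assumption, not DeepSeek's actual pipeline.

```python
# Illustrative sketch of output-based distillation (assumed workflow).
import json
from typing import Callable, Iterable

def build_distillation_set(questions: Iterable[str],
                           teacher_generate: Callable[[str], tuple[str, str]]):
    """For each question, store the teacher's (reasoning trace, answer)
    pair as a prompt/completion record for supervised fine-tuning."""
    records = []
    for q in questions:
        trace, answer = teacher_generate(q)  # placeholder teacher call
        records.append({
            "prompt": q,
            "completion": f"{trace}\nFinal answer: {answer}",
        })
    return records

def save_jsonl(records, path: str) -> None:
    # JSONL is a common input format for off-the-shelf SFT trainers.
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```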
Combined with the fact that Fei-Fei Li's team trained s1 at ultra-low cost on data distilled from the Gemini 2.0 Flash Thinking Experimental model and still achieved excellent performance, this not only confirms that distillation is a key means of making models smaller and commercially viable, but is also expected to narrow the performance gap between open-source and closed-source models, thereby accelerating the democratization of AI and laying the foundation for an explosion of AI applications and edge-side AI.
Investment Advice:
1) Model distillation: Zhixin Precision (301512.SZ), Tors (300229.SZ), Si Teqi (300608.SZ), Dinecke (300884.SZ), Geling Shentong (688207.SH), Shenzhou Taiyue (300002.SZ), etc.
2) AI applications: Kingsoft Office (688111.SH), Pan Micro Network (603039.SH), Zhiyuan Internet (688369.SH), Borui Data (688229.SH), Zhongke Xingtu (688568.SH), Kingdee International (00268), Foxit Software (688095.SH), Color News (300634.SZ), Wondershare Technology (300624.SZ), Eclickworld (301171.SZ), Aerospace Hongtu (688066.SH), etc.;
3) AI device: Yuntian Lifei-U (688343.SH), Shiyun Circuit (603920.SH), Lenovo Group (00992), iFLYTEK (002230.SZ), Espressif Technology (688018.SH), Zhongke Lanxun (688332.SH), etc.
4) AI computing power: Yuntian Lifei-U (688343.SH), Huafeng Technology (688629.SH), Haiguang Information (688041.SH), Sugon (603019.SH), Cambrian-U (688256.SH), Digital China (000034.SZ), Inspur Information (000977.SZ), Runze Technology (300442.SZ), Runjian (002929.SZ), VNET (VNET.US), etc.
Risk warning: upstream supply falling short of expectations; downstream AI industry developing more slowly than expected; intensifying competition in the midstream; international geopolitical risk; and domestic and overseas macro interest-rate risk.