China-Developed Large AI Models Are Still Weak in Complex Reasoning, Shanghai AI Lab Says
Liu Xiaojie
DATE:  Jan 31 2024
/ SOURCE:  Yicai

(Yicai) Jan. 31 -- Large artificial intelligence language models developed by Chinese tech firms still lag behind US AI firm OpenAI’s GPT-4 Turbo in complex reasoning but are competitive in terms of knowledge base and language capabilities, especially in Chinese, according to a recent study.

Chinese chatbots, such as Zhipu AI’s GLM-4, Alibaba Group Holding’s Qwen-Max and Baidu’s Ernie Bot 4.0, scored just below GPT-4 Turbo in a large AI model evaluation carried out by the Shanghai AI Laboratory, which released the latest version of its open-source evaluation system OpenCompass 2.0 yesterday.

But the small gap does not mean that they have the same abilities as GPT-4 Turbo, Chen Kai, a scientist at the lab, told Yicai. The scores cover many aspects, and while China-developed large language models are close to GPT-4 Turbo in terms of knowledge base and language capabilities, they still have a long way to go to catch up in reasoning ability.

Even GPT-4 Turbo scored only 61.8 points out of 100, just above the pass mark, indicating that there is still a lot of room for chatbots to improve, the lab said. It added that the study did not include all large AI model developers and that more new models will be evaluated next time.

The ability to carry out complex reasoning determines how reliable a large AI model is, said Lin Dahua, a scientist at the lab. In finance, for instance, a model must not make mistakes. If a chatbot’s mathematical calculation and analysis capabilities are inadequate when it is used to analyze a company’s financial statements or industrial technical documents, that becomes a technical barrier.

“Many China-developed chatbots are only used in customer service and for chatting. Talking nonsense when chatting does not have an adverse impact, but such large models cannot be applied in serious business situations,” Lin said.

The Shanghai AI Lab first launched OpenCompass in July last year. It is one of four large AI model evaluation tools recommended by US tech giant Meta and the only one developed by a Chinese firm.

Editors: Tang Shihua, Kim Taylor

Keywords: GPT-4 Turbo, OpenCompass 2.0, Shanghai Artificial Intelligence Laboratory, LLM