Reference News Network reported on April 7 that, according to an April 6 report on the website of Hong Kong's South China Morning Post, as the public awaits the release of the next-generation model from Chinese artificial intelligence startup DeepSeek, the company has introduced a new method to enhance the reasoning ability of large language models (LLMs).

A recently published paper shows that DeepSeek, in collaboration with researchers from Tsinghua University, has developed a technique that combines "generative reward modeling" (GRM) with "self-principled critique tuning" (SPCT). This dual approach aims to enable LLMs to deliver better and faster answers to general queries.

The researchers wrote that the resulting DeepSeek-GRM models outperform existing methods, achieving performance competitive with strong public reward models. Reward modeling is the process of guiding a large language model toward human preferences.
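To make the idea of generative reward modeling concrete, here is a minimal, hypothetical sketch. It is not DeepSeek's implementation: the `generate_critique` function below is a stub standing in for a real LLM call, and the names and scoring scale are illustrative. A generative reward model writes a textual critique that ends in a numeric score, and sampling several critiques and averaging their scores (a simple form of inference-time scaling) yields a more stable reward than a single judgment.

```python
import random
import re

def generate_critique(prompt: str, response: str, seed: int) -> str:
    # Stub for an LLM call: a real generative reward model would write
    # a free-form critique of the response, ending in a numeric score.
    random.seed(hash((prompt, response, seed)) % (2**32))
    good = "accurate" in response  # toy heuristic in place of real judgment
    score = random.randint(6, 9) if good else random.randint(2, 5)
    verdict = "solid" if score >= 6 else "weak"
    return f"Critique: the response looks {verdict}. Score: {score}/10"

def extract_score(critique: str) -> int:
    # Parse the trailing numeric score out of the generated critique text.
    match = re.search(r"Score: (\d+)/10", critique)
    return int(match.group(1)) if match else 0

def vote_reward(prompt: str, response: str, n_samples: int = 8) -> float:
    # Inference-time scaling: sample several critiques and average their
    # scores rather than trusting any single sampled judgment.
    scores = [extract_score(generate_critique(prompt, response, s))
              for s in range(n_samples)]
    return sum(scores) / len(scores)

good = vote_reward("What is 2+2?", "4 -- accurate and concise.")
bad = vote_reward("What is 2+2?", "5, probably.")
print(good > bad)  # the stronger response receives the higher averaged reward
```

The averaged score can then rank candidate responses or serve as a training signal; the point of the sketch is only that the reward comes from generated critiques rather than a fixed scalar head.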

The researchers stated that DeepSeek plans to open-source the GRM model but did not provide a timeline.

Meanwhile, given the global attention drawn by DeepSeek's V3 foundation model and R1 reasoning model, speculation abounds about the company's next move. Reuters previously reported that DeepSeek-R2 would be released soon. The release of DeepSeek-R1 stunned the global tech community with its cost-effectiveness, delivering performance that rivals leading models.

DeepSeek has remained tight-lipped about the rumored R2 release.

Last month, Hangzhou-based DeepSeek upgraded its V3 model (DeepSeek-V3-0324), saying the new version offers stronger reasoning, optimized front-end web development, and improved Chinese writing. (Translated by Zhu Jie)

Original article: https://www.toutiao.com/article/7490400805502091785/

Disclaimer: This article solely represents the author's views.