(Yicai) Oct. 21 -- Chinese artificial intelligence firm DeepSeek has launched DeepSeek-OCR, an open-source model that uses optical compression to extract and compress text from images and PDFs, providing vast, high-quality training datasets for large language and vision language models while requiring significantly less computing power.
DeepSeek-OCR, which was released on GitHub yesterday, is a VLM-based optical compression method designed to address the computational challenges that LLMs face when processing long textual content, according to the paper DeepSeek-OCR: Contexts Optical Compression, published the same day.
This method significantly reduces the required number of text tokens by compressing textual information into visual representations and storing it in an optical format, the paper said. A single A100-40G GPU can generate over 200,000 pages of training data per day.
DeepSeek-OCR achieves over 96 percent accuracy at a 10-fold compression ratio, 90 percent accuracy at compression ratios of between 10 and 12, and around 60 percent accuracy at a 20-fold compression ratio. This indicates that compact language models can effectively learn to decode compressed visual representations, making it possible for larger models to acquire similar capabilities.
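For a rough sense of what those ratios mean in token terms, the sketch below maps the compression ratios reported above to the quoted accuracy figures. It is illustrative arithmetic only; the page length is a hypothetical assumption, not a figure from the paper.

```python
def vision_tokens(text_tokens: int, compression_ratio: float) -> int:
    """Approximate vision-token count for a page that would otherwise
    cost `text_tokens` text tokens at the given compression ratio."""
    return max(1, round(text_tokens / compression_ratio))

# Operating points quoted in the article (compression ratio -> approx. accuracy).
reported_accuracy = {
    10: 0.96,   # "over 96 percent" at 10x
    12: 0.90,   # "90 percent" at 10-12x
    20: 0.60,   # "around 60 percent" at 20x
}

page_text_tokens = 1_000  # hypothetical page length, for illustration only
for ratio, acc in reported_accuracy.items():
    print(f"{ratio:>2}x: {vision_tokens(page_text_tokens, ratio):>4} vision tokens, "
          f"~{acc:.0%} decoding accuracy")
```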
DeepSeek-OCR can compress long-form content, such as converting dialogue history into images, thereby improving the ability of LLMs to process massive documents, such as research papers, legal contracts and financial reports.
By rendering text into images and compressing them, the model can simulate the way human memory forgets, gradually erasing textual information and thereby improving LLM efficiency.
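A minimal sketch of that idea, assuming the Pillow imaging library and an invented downscaling schedule (the article does not describe DeepSeek-OCR's actual rendering pipeline), would render dialogue text onto a canvas and store older context at progressively coarser resolution:

```python
from PIL import Image, ImageDraw

def render_text(text: str, width: int = 800, height: int = 600) -> Image.Image:
    """Render plain text onto a white canvas with Pillow's default font."""
    img = Image.new("RGB", (width, height), "white")
    ImageDraw.Draw(img).multiline_text((10, 10), text, fill="black")
    return img

def degrade(img: Image.Image, factor: int) -> Image.Image:
    """Downsample and restore the image, discarding detail the way
    older context might be kept at coarser resolution."""
    small = img.resize((img.width // factor, img.height // factor))
    return small.resize(img.size)

# Hypothetical dialogue history; coarser copies stand in for "more forgotten" context.
dialogue = "User: Summarize the contract.\nAssistant: The contract covers ..."
page = render_text(dialogue)
older, oldest = degrade(page, 2), degrade(page, 4)
```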
The OCR model garnered over 1,400 stars on GitHub shortly after its release. However, DeepSeek has been slow to release new models like R2, leading some in the field to believe it is falling behind. Another perspective suggests that DeepSeek is currently focusing on building its internal capabilities, gathering strength for its next-generation model.
Editor: Kim Taylor