(By Chen Jishen, Editor: Zhang Guangkai)
While the AI industry has spent the year discussing agents and AI employees, DeepSeek quietly released a seemingly "boring" update on January 27th: DeepSeek-OCR2.

Arriving three months after the October 20, 2025 release of DeepSeek-OCR1, it may not be the rumored V4 that many were hoping for, but DeepSeek-OCR2 may well have sounded the death knell for the billion-dollar OCR (document recognition) market.
Over the past decade, OCR has been a quiet and lucrative business. From Adobe's PDF editor to CamScanner's membership fees to the expensive API calls of Amazon's AWS Textract, countless companies have profited handsomely from "teaching machines to read."
Take CamScanner's parent company, Hexie Information: its financial reports show a gross margin of around 85% sustained for years. Then, overnight, DeepSeek told the market that reading images and recognizing text doesn't need to be this expensive.
From mechanical scanning to intelligent reading
The core innovation of DeepSeek-OCR2 lies in introducing a new encoder structure called DeepEncoder-V2, which can dynamically adjust the processing order of visual information based on image semantics, allowing the model to intelligently sort visual content before performing text recognition.
Traditional OCR is like a "diligent but rigid scribe," typically scanning images in a mechanical left-to-right, top-to-bottom order.
The weakness of this mode is its lack of logic. Faced with a newspaper's column layout, it merges two unrelated articles into one; faced with a distorted invoice, it cannot find the alignment lines; faced with a dense, small-font financial report, it sees only blurred text.
DeepSeek-OCR2 introduced the concept of "visual causal flow." In DeepEncoder-V2, the research team replaced the original CLIP-based visual encoding module with a language model-like structure and introduced learnable "causal flow query tokens" within the encoder.
The encoder contains both bidirectional attention and causal attention processing modes. Original visual information is processed globally through bidirectional attention, while the newly added query tokens build semantic sequences step by step through causal attention.
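The two-stage scheme described above can be illustrated with a toy sketch in plain Python. This is only an illustration of the general mechanism (global bidirectional attention over patches, then causally masked query tokens building an ordered sequence), not DeepSeek's actual implementation; all dimensions, names, and the masking layout are invented for the example:

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values, mask=None):
    """Scaled dot-product attention over lists of vectors.
    mask[i][j] == False means query i may not attend to position j."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            s = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if mask is not None and not mask[i][j]:
                s = -1e9  # effectively blocks this position
            scores.append(s)
        w = softmax(scores)
        out.append([sum(wj * v[t] for wj, v in zip(w, values))
                    for t in range(len(values[0]))])
    return out

random.seed(0)
d = 8
patches = [[random.gauss(0, 1) for _ in range(d)] for _ in range(16)]
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]

# Stage 1: bidirectional self-attention -- every patch sees every patch,
# giving each position global context.
ctx = attention(patches, patches, patches)

# Stage 2: learnable query tokens read the global context freely but
# attend to one another causally, building the semantic order step by step.
kv = ctx + queries
n_ctx, n_q = len(ctx), len(queries)
mask = [[j < n_ctx + i + 1 for j in range(n_ctx + n_q)] for i in range(n_q)]
ordered = attention(queries, kv, kv, mask)

print(len(ordered), len(ordered[0]))  # 4 8
```

The result is one output vector per query token, produced in a sequence order the queries decide for themselves rather than the raster order of the pixels.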
This is equivalent to giving AI an "adaptive microscope." It no longer violently compresses images but dynamically slices them based on content density. Where there are more characters, it looks closely; where there is blank space, it skips over.
A revolution in understanding
Test results on the OmniDocBench v1.5 benchmark show that DeepSeek-OCR2 scored 91.09% under a lower visual-token budget, an improvement of 3.73 points over DeepSeek-OCR. Reading-order accuracy improved in particular, with edit distance dropping from 0.085 to 0.057.
But performance improvements are just the surface; the truly revolutionary aspect is its underlying understanding ability.
DeepSeek-OCR2 does not simply convert images into text; it outputs Markdown or JSON directly. What it sees is not lines and ink but key-value pairs.
This means the work of hiring engineers to write piles of regular expressions to clean data has suddenly lost its value. More importantly, quality control is built in. Give it a greasy supermarket receipt with the "total" smudged out: traditional OCR would dutifully output garbled characters, while DeepSeek reads all the unit prices and quantities, does the arithmetic, and reasons: "Although this part is unclear, by the calculation logic the total should be 108.5 yuan."
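The receipt scenario amounts to cross-checking structured output against its own arithmetic. A minimal sketch, assuming a hypothetical JSON schema (the field names and line items are invented for illustration, not DeepSeek's actual output format):

```python
import json

# Hypothetical structured output for a receipt whose "total" was illegible.
ocr_output = json.loads("""
{
  "items": [
    {"name": "milk", "unit_price": 12.5,  "qty": 2},
    {"name": "rice", "unit_price": 41.75, "qty": 2}
  ],
  "total": null
}
""")

def verify_or_infer_total(doc):
    """Recompute the total from line items; fill it in if unreadable,
    or flag a mismatch if the scanned total disagrees."""
    computed = sum(i["unit_price"] * i["qty"] for i in doc["items"])
    if doc["total"] is None:
        doc["total"] = round(computed, 2)  # infer the obscured field
    elif abs(doc["total"] - computed) > 0.01:
        raise ValueError("total does not match line items")
    return doc

print(verify_or_infer_total(ocr_output)["total"])  # 108.5
```

Because the output is already key-value structured, this kind of consistency check becomes a few lines of code instead of a regex-cleaning pipeline.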
This built-in logical verification capability is the "holy grail" that banks and insurance companies long for in account statement reviews and insurance claims entry.
At the same time, human commercial documents are full of subtleties: bold text means emphasis, red color means loss, and arrows mean process flow. Traditional OCR would lose these pieces of information, but DeepSeek can preserve these "emotions and highlights."
The future AI analyst will not only be able to read numbers in financial reports but also understand bad news that management tries to hide through formatting.
A dimensional strike from a 200-fold price gap
Aside from performance improvements, DeepSeek once again gave the OCR industry a price shock.
According to AWS's official pricing, processing tables with Textract's Analyze Document API costs $0.015 per page for the first 1 million pages and $0.010 per page thereafter. The Custom Queries feature runs as high as $0.025 per page for the first 1 million pages and $0.015 per page after that. Combining Pretrained Forms with Custom Queries costs $0.065 per page for the first 1 million pages.
This means that processing 1,000 pages of complex financial documents using AWS Textract would cost approximately $65 (about RMB 470).
DeepSeek's token-based billing, by contrast, costs about $0.28 (roughly RMB 2) to process the same amount of information, and with cache hits the cost can drop to $0.028. From $65 to $0.28 is a cost gap of more than 200-fold.
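The arithmetic behind the comparison, using the per-page and per-workload figures quoted above (the exact ratio depends on document mix and token counts, so treat these as the article's illustrative numbers):

```python
pages = 1000

# AWS Textract, Pretrained Forms + Custom Queries tier (first 1M pages)
aws_cost = pages * 0.065          # = 65.0 USD for 1,000 complex pages

# Figures cited for DeepSeek's token-based billing on the same workload
deepseek_cost = 0.28              # USD, no caching
deepseek_cached = 0.028           # USD, with cache hits

print(round(aws_cost / deepseek_cost))    # ratio without caching
print(round(aws_cost / deepseek_cached))  # ratio with cache hits
```

The uncached ratio works out to roughly 232x, comfortably above the "more than 200-fold" claim, and cache hits push it past 2,000x.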
In any commercial competition, when the challenger's cost is only 1/200 of yours, your previously proud "exclusive algorithms" and "private datasets" become meaningless.
Who is trembling, who is celebrating
The emergence of DeepSeek-OCR2 has caused the narrative of traditional OCR vendors such as Hexie Information, Hanwang Technology, and ABBYY, namely "we have accumulated ten years of ticket templates, and large models cannot handle these long-tail scenarios," to collapse outright.
However, DeepSeek's impact differs across the various types of OCR vendors.
Hexie Information's C-end products mainly include the CamScanner, Business Card Scanner, and Qixinbao apps, while its B-end business provides intelligent text recognition and commercial big-data products and services across industries. When DeepSeek proves that a general model can do the job without task-specific training, and do it better, and the generalization of general models covers the professional capabilities of vertical ones, these companies' technical barriers disappear, leaving only fragile customer relationships.
Adobe Acrobat, the king of the PDF era, is built on the logic of "editing." In the AI era, users don't need to "edit" PDFs; they need to "restructure" their content. If DeepSeek can read PDFs directly and convert them perfectly into editable Word documents, or even extract their data straight into databases, then the "PDF editor" as a tool loses its meaning.
AWS Textract charges from $0.0015 per page for basic text detection, up to $0.015 per page for table extraction, and up to $0.05 per page for form processing. Cloud providers are used to packaging each function into expensive APIs for sale. DeepSeek's open-source strategy makes enterprises realize that they don't need to pay this "toll fee."
Developers can deploy an open-source DeepSeek model locally, protecting privacy while saving a huge budget. And for the broader commercial world, now that machine "reading" is no longer expensive, new opportunities are emerging.
Micro-enterprise credit services once ruled out by high OCR costs are now feasible; large-scale exam grading and the digitization of learning materials have become reality; automated processing and analysis of medical records and test reports are spreading; and intelligent upgrades to contract review and case retrieval will accelerate.
The victory of the open-source ecosystem
Notably, DeepSeek-OCR2 uses Alibaba's lightweight Qwen2-0.5B model as one of the key components of its architecture, highlighting the growing role of China's open-source ecosystem in advancing artificial intelligence.
DeepSeek believes this offers a promising path toward a unified multimodal encoder. In the future, a single encoder may perform feature extraction and compression for images, audio, and text within the same parameter space, with learnable queries configured for each modality.
This open-source collaboration model not only accelerates technological iteration by quickly integrating the achievements of different teams; more importantly, it cuts costs by avoiding redundant effort and sharing R&D expenses, ultimately fostering a prosperous ecosystem in which more developers can build applications on open-source models.
The release of DeepSeek-OCR2 is not just a technological news. It marks the end of the historical mission of OCR, a technology that has accompanied the computer industry for decades — from a "service" that needed to be purchased at a high price, to an "infrastructure" like water, electricity, and gas.
According to DeepSeek's technical report, the model maintains high precision while strictly controlling computational costs, with the number of visual tokens limited between 256 and 1120. This extreme efficiency optimization is a typical characteristic of infrastructure.
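A bounded token budget like this implies some mapping from page content to token count, clamped at both ends. A hypothetical sketch of density-based budgeting (the technical report cites only the 256-1120 range; the linear mapping and the `visual_token_budget` function here are invented for illustration):

```python
def visual_token_budget(density, lo=256, hi=1120):
    """Map an estimated content density in [0, 1] to a visual-token
    count clamped to the 256-1120 range cited in the report.
    Hypothetical heuristic; the actual allocation policy is unpublished."""
    density = min(max(density, 0.0), 1.0)  # clamp out-of-range estimates
    return int(lo + (hi - lo) * density)

print(visual_token_budget(0.05))  # near-blank page: close to the 256 floor
print(visual_token_budget(0.95))  # dense report: close to the 1120 ceiling
```

Whatever the real policy looks like, the hard floor and ceiling are what keep per-page compute predictable, which is exactly the "infrastructure" property the article points to.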
For Adobe and Hexie Information, winter has arrived; but for the broader commercial world, now that machine "reading" is no longer expensive, the vast data assets locked away in paper, PDFs, and images are finally waking up.
What DeepSeek killed was not any single company, but the old era in which "acquiring information carries a high cost."
In this era of AI reshaping everything, any business model built on information asymmetry and technical barriers will face a dimensional strike from the open-source world. And this, perhaps, is just the beginning.
Original article: toutiao.com/article/7600345843845186048/
Statement: The article represents the views of the author.