Raw OCR Model Overview
Automatically extract the raw text from each page of a document using OCR.
Use this model when the entire text of a document needs to be extracted, along with position information for each word.
We use the term "OCR" literally, as the acronym for "Optical Character Recognition": the exact text is returned as raw data, not parsed into structured fields.
Most users looking for a generic "Mindee OCR" or to "OCR a document" are likely looking for Extraction: consult the Extraction model documentation.
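To make the difference between raw text and structured fields concrete, here is a minimal sketch of what "text plus a position for each word" can look like in code. The `Word` and `Page` classes, the `polygon` attribute, and the relative coordinates are assumptions made for illustration only; they are not the actual response schema, which is described in the integration documentation.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical shapes for illustration; NOT the real API response schema.
@dataclass
class Word:
    text: str
    # Corner points as (x, y) pairs, assumed relative to the page size (0.0 to 1.0).
    polygon: list[tuple[float, float]]


@dataclass
class Page:
    words: list[Word]


def page_text(page: Page) -> str:
    """Join the words of a page in the order they were returned."""
    return " ".join(word.text for word in page.words)


def top_left(word: Word) -> tuple[float, float]:
    """Return the smallest x and the smallest y found in the word's polygon."""
    xs = [x for x, _ in word.polygon]
    ys = [y for _, y in word.polygon]
    return (min(xs), min(ys))


page = Page(words=[
    Word("Invoice", [(0.10, 0.05), (0.25, 0.05), (0.25, 0.08), (0.10, 0.08)]),
    Word("#1234", [(0.27, 0.05), (0.36, 0.05), (0.36, 0.08), (0.27, 0.08)]),
])

print(page_text(page))
print(top_left(page.words[1]))
```

Note that nothing in this sketch knows that "#1234" is an invoice number: raw OCR returns only the words and where they are, while assigning meaning to them is the job of an Extraction model.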
A file sent to the Raw OCR Model may have any number of pages, within limits.
Almost all languages are fully supported, since only the writing system is involved in detection.
In other words, as long as the system can recognize the glyphs (letters in an alphabet), the text will be extracted.
It's much easier and shorter to list what is not supported:
Ancient languages with no equivalent modern writing system such as cuneiform, Egyptian hieroglyphs, ancient Maya, etc.
Modern languages with uncommon writing systems such as Blackfoot, Cherokee, Inuktitut, etc.
Raw OCR models are always custom; there are no templates available in the Catalog.
Each OCR model gets its own unique model ID when you create it.
To create an OCR model, click on Models, and then click on Create your document AI model.
Scroll to the Document Utilities section and click on OCR. This step also generates the model's unique ID.
You can now use the Live Test tab to process documents.
Your OCR model is now available in your Models tab:

Here is a step-by-step tutorial that shows you how to properly create an OCR utility:
Once your OCR model is created and tested, integration documentation is available on the "Documentation" page, or here: OCR Quick Start

