OCR (optical character recognition)
The ocr Module
The ocr module takes an image as input and returns any text found within that image in a JSON file.
This overview of the ocr module is divided into the following sections:
- Inputs and Outputs of the ocr Module
- Available Models in the ocr Module
- Model Parameters in the ocr Module
- Input File Size Limit
- A Single-Module Pipeline for the ocr Module
- Further Information on ocr Module IO and Clickability
Inputs and Outputs of the ocr Module
The ocr module accepts image inputs. Acceptable file formats are the following:
- JPG
- JPEG
- PNG
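A quick local check can catch unsupported files before they are sent to the ocr module. The following is a minimal sketch, not part of the module itself: the helper name has_accepted_extension is hypothetical, and it assumes that the file extension alone identifies the format and that the comparison is case-insensitive.

```python
from pathlib import Path

# File formats listed above as acceptable ocr module inputs.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def has_accepted_extension(file_path: str) -> bool:
    """Return True if the file name ends in an accepted image extension.

    Hypothetical helper. Assumption: extensions are compared
    case-insensitively, so "scan.PNG" is treated like "scan.png".
    """
    return Path(file_path).suffix.lower() in ACCEPTED_EXTENSIONS
```

For example, has_accepted_extension("receipt.jpeg") returns True, while has_accepted_extension("report.pdf") returns False. This only inspects the file name; it does not verify that the file contents are a valid image.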
The ocr module returns a JSON file. The JSON file holds all identified text and the pixel coordinates on the image for each chunk of identified text.
Available Models in the ocr Module
You can activate any of the following models when using the ocr module:
- tesseract-en - (default) [English]
- tesseract-es - [Spanish]
Use the modules argument of the process method to select which model is active when you process files through the ocr module.
Model Parameters in the ocr Module
None of the ocr module models are parameterizable. Consequently, when selecting what model you'll use through the process method's modules argument, params will always be set to an empty dictionary. For example:
# example model selection for ocr module in .process
modules = {'ocr': {'model': 'tesseract-es', 'params': {}}}
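Because the set of models is small and params is always empty, the modules dictionary can be built by a small helper that rejects unknown model names early. This is a sketch under stated assumptions: the helper build_ocr_modules and the constant AVAILABLE_OCR_MODELS are hypothetical names introduced for illustration, and only the two model names and the dictionary shape come from this document.

```python
# Model names listed in this document; "tesseract-en" is the documented default.
AVAILABLE_OCR_MODELS = ("tesseract-en", "tesseract-es")


def build_ocr_modules(model: str = "tesseract-en") -> dict:
    """Build the value for the `modules` argument of the process method.

    Hypothetical helper. The ocr models take no parameters, so
    `params` is always an empty dictionary.
    """
    if model not in AVAILABLE_OCR_MODELS:
        raise ValueError(
            f"unknown ocr model {model!r}; expected one of {AVAILABLE_OCR_MODELS}"
        )
    return {"ocr": {"model": model, "params": {}}}


# Select the Spanish model; the result matches the example above.
modules = build_ocr_modules("tesseract-es")
```

The resulting dictionary is what you would pass as the modules argument when calling process; the call itself is not shown here.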
Input File Size Limit
Input image files for the ocr module can currently be no larger than 5 MB.
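You can check a file against this limit locally before processing it. The sketch below is illustrative only: the helper is_within_size_limit is a hypothetical name, and it assumes the limit means 5 * 1024 * 1024 bytes, which this document does not specify (the limit could instead be 5,000,000 bytes).

```python
import os

# Assumption: the 5 MB limit is interpreted as 5 * 1024 * 1024 bytes.
MAX_OCR_INPUT_BYTES = 5 * 1024 * 1024


def is_within_size_limit(file_path: str, max_bytes: int = MAX_OCR_INPUT_BYTES) -> bool:
    """Return True if the file on disk is no larger than max_bytes.

    Hypothetical helper; raises OSError if the file does not exist.
    """
    return os.path.getsize(file_path) <= max_bytes
```

If you prefer the stricter reading of the limit, pass max_bytes=5_000_000 instead of relying on the default.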
A Single-Module Pipeline for the ocr Module
See the Pipeline Examples section of our documentation for an example of a single-module pipeline for the ocr module.
Further Information on ocr Module IO and Clickability
See the Convenience Methods (and More!) documentation. There you will find two tools to learn more about the ocr module: