
OCR (optical character recognition)

The ocr Module


The ocr module takes an image as input and returns any text found within that image in a JSON file.

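This overview of the ocr module is divided into the following sections: inputs and outputs, available models, model parameters, the input file size limit, a single-module pipeline example, and further information on ocr module IO.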

Inputs and Outputs of the ocr Module

The ocr module accepts image inputs. Acceptable file formats are the following:

  • JPG

  • JPEG

  • PNG
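If you want to reject unsupported files before sending them through the process method, a client-side extension check is one option. The following is a minimal sketch. The helper name `is_accepted_ocr_image` is hypothetical and not part of the library. Treating extensions case-insensitively is an assumption, and the check inspects only the file name, not the actual image contents.

```python
from pathlib import Path

# Extensions listed in this section. Case-insensitive matching is an assumption.
ACCEPTED_OCR_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_accepted_ocr_image(path: str) -> bool:
    """Return True if the file extension is one the ocr module accepts (hypothetical helper)."""
    return Path(path).suffix.lower() in ACCEPTED_OCR_EXTENSIONS

print(is_accepted_ocr_image("receipt.JPG"))  # True
print(is_accepted_ocr_image("scan.tiff"))    # False
```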

The ocr module returns a JSON file. The JSON file holds all identified text and the pixel coordinates on the image for each chunk of identified text.
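The sketch below shows one way to read such a file and put the text chunks in reading order using their pixel coordinates. The JSON shape is an assumption made only for illustration: a list of chunks, each with a `text` key and a `bbox` key holding `[x_min, y_min, x_max, y_max]`. Check a real output file for the actual key names and coordinate layout before adapting it. The helper `read_chunks` and the sample data are hypothetical.

```python
import json

# Hypothetical output shape, for illustration only. The real ocr JSON may use
# different keys or a different coordinate format.
sample_output = """
[
  {"text": "TOTAL", "bbox": [40, 310, 120, 330]},
  {"text": "ACME Store", "bbox": [30, 10, 200, 40]},
  {"text": "$12.50", "bbox": [150, 310, 220, 330]}
]
"""

def read_chunks(raw: str) -> list[tuple[str, list[int]]]:
    """Parse the JSON and return (text, bbox) pairs in top-to-bottom, left-to-right order."""
    chunks = json.loads(raw)
    # Sort by the top edge (y_min), then by the left edge (x_min).
    chunks.sort(key=lambda c: (c["bbox"][1], c["bbox"][0]))
    return [(c["text"], c["bbox"]) for c in chunks]

for text, bbox in read_chunks(sample_output):
    print(text, bbox)
```

Sorting on the top edge first is a simple heuristic. It works for cleanly aligned lines, but slightly skewed scans may need a tolerance when comparing `y_min` values.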

Available Models in the ocr Module

You can activate any of the following models when using the ocr module:

Use the modules argument of the process method to choose which model is active when you process files through the ocr module.

Model Parameters in the ocr Module

None of the ocr module models are parameterizable. Consequently, when you select a model through the process method's modules argument, params is always set to an empty dictionary. For example:

```python
# example model selection for ocr module in .process
modules = {'ocr': {'model': 'tesseract-es',
                   'params': {}}}
```
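Because params is always empty, a small helper can build the modules argument from just a model name. This guarantees that params is never omitted. The function `ocr_modules` below is a hypothetical convenience, not part of the library. `'tesseract-es'` is the model name used in the example above.

```python
def ocr_modules(model: str) -> dict:
    """Build the modules argument for an ocr-only pipeline (hypothetical helper).

    The ocr models take no parameters, so params is always an empty dict.
    """
    return {"ocr": {"model": model, "params": {}}}

modules = ocr_modules("tesseract-es")
print(modules)  # {'ocr': {'model': 'tesseract-es', 'params': {}}}
```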

Input File Size Limit

Input image files for the ocr module can currently be no larger than 5MB.
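You can check a file against this limit locally before processing it. The following is a minimal sketch. It assumes that "5MB" might mean either 5,000,000 or 5 × 1024 × 1024 bytes, so it uses the smaller, decimal value: any file that passes this check fits under either reading. The helper `within_ocr_size_limit` is hypothetical, and the demo files contain placeholder bytes rather than real images.

```python
import os
import tempfile

# Assumption: use the smaller (decimal) reading of "5MB" so a passing file
# is within the limit under either interpretation.
OCR_MAX_BYTES = 5_000_000

def within_ocr_size_limit(path: str) -> bool:
    """Return True if the file at path is no larger than OCR_MAX_BYTES (hypothetical helper)."""
    return os.path.getsize(path) <= OCR_MAX_BYTES

# Demo with placeholder files: 1 KB passes, 6,000,000 bytes does not.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.png")
    large = os.path.join(tmp, "large.png")
    with open(small, "wb") as f:
        f.write(b"\0" * 1024)
    with open(large, "wb") as f:
        f.write(b"\0" * 6_000_000)
    print(within_ocr_size_limit(small))  # True
    print(within_ocr_size_limit(large))  # False
```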

A Single-Module Pipeline for the ocr Module

See the Pipeline Examples section of our documentation for an example of a single-module pipeline for the ocr module.

Further Information on ocr Module IO and Clickability

See the Convenience Methods (and More!) documentation, where you will find two tools for learning more about the ocr module: