

The caption Module


The caption module takes an image as input and returns a natural language text description of that image.

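This overview of the caption module is divided into the following sections:

  • Inputs and Outputs of the caption Module

  • Available Models in the caption Module

  • Model Parameters in the caption Module

  • Input File Size Limit

  • A Single-Module Pipeline for the caption Module

  • Further Information on caption Module IO and Clickability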

Inputs and Outputs of the caption Module

The caption module accepts image inputs. Acceptable file formats are the following:

  • JPG

  • JPEG

  • PNG
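If you want to catch unsupported files before sending them to the caption module, a local extension check is one option. The sketch below is hypothetical: the helper name is illustrative and not part of any library, and it assumes extensions are matched case-insensitively. It only inspects the file name, not the file contents.

```python
from pathlib import Path

# Extensions the caption module accepts, per the list above.
# Case-insensitive matching is an assumption of this sketch.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_accepted_image(path: str) -> bool:
    """Hypothetical helper: True if the file extension is JPG, JPEG, or PNG."""
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


print(is_accepted_image("photo.JPG"))  # True
print(is_accepted_image("scan.tiff"))  # False
```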

The caption module returns a JSON file. At its core is a dictionary that contains the newly generated image caption.
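Reading the caption back out of such a file takes only a few lines of standard-library Python. This is a sketch under assumptions: the key name "caption" and the layout (a bare dictionary, or a list wrapping one dictionary) are hypothetical, so inspect a real output file and adjust `key` and the unwrapping step to match it. The sample file written at the end exists only to demonstrate the helper.

```python
import json
import tempfile
from pathlib import Path


def read_caption(json_path: str, key: str = "caption") -> str:
    """Hypothetical helper: load a caption output file and return the caption text.

    Assumes the caption sits under `key` in a top-level dictionary, or in a
    dictionary that is the first element of a top-level list.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Accept either a bare dictionary or a list wrapping one dictionary.
    if isinstance(data, list):
        data = data[0]
    return data[key]


# Example: write a sample file in the assumed layout and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "output.json"
    sample.write_text(json.dumps({"caption": "a dog running on the beach"}), encoding="utf-8")
    example_caption = read_caption(str(sample))

print(example_caption)  # a dog running on the beach
```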

Available Models in the caption Module

You can activate any of the following models when using the caption module:

Use the modules argument of the process method to choose which model is active when you process files through the caption module.

Model Parameters in the caption Module

None of the caption module models are parameterizable. Consequently, when you select a model through the process method's modules argument, always set params to an empty dictionary. For example:

# example model selection for caption module in .process
modules={'caption': {'model':'blip-image-captioning-base',
                     'params': {}}}
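If you switch between caption models often, you could build this argument with a small function so that params is never accidentally filled in. The helper below is hypothetical and not part of any library. It only produces the dictionary shown above, and blip-image-captioning-base is passed in simply because it is the model used in that example.

```python
def caption_modules(model: str) -> dict:
    """Hypothetical helper: build a modules argument selecting a caption model.

    No caption model is parameterizable, so params is always an empty dictionary.
    """
    return {"caption": {"model": model, "params": {}}}


modules = caption_modules("blip-image-captioning-base")
print(modules)
# {'caption': {'model': 'blip-image-captioning-base', 'params': {}}}
```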

Input File Size Limit

Input image files for the caption module can currently be no larger than 5 MB.
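To avoid a rejected upload, you can check the file size locally first. This is a minimal sketch under one stated assumption: "5 MB" is read as 5 × 1024 × 1024 bytes, and the service may instead use a decimal megabyte (5,000,000 bytes), so use the smaller value if you need to be safe. The helper name is hypothetical.

```python
import os
import tempfile

# Assumption: 5 MB interpreted as 5 * 1024 * 1024 bytes (binary megabytes).
# The service may use 5,000,000 bytes instead.
MAX_CAPTION_INPUT_BYTES = 5 * 1024 * 1024


def within_size_limit(path: str, limit: int = MAX_CAPTION_INPUT_BYTES) -> bool:
    """Hypothetical helper: True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Example: a 1 KiB placeholder file is well under the limit.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"\x00" * 1024)
    small_file = f.name
small_ok = within_size_limit(small_file)
too_big_for_tiny_limit = within_size_limit(small_file, limit=100)
os.remove(small_file)

print(small_ok)  # True
```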

A Single-Module Pipeline for the caption Module

Visit the Pipeline Examples section of our documentation to review an example of a single-module pipeline for the caption module.

Further Information on caption Module IO and Clickability

Visit the Convenience Methods (and More!) documentation, where you will find two tools to learn more about the caption module: