The caption Module
The caption module takes an image as input and returns a natural language text description of that image.
This overview of the caption module is divided into the following sections:
- Inputs and Outputs of the caption Module
- Available Models in the caption Module
- Model Parameters in the caption Module
- Input File Size Limit
- A Single-Module Pipeline for the caption Module
- Further Information on caption Module IO and Clickability
Inputs and Outputs of the caption Module
The caption module accepts image inputs. Acceptable file formats are the following:
- JPG
- JPEG
- PNG
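As a minimal sketch, a file can be checked against the formats listed above before it is sent to the module. The helper name `has_accepted_extension` is hypothetical and not part of the caption module. The sketch also assumes that matching on the file extension, case-insensitively, is an adequate check.

```python
from pathlib import Path

# Extensions copied from the list of acceptable formats above.
# Assumption: the extension is compared case-insensitively.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def has_accepted_extension(file_path: str) -> bool:
    """Return True if the file name ends in one of the accepted extensions."""
    return Path(file_path).suffix.lower() in ACCEPTED_EXTENSIONS
```

This only inspects the file name. It does not confirm that the file contents are a valid image.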
The caption module returns a JSON file. At the core of the returned JSON file lies a dictionary that in turn contains the newly generated image caption.
Available Models in the caption Module
You can activate any of the following models when using the caption module:
- vit-gpt2-image-captioning (default)
- git-base [English]
Use the modules argument of the process method to select which model is active when you process files through the caption module.
Model Parameters in the caption Module
None of the caption module models are parameterizable. Consequently, when selecting what model you'll use through the process method's modules argument, params will always be set to an empty dictionary. For example:
# example model selection for caption module in .process
modules={'caption': {'model': 'git-base',
                     'params': {}}}
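The dictionary above can also be built programmatically. The following is a sketch only: the helper `build_caption_modules` is hypothetical and not part of the library, and it checks the requested name against the two models listed under Available Models in this document.

```python
# Model names copied from the "Available Models" list in this document.
AVAILABLE_CAPTION_MODELS = {"vit-gpt2-image-captioning", "git-base"}
DEFAULT_CAPTION_MODEL = "vit-gpt2-image-captioning"


def build_caption_modules(model: str = DEFAULT_CAPTION_MODEL) -> dict:
    """Build a value for the `modules` argument of the process method.

    Hypothetical helper: it only assembles and validates the dictionary.
    """
    if model not in AVAILABLE_CAPTION_MODELS:
        raise ValueError(
            f"unknown caption model {model!r}; "
            f"expected one of {sorted(AVAILABLE_CAPTION_MODELS)}"
        )
    # caption models take no parameters, so params is always empty.
    return {"caption": {"model": model, "params": {}}}
```

The returned dictionary has the same shape as the example above and would be passed as the modules argument of process.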
Input File Size Limit
Input image files for the caption module can currently be no larger than 5MB.
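A local pre-check can catch oversized files before upload. This is a rough sketch: the helper name `is_within_size_limit` is hypothetical, and it assumes "5MB" means 5 * 1024 * 1024 bytes. The service may count megabytes differently, so treat the threshold as approximate.

```python
import os

# Assumption: 5MB is interpreted as 5 * 1024 * 1024 bytes.
MAX_INPUT_BYTES = 5 * 1024 * 1024


def is_within_size_limit(file_path: str, max_bytes: int = MAX_INPUT_BYTES) -> bool:
    """Return True if the file on disk is no larger than max_bytes."""
    return os.path.getsize(file_path) <= max_bytes
```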
A Single-Module Pipeline for the caption Module
See the Pipeline Examples section of our documentation for an example of a single-module pipeline for the caption module.
Further Information on caption Module IO and Clickability
See the Convenience Methods (and More!) documentation. There you will find two tools to learn more about the caption module: