Krixik Search Pipelines
Search Pipeline Overview
Search pipelines enable search over textual documents. These documents may be the initial pipeline input, or they may be generated mid-pipeline, as happens in pipelines that begin with a transcription (audio to text) or image captioning (image to text) module.
Such search capabilities are often employed in RAG (Retrieval-Augmented Generation) systems today, but the pipelines described in this section also apply more generally to recommendation systems, image and video retrieval based on content similarity, and personalized content delivery, to name a few possibilities.
Two types of document search can be enabled: semantic search and keyword search. Depending on which of these is sought, the final module of the pipeline must respectively be vector-db or keyword-db.
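The rule above can be sketched as a small check on a pipeline's module chain. This is only an illustration, not part of the Krixik client: the helper `supported_search` and the `FINAL_MODULE` mapping are hypothetical names, and any module name other than `vector-db` and `keyword-db` is a placeholder.

```python
# Mapping taken from the text above: search type -> required final module.
FINAL_MODULE = {"semantic": "vector-db", "keyword": "keyword-db"}


def supported_search(module_chain):
    """Return "semantic" or "keyword" for a search pipeline, else None.

    Hypothetical helper for illustration; it only inspects the last module.
    """
    if not module_chain:
        return None
    for search_type, final_module in FINAL_MODULE.items():
        if module_chain[-1] == final_module:
            return search_type
    return None


# Placeholder chains: only the final module matters for this check.
print(supported_search(["transcribe", "keyword-db"]))
print(supported_search(["some-module", "vector-db"]))
print(supported_search(["transcribe"]))
```

Only the last entry of the chain decides which kind of search is available; earlier modules just determine how the searchable text is produced.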
Search pipelines are more complex than other pipelines because using them takes two steps rather than one:
- Files must first be "loaded" into the pipeline with the `process` method.
- The `keyword_search` method or the `semantic_search` method can be invoked on a search pipeline once at least one file has been processed through it. Keep in mind that the `keyword_search` method can only be invoked on a pipeline that ends with `keyword-db`, and the `semantic_search` method can only be invoked on a pipeline that ends with `vector-db`.
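The two-step workflow can be illustrated with a toy, in-memory stand-in. This is a sketch under stated assumptions, not the Krixik client: `ToySearchPipeline`, its constructor, and its method signatures are all hypothetical, and the "semantic" search is a crude word-overlap ranking standing in for real vector similarity. It only demonstrates the two rules above: process before searching, and match the search method to the final module.

```python
from dataclasses import dataclass, field


@dataclass
class ToySearchPipeline:
    """Hypothetical in-memory stand-in for a search pipeline."""

    module_chain: list[str]
    files: dict[str, str] = field(default_factory=dict)  # file name -> text

    def process(self, file_name: str, text: str) -> None:
        # Step 1: "load" a file so that it becomes searchable.
        self.files[file_name] = text

    def _check(self, required_final_module: str) -> None:
        # Enforce the final-module rule, then the process-first rule.
        if not self.module_chain or self.module_chain[-1] != required_final_module:
            raise ValueError(f"pipeline must end with {required_final_module}")
        if not self.files:
            raise RuntimeError("process at least one file before searching")

    def keyword_search(self, query: str) -> list[str]:
        # Step 2 (keyword): return files sharing at least one query word.
        self._check("keyword-db")
        words = set(query.lower().split())
        return [name for name, text in self.files.items()
                if words & set(text.lower().split())]

    def semantic_search(self, query: str) -> list[str]:
        # Step 2 (semantic): word overlap is a placeholder for embeddings.
        self._check("vector-db")
        words = set(query.lower().split())
        scores = {name: len(words & set(text.lower().split()))
                  for name, text in self.files.items()}
        return sorted(scores, key=scores.get, reverse=True)


pipeline = ToySearchPipeline(module_chain=["keyword-db"])
pipeline.process("a.txt", "the quick brown fox")
print(pipeline.keyword_search("fox"))  # ['a.txt']
```

Calling `keyword_search` before any `process` call raises `RuntimeError` in this sketch, and calling `semantic_search` on a chain that ends with `keyword-db` raises `ValueError`.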
Search Pipeline Examples
- Semantic Search: Enables semantic search on an input text file.
- Semantic Search on Snippets: Enables semantic search on snippets in an input JSON file.
- Keyword Search: Enables keyword search on an input text file.
- Semantically-Searchable Transcription: Transcribes an input audio file and then enables semantic search on the transcript.
- Keyword-Searchable Transcription: Transcribes an input audio file and then enables keyword search on the transcript.
- Semantically-Searchable Translation: Translates an input text file and then enables semantic search on the translation.
- Semantically-Searchable Translated Transcription: Transcribes an input audio file, translates it into English, and then enables semantic search on the translation.
- Semantically-Searchable OCR: Extracts text from an input image and then enables semantic search on the extracted text.
- Keyword-Searchable Image Captions: Generates a textual caption for an input image and then enables keyword search on the caption.