The vector-db Module
The vector-db module takes as input a NumPy array, indexes its vectors, and returns an indexed FAISS database.
This overview of the vector-db module is divided into the following sections:
- Inputs and Outputs of the vector-db Module
- Available Models in the vector-db Module
- Model Parameters in the vector-db Module
- Input File Size Limit
- A Single-Module Pipeline for the vector-db Module and Local Querying
- The semantic_search Method
- Further Information on vector-db Module IO and Clickability
Inputs and Outputs of the vector-db Module
The vector-db module accepts as input NPY files that consist of a single NumPy array. Each row is a vector to be indexed for vector search.
The vector-db module returns an indexed vector FAISS database file.
For an example of what a small sample input file might look like, see the output of the following code:
# examine contents of a small sample input file
import numpy as np
# data_dir is assumed to hold the path to your local data directory (defined elsewhere)
test_file = data_dir + "input/vectors.npy"
np.load(test_file)
array([[0, 1],
       [1, 0],
       [1, 1]], dtype=int64)
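To produce an input file of this shape yourself, a minimal NumPy sketch follows. The file name and the temporary directory are placeholders chosen for illustration, not paths the module requires.

```python
import os
import tempfile

import numpy as np

# Each row is one vector to be indexed; all rows share the same dimension.
vectors = np.array([[0, 1],
                    [1, 0],
                    [1, 1]], dtype=np.int64)

# Hypothetical output location; substitute your own input directory.
out_path = os.path.join(tempfile.mkdtemp(), "vectors.npy")

# np.save writes a single array to an NPY file.
np.save(out_path, vectors)

# Reload to confirm the file holds one 2-D array.
loaded = np.load(out_path)
print(loaded.shape)  # (3, 2)
```

Saving with np.save (rather than np.savez) keeps the file to a single array, which is what the module expects.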
Available Models in the vector-db Module
The vector-db module currently offers a single model:
- faiss (default)
Use the modules argument of the process method to select the model that is active when you process files through the vector-db module. At this time, faiss is the only option.
Model Parameters in the vector-db Module
The vector-db module's faiss model takes no parameters. Consequently, if you specify the model through the process method's modules argument, params is always an empty dictionary. For example:
# example model selection for vector-db module in process
modules={'vector-db': {'model': 'faiss',
                       'params': {}}}
Input File Size Limit
Input NPY files for the vector-db module can currently be no larger than 3MB.
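You may want to check a file against this limit before submitting it. The sketch below is one way to do so. It assumes "3MB" means 3 * 1024 * 1024 bytes, which the documentation does not specify, and the helper name within_size_limit is hypothetical.

```python
import os
import tempfile

import numpy as np

# Assumption: the 3MB limit is read as 3 * 1024 * 1024 bytes; the docs do not
# say whether the figure is decimal or binary.
MAX_BYTES = 3 * 1024 * 1024


def within_size_limit(path, max_bytes=MAX_BYTES):
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes


# Illustrative file: 1000 float32 vectors of dimension 64 (about 256 KB on disk).
path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
np.save(path, np.zeros((1000, 64), dtype=np.float32))

print(within_size_limit(path))
```

If a file is too large, splitting the array by rows into several NPY files is one option, since the pipeline accepts one or more input files.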
A Single-Module Pipeline for the vector-db Module and Local Querying
See the Pipeline Examples section of our documentation for an example of a single-module pipeline for the vector-db module.
Keep in mind that the output of this pipeline is a FAISS database file, which is not human-readable. Moreover, for this single-module pipeline to work, you'll need one or more properly formatted NPY files ready for input.
That example also includes an overview of how to query your output databases locally.
The semantic_search Method
Any pipeline containing a vector-db module preceded by a text-embedder module has access to the semantic_search method. This method lets you run semantic (a.k.a. vector) queries on the created vector database(s).
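To make the idea of a vector query concrete, here is a small sketch of exact nearest-neighbor search by Euclidean distance in plain NumPy. It is a conceptual stand-in only: it does not use FAISS or the semantic_search method, and the function name nearest_neighbors is hypothetical. It also starts from a query that is already a vector, whereas a semantic query starts from text that must first be embedded.

```python
import numpy as np


def nearest_neighbors(vectors, query, k=2):
    """Return (indices, distances) of the k rows of `vectors` closest to `query`."""
    vectors = np.asarray(vectors, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Euclidean distance from the query to every stored row.
    distances = np.linalg.norm(vectors - query, axis=1)
    # Stable sort so tied rows keep their original order.
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]


# The same three vectors as in the sample input file above.
vectors = np.array([[0, 1], [1, 0], [1, 1]])
indices, distances = nearest_neighbors(vectors, query=[1.0, 0.9], k=2)
print(indices)    # row numbers of the two closest vectors
print(distances)  # their distances to the query
```

Exhaustive search like this compares the query with every stored vector, which is fine for a few rows. An indexed database exists to make the same kind of lookup practical at larger scale.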
Further Information on vector-db Module IO and Clickability
See the Convenience Methods (and More!) documentation. There you will find two tools to learn more about the vector-db module: