The vector-db Module

The vector-db module takes as input a NumPy array, indexes its vectors, and returns an indexed FAISS database.

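This overview of the vector-db module covers its inputs and outputs, available models, model parameters, the input file size limit, a single-module pipeline with local querying, the semantic_search method, and further information on module IO and clickability.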

Inputs and Outputs of the vector-db Module

The vector-db module accepts as input NPY files that consist of a single NumPy array. Each row is a vector to be indexed for vector search.

The vector-db module returns an indexed vector FAISS database file.

For an example of what a small sample input file might look like, see the output of the following code:

# examine contents of a small sample input file
import numpy as np

# data_dir is assumed to be defined earlier as the path to your data directory
test_file = data_dir + "input/vectors.npy"
np.load(test_file)

array([[0, 1],
       [1, 0],
       [1, 1]], dtype=int64)
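
If you need to produce such an input file yourself, the following is a minimal sketch using NumPy's np.save. The vector values, the temporary output directory, and the variable names are illustrative assumptions and are not part of the vector-db module.

```python
import os
import tempfile

import numpy as np

# hypothetical sample vectors: one row per vector to be indexed
vectors = np.array([[0, 1], [1, 0], [1, 1]], dtype=np.int64)

# hypothetical output location; substitute your own input directory
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "vectors.npy")

# np.save writes a single array to an NPY file
np.save(out_path, vectors)

# reload to confirm the file holds one 2-D array
loaded = np.load(out_path)
print(loaded.shape)
```

Saving exactly one array per file matches the input format described above, where each row is a vector.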

Available Models in the vector-db Module

The vector-db module currently offers a single model: faiss.

Use the modules argument of the process method to select which model is active when you process files through the vector-db module. At this time there is only one option.

Model Parameters in the vector-db Module

The vector-db module's model is not parameterizable. Consequently, if you specify the model through the process method's modules argument, params will always be an empty dictionary. For example:

# example model selection for vector-db module in process
modules={'vector-db': {'model':'faiss',
                       'params': {}}}

Input File Size Limit

Input NPY files for the vector-db module can currently be no larger than 3MB.
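
A quick local check before uploading can catch oversized files. The sketch below is only an illustration: within_size_limit is a hypothetical helper, not part of the module, and it assumes "3MB" means 3 * 1024 * 1024 bytes, which the documentation does not specify.

```python
import os
import tempfile

import numpy as np

# assumption: "3MB" is read here as 3 * 1024 * 1024 bytes
MAX_INPUT_BYTES = 3 * 1024 * 1024


def within_size_limit(path, limit=MAX_INPUT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# demo on a small temporary NPY file (hypothetical path and contents)
demo_path = os.path.join(tempfile.mkdtemp(), "vectors.npy")
np.save(demo_path, np.array([[0, 1], [1, 0], [1, 1]]))

small_ok = within_size_limit(demo_path)
print(small_ok)
```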

A Single-Module Pipeline for the vector-db Module and Local Querying

Please click here to visit the Pipeline Examples section of our documentation and review an example of a single-module pipeline for the vector-db module.

Keep in mind that the output of this pipeline will be a FAISS database file, which is not human-readable. Moreover, for this single-module pipeline to work, you'll need one or more properly formatted NPY files ready as input.

This example will also include an overview of how to locally query your output databases.

The semantic_search Method

Any pipeline containing a vector-db module preceded by a text-embedder module has access to the semantic_search method. This method lets you run semantic (a.k.a. vector) queries on the vector database(s) the pipeline creates.
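
To give a feel for what a vector query does conceptually, here is a brute-force nearest-neighbor search written in plain NumPy. It is a sketch only: it does not call semantic_search or FAISS, and the query vector and the value of k are made-up illustrative values.

```python
import numpy as np

# the three sample vectors from the input example, cast to float
vectors = np.array([[0, 1], [1, 0], [1, 1]], dtype=np.float32)

# hypothetical query vector
query = np.array([0.9, 0.1], dtype=np.float32)

# squared Euclidean distance from the query to every row
distances = np.sum((vectors - query) ** 2, axis=1)

# indices of the k closest rows, nearest first
k = 2
nearest = np.argsort(distances)[:k]
print(nearest, distances[nearest])
```

An indexed database serves the same purpose as this exhaustive scan, returning the stored vectors closest to a query, but is built to do so efficiently over much larger collections.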

Further Information on vector-db Module IO and Clickability

Please click here to visit the Convenience Methods (and More!) documentation. There you will find two tools to learn more about the vector-db module.