The keyword-db Module
The keyword-db module takes a document as input, extracts its non-trivial keywords, identifies the lemmatized stem of each one, and returns an SQLite database with this content.
This overview of the keyword-db module is divided into the following sections:
- Inputs and Outputs of the keyword-db Module
- Available Models in the keyword-db Module
- Model Parameters in the keyword-db Module
- Input File Size Limit
- A Single-Module Pipeline for the keyword-db Module and Local Querying
- The keyword_search Method
- Further Information on keyword-db Module IO and Clickability
Inputs and Outputs of the keyword-db Module
The keyword-db module accepts textual document inputs. Acceptable file formats are the following:
- TXT
- PDF (automatically converted to TXT before processing)
- DOCX (automatically converted to TXT before processing)
- PPTX (automatically converted to TXT before processing)
The keyword-db module returns an SQLite database containing every non-trivial keyword in the document and its lemmatized stem.
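To make the idea of a keyword-to-stem database concrete, here is a toy sketch in Python. It is not the module's model or its real output schema: the stopword list, the lemma table, the table name `keywords`, and its columns are all assumptions made for illustration only.

```python
import re
import sqlite3

# Assumption: a tiny stopword list and a hand-written lemma table stand in
# for whatever the real keyword-db model does internally.
STOPWORDS = {"the", "a", "an", "and", "of", "were", "was", "in", "on", "to"}
LEMMAS = {"running": "run", "dogs": "dog", "parks": "park"}


def build_toy_keyword_db(text: str) -> sqlite3.Connection:
    """Store each non-stopword token and its toy lemma in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the real output database may be organized differently.
    conn.execute("CREATE TABLE keywords (keyword TEXT, lemma TEXT)")
    tokens = re.findall(r"[a-z]+", text.lower())
    # Words missing from the toy lemma table are stored unchanged.
    rows = [(t, LEMMAS.get(t, t)) for t in tokens if t not in STOPWORDS]
    conn.executemany("INSERT INTO keywords VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = build_toy_keyword_db("The dogs were running in the parks.")
rows = conn.execute("SELECT keyword, lemma FROM keywords ORDER BY rowid").fetchall()
print(rows)  # [('dogs', 'dog'), ('running', 'run'), ('parks', 'park')]
```

The trivial words ("the", "were", "in") are dropped, and each remaining keyword is stored next to its stem.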
Available Models in the keyword-db Module
The keyword-db module currently offers the following model:
- base (default): Krixik-made
Use the modules argument in the process method to select which model is active when you process files through the keyword-db module. At this time there is only one option.
Model Parameters in the keyword-db Module
The keyword-db module's model takes no parameters. Consequently, if you specify the model through the process method's modules argument, params is always an empty dictionary. For example:
# example model selection for keyword-db module in process
modules = {'keyword-db': {'model': 'base',
                          'params': {}}}
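Since there is a single model and no parameters, the modules dictionary can be built and checked with a small helper before it is handed to process. The sketch below is only an illustration: `keyword_db_modules` and `AVAILABLE_MODELS` are hypothetical names, not part of the Krixik client.

```python
# Assumption: "base" is the only available model, as stated in this document.
AVAILABLE_MODELS = {"base"}


def keyword_db_modules(model: str = "base") -> dict:
    """Build the modules dictionary for a keyword-db pipeline (hypothetical helper)."""
    if model not in AVAILABLE_MODELS:
        raise ValueError(
            f"unknown keyword-db model {model!r}; expected one of {sorted(AVAILABLE_MODELS)}"
        )
    # The model is not parameterizable, so params is always an empty dictionary.
    return {"keyword-db": {"model": model, "params": {}}}


modules = keyword_db_modules()
print(modules)  # {'keyword-db': {'model': 'base', 'params': {}}}
```

The resulting dictionary is what you would pass as the modules argument of the process method.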
Input File Size Limit
TXT input files for the keyword-db module can currently be no larger than 2MB.
DOCX, PDF, and PPTX input files can currently be no larger than 100MB. Once they are converted to TXT, the resulting TXT file is held to the same 2MB limit.
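These limits can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the Krixik client: `check_input_size` is a hypothetical helper, and treating 1MB as 1,000,000 bytes is an assumption, since this document does not say whether MB means 10^6 or 2^20 bytes.

```python
import os

# Assumption: 1 MB = 1,000,000 bytes (the document does not specify).
MB = 1_000_000

# Per-extension limits taken from the text above.
SIZE_LIMITS = {
    ".txt": 2 * MB,
    ".pdf": 100 * MB,
    ".docx": 100 * MB,
    ".pptx": 100 * MB,
}


def check_input_size(path: str) -> None:
    """Raise ValueError if the file type is unsupported or the file is too large."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SIZE_LIMITS:
        raise ValueError(f"unsupported input format: {ext or '(no extension)'}")
    size = os.path.getsize(path)
    if size > SIZE_LIMITS[ext]:
        raise ValueError(
            f"{path} is {size} bytes; the limit for {ext} files is {SIZE_LIMITS[ext]} bytes"
        )
```

A check like this only covers the file as it sits on disk. It cannot tell whether a PDF, DOCX, or PPTX file will stay under the 2MB limit once it has been converted to TXT.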
A Single-Module Pipeline for the keyword-db Module and Local Querying
See the Pipeline Examples section of our documentation for an example of a single-module pipeline for the keyword-db module.
Keep in mind that the output of this pipeline is an SQLite database file, which is not human-readable.
That example also includes an overview of how to query your output databases locally.
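Because the output file is not human-readable, a first step when working with it locally is to find out which tables and columns it contains. The sketch below uses only Python's standard sqlite3 module and assumes nothing about the schema. The function name `describe_sqlite_db` is hypothetical, and the path is whatever file your pipeline produced.

```python
import sqlite3
import sys


def describe_sqlite_db(db_path: str) -> dict:
    """Return {table_name: [column names]} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_db.py path/to/output.db
    for table, columns in describe_sqlite_db(sys.argv[1]).items():
        print(table, columns)
```

Once the table and column names are known, ordinary SELECT statements can be run against the file with the same sqlite3 connection.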
The keyword_search Method
Any pipeline containing a keyword-db module has access to the keyword_search method, which lets you run keyword queries on the keyword database(s) the pipeline has created.
Further Information on keyword-db Module IO and Clickability
See the Convenience Methods (and More!) documentation. There you will find two tools to learn more about the keyword-db module: