Saving and Loading Pipelines
This overview of saving and loading pipelines is divided into three sections: the save_pipeline method, the load_pipeline method, and the reset_pipeline method.
The save_pipeline Method
Saving your pipeline in Krixik means saving its configuration to disk.
You can save the configuration of a pipeline by using the save_pipeline method. This method takes one (required) argument:
config_path: A valid local file path.
config_path must end with a .yml or .yaml extension. YAML is currently the only file format that Krixik saves pipelines to.
To demonstrate how it works, first you'll need to create a pipeline with the create_pipeline method:
# first create a pipeline
pipeline = krixik.create_pipeline(
    name="saving_and_loading_pipelines_1_summarize_summarize_keyword-db",
    module_chain=["summarize", "summarize", "keyword-db"],
)
Now that you have a pipeline you can use the save_pipeline method to save that pipeline to disk:
# save a pipeline's configuration to disk - example file path provided
# (data_dir is assumed to be a directory path string defined earlier, ending in "/")
pipeline.save_pipeline(config_path=data_dir + "pipeline_configs/save-pipeline-demo.yaml")
For your convenience, if no file with the given name exists at the given location, Krixik creates the file locally and then saves your pipeline's configuration into it.
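The extension rule described earlier can also be checked locally before calling save_pipeline. The following is a minimal sketch and not part of Krixik: prepare_config_path is a hypothetical helper that rejects paths without a .yml or .yaml extension and creates the parent directory. Whether Krixik itself creates missing parent directories is not stated here, so creating them up front is only a precaution, and the strict lowercase extension match is likewise an assumption.

```python
from pathlib import Path

# Extensions that config_path must end with (strict lowercase match is an assumption).
ALLOWED_EXTENSIONS = {".yml", ".yaml"}


def prepare_config_path(config_path: str) -> str:
    """Hypothetical helper (not part of Krixik): validate the extension of
    config_path and make sure its parent directory exists."""
    path = Path(config_path)
    if path.suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"config_path must end with .yml or .yaml, got: {config_path}")
    # Precaution only: create the parent directory if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string could then be passed as the config_path argument of save_pipeline.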
The load_pipeline Method
Given that a pipeline's configuration is its fundamental descriptor, any valid config file can be loaded into Krixik, thus reinstantiating its associated pipeline.
The load_pipeline method takes a single (required) argument:
config_path: A valid local file path.
For the load_pipeline method to work, the file indicated by config_path must (a) exist, (b) have a .yaml or .yml extension, and (c) hold a properly formatted Krixik pipeline configuration. If any of these conditions is not met, the method will fail. A file that you earlier saved to that path with the save_pipeline method should load without issue.
Using the load_pipeline method looks like this:
# load a pipeline into memory via its valid configuration file
my_pipeline = krixik.load_pipeline(config_path=data_dir + "pipeline_configs/save-pipeline-demo.yaml")
Note that you don't need to have previously dealt with the saved pipeline yourself. For instance, a colleague may have shared a pipeline configuration file with you, or you may have written the file from scratch. As long as the config is properly formatted, the load_pipeline method will work as it should.
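When a config file comes from someone else, conditions (a) and (b) listed earlier can be checked locally before calling load_pipeline. The sketch below is an illustration, not Krixik functionality: find_config_problems is a hypothetical helper, and it makes no attempt to check condition (c), since that depends on the details of the Krixik configuration format.

```python
from pathlib import Path


def find_config_problems(config_path: str) -> list[str]:
    """Hypothetical pre-flight check (not part of Krixik).

    Returns a list of human-readable problems; an empty list means the
    path passes the existence and extension checks only.
    """
    problems = []
    path = Path(config_path)
    # Condition (a): the file must exist.
    if not path.is_file():
        problems.append("file does not exist")
    # Condition (b): the file must have a .yml or .yaml extension.
    if path.suffix not in (".yml", ".yaml"):
        problems.append("extension is not .yml or .yaml")
    return problems
```

An empty result does not guarantee that load_pipeline will succeed; it only rules out the two failure causes that can be verified without knowing the configuration format.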
The reset_pipeline Method
The load_pipeline method discussed above reinstantiates a previously existing pipeline with the same name and module_chain. Since files processed through a pipeline are attached to the pipeline's name, those files would continue to be attached to this newly instantiated pipeline.
If you wish to recreate a pipeline but seek to do so with a blank slate, the easiest way to do it is with the reset_pipeline method, which deletes all processed datapoints attached to that pipeline (i.e. anything relating to any files previously processed through it).
The reset_pipeline method takes one (required) argument:
pipeline: The Python variable that the pipeline object is currently saved to.
Note that this is not the name of the pipeline. For instance, to reset the pipeline from the load_pipeline example code above, you would set the pipeline argument of the reset_pipeline method to my_pipeline, as follows:
# delete all processed datapoints belonging to this pipeline
krixik.reset_pipeline(my_pipeline)
In other words, the pipeline argument to the reset_pipeline method is a Python variable that a pipeline object has been assigned to, and reset_pipeline will delete any datapoints associated with that pipeline object's name on the Krixik system.