The show_tree Method

The show_tree method lets you visualize, in your terminal or IDE output, all files currently in your pipeline. It is designed as a simple analog of the standard UNIX tree command.

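This overview of the show_tree method is divided into the following sections:

  • show_tree Method Arguments
  • show_tree Method Example
  • The Wildcard Operator and the Global Root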

show_tree Method Arguments

The show_tree method takes a single (required) argument:

  • symbolic_directory_path (str) - The symbolic_directory_path whose contents you wish to display. If you use the wildcard operator, this symbolic_directory_path becomes the root(s) of the output tree, and all files at or beneath the root(s) are displayed.

show_tree Method Example

For this document's example we will use a pipeline consisting of a single parser module. We use the create_pipeline method to instantiate the pipeline, and then process a few files through it. Note the symbolic_directory_path structure we create:

# create an example pipeline with a single module
pipeline = krixik.create_pipeline(name="show_tree_method_1_parser", module_chain=["parser"])

# define path to an input file from examples directory
test_file = data_dir + "input/1984_very_short.txt"

# process short input file with various metadata
process_output = pipeline.process(
    local_file_path=test_file,
    local_save_directory=data_dir + "output",  # save output in the output subdirectory of data_dir
    expire_time=60 * 30,  # set all process data to expire in 30 minutes
    wait_for_process=True,  # wait for the process to complete before returning control to the IDE
    verbose=False,
    symbolic_directory_path="/my/custom/path",
    file_name="file_num_one.txt",
)

process_output = pipeline.process(
    local_file_path=test_file,
    local_save_directory=data_dir + "output",  # save output in the output subdirectory of data_dir
    expire_time=60 * 30,  # set all process data to expire in 30 minutes
    wait_for_process=True,  # wait for the process to complete before returning control to the IDE
    verbose=False,
    symbolic_directory_path="/my/custom/path",
    file_name="file_num_two.txt",
)

process_output = pipeline.process(
    local_file_path=test_file,
    local_save_directory=data_dir + "output",  # save output in the output subdirectory of data_dir
    expire_time=60 * 30,  # set all process data to expire in 30 minutes
    wait_for_process=True,  # wait for the process to complete before returning control to the IDE
    verbose=False,
    symbolic_directory_path="/my/custom/path/subpath",
    file_name="file_num_three.txt",
)

Now you can visualize your pipeline's symbolic directory structure by using show_tree.

This example uses the "global root" wildcard symbolic_directory_path, which is explained in the section on the wildcard operator below.

# show the directory structure of a pipeline
show_tree_output = pipeline.show_tree(symbolic_directory_path="/*")

/
└── /my
    └── /custom
        └── /path
            ├── file_num_one.txt
            ├── file_num_two.txt
            └── /subpath
                └── file_num_three.txt

Note that directory names are preceded by a forward slash (/) character and file names are not. This allows you to easily differentiate between them.
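
To make the output format concrete, here is a minimal local sketch that rebuilds the same kind of tree from (symbolic_directory_path, file_name) pairs. It is not Krixik's implementation and does not call the Krixik API: render_tree and example_files are hypothetical names, and the ordering (files before subdirectories, each sorted by name) is an assumption inferred from the sample output above.

```python
def render_tree(files):
    """Render (symbolic_directory_path, file_name) pairs as a show_tree-style string."""
    # Build a nested dict: directories map to dicts, files map to None.
    # Directory keys keep a leading "/" so they are distinguishable from file names.
    root = {}
    for directory, file_name in files:
        node = root
        for part in directory.strip("/").split("/"):
            if part:
                node = node.setdefault("/" + part, {})
        node[file_name] = None

    lines = ["/"]

    def walk(node, prefix):
        # Assumed ordering: files first, then directories, each group sorted by name.
        entries = sorted(node.items(), key=lambda item: (item[1] is not None, item[0]))
        for index, (name, child) in enumerate(entries):
            is_last = index == len(entries) - 1
            lines.append(prefix + ("└── " if is_last else "├── ") + name)
            if child is not None:
                # Descend into the directory, extending the indentation.
                walk(child, prefix + ("    " if is_last else "│   "))

    walk(root, "")
    return "\n".join(lines)


# The three files processed in the example above.
example_files = [
    ("/my/custom/path", "file_num_one.txt"),
    ("/my/custom/path", "file_num_two.txt"),
    ("/my/custom/path/subpath", "file_num_three.txt"),
]

print(render_tree(example_files))
```

Because directory entries carry the leading slash and file entries do not, the two can be told apart in the printed tree without any extra markers.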

The Wildcard Operator and the Global Root

The wildcard operator is the asterisk: *

As with the list, semantic_search, and keyword_search methods, you can use the wildcard operator * in the symbolic_directory_path argument of the show_tree method.

The wildcard operator * can be used as a suffix in the show_tree method if you wish to show the tree structure beneath a certain directory. Syntax might look like this:

# symbolic_directory_path use of wildcard operator *
symbolic_directory_path='/home/files/studies*'

Using this symbolic_directory_path in show_tree would generate a visualization of the directory structure under /home/files/studies.

The broadest use of the wildcard operator in a symbolic_directory_path is what we call "the global root". It is simply a forward slash followed by the wildcard operator *, and it matches every file in your pipeline. It looks like this:

# example of the global root
symbolic_directory_path='/*'

As seen in the above code output, using the global root with the show_tree method returns a visualization of your entire pipeline's directory structure.
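
One way to picture the wildcard suffix and the global root is as prefix matching over the directory paths in a pipeline. The sketch below is purely illustrative and runs locally: select_paths is a hypothetical helper, the "/other/reports" path is made up for the example, and treating a trailing * as a plain string-prefix match is an assumption about the behavior rather than a description of Krixik's internals.

```python
def select_paths(symbolic_directory_path, stored_paths):
    """Return the stored directory paths selected by a symbolic_directory_path.

    Assumption: a trailing * acts as a string-prefix match; without it,
    only an exact match is selected.
    """
    if symbolic_directory_path.endswith("*"):
        prefix = symbolic_directory_path[:-1]
        return [path for path in stored_paths if path.startswith(prefix)]
    return [path for path in stored_paths if path == symbolic_directory_path]


# Hypothetical set of symbolic directory paths in a pipeline.
stored = ["/my/custom/path", "/my/custom/path/subpath", "/other/reports"]

# The global root selects every path.
print(select_paths("/*", stored))

# A wildcard suffix selects the named directory and everything beneath it.
print(select_paths("/my/custom/path*", stored))

# Without the wildcard, only the exact directory is selected.
print(select_paths("/my/custom/path", stored))
```

Under this assumption the global root is just the shortest possible prefix, "/", which every symbolic directory path starts with.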