How to safely re-"paste" a column after using the table.pick.column operation? #18

MariellaCC · 2023-11-23T10:30:07Z

Is there a recommended way to safely re-add a column to a table after using the table.pick.column operation?

Example of why this may be needed:
For some operations (e.g. the current version of the tokenize.texts_array module in the Kiara language_processing plugin), an array is required as module input. Consequently, the table containing the texts has to be split up with a table.pick.column operation to get an array of texts before the tokenize.texts_array module can be used.
At a later stage in the pipeline, a preview of the processed array needs to be displayed in the context of the original table. Should the assemble.tables operation be used to re-assemble the table? Does this operation guarantee that the column is put back together correctly with the initial table, or is there an alternative way to proceed?
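The core risk when re-adding a column is alignment: once the column has been picked out as an array, only row order ties each value back to its row. Below is a minimal plain-Python sketch of the pick, process, re-add round trip. It is not kiara code: it assumes a table is modelled as a dict mapping column names to equal-length lists, and the helpers pick_column, add_column and tokenize_texts (a naive whitespace tokenizer standing in for tokenize.texts_array) are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a table: column name -> list of row values.
Table = Dict[str, List[object]]


def pick_column(table: Table, column_name: str) -> List[object]:
    """Return a copy of one column as a plain list (the 'array')."""
    return list(table[column_name])


def add_column(table: Table, column_name: str, values: List[object]) -> Table:
    """Return a new table with `values` appended as a column.

    Rows are matched purely by position, so the array must have exactly
    one entry per row and must not have been reordered or filtered.
    """
    n_rows = len(next(iter(table.values()))) if table else len(values)
    if len(values) != n_rows:
        raise ValueError(
            f"array has {len(values)} entries but the table has {n_rows} rows"
        )
    if column_name in table:
        raise ValueError(f"column {column_name!r} already exists")
    new_table = dict(table)  # shallow copy, the input table is left untouched
    new_table[column_name] = list(values)
    return new_table


def tokenize_texts(texts: List[object]) -> List[List[str]]:
    """Naive whitespace tokenizer, standing in for tokenize.texts_array."""
    return [str(text).split() for text in texts]


texts_table: Table = {
    "id": [1, 2, 3],
    "text": ["a first text", "second one", "third"],
}
tokens = tokenize_texts(pick_column(texts_table, "text"))
result = add_column(texts_table, "tokens", tokens)
for row_id, text, toks in zip(result["id"], result["text"], result["tokens"]):
    print(row_id, text, toks)
```

The length check only catches missing or extra entries; it cannot detect a reordered array. If the processing step might change the order, carrying a row identifier (like the id column here) through it is the safer design.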

makkus · 2023-11-23T11:08:45Z

Good question. Short answer: there is the 'table.merge' module (see kiara module explain table.merge).

Long answer: this is a bit more complicated than it might seem. The 'table.merge' module is currently not used in any operation, because I haven't thought through all the implications yet, and I was waiting for some use cases before working on it properly. The main problem is that merging tables/arrays does not have an obvious number of inputs. Each table/array you want to include needs its own input field in the operation, but since we don't know the number of tables/arrays in advance, we can't hard-code those fields in the get_inputs_schema method. That means there is no operation for now, just a module that you can configure on a case-by-case basis. I imagine we will end up with a few 'base' operations later on, all of which use this module under the hood:

one table input and one array input
two table inputs
two array inputs
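To illustrate the "one input field per table/array" problem: the module config itself is just data, so it can be generated per use case. The sketch below uses a hypothetical helper (build_merge_config is not part of kiara) that produces a config dict with the same layout as the 'table.merge' config in the Python example in this thread, for any number of table and array inputs. Whether table.merge accepts every such combination is an assumption.

```python
import json
from typing import Dict, List


def build_merge_config(table_inputs: List[str], array_inputs: List[str]) -> Dict:
    """Hypothetical helper: one 'inputs_schema' field per table/array input."""
    fields = [(name, "table") for name in table_inputs]
    fields += [(name, "array") for name in array_inputs]
    inputs_schema: Dict[str, Dict[str, str]] = {}
    for name, value_type in fields:
        if name in inputs_schema:
            raise ValueError(f"duplicate input name: {name!r}")
        inputs_schema[name] = {
            "type": value_type,
            "doc": f"The {value_type} input '{name}'.",
        }
    return {
        "module_type": "table.merge",
        "module_config": {"inputs_schema": inputs_schema},
    }


# The three 'base' shapes listed above, as fixed instances of the same config.
table_and_array = build_merge_config(["table"], ["array"])
two_tables = build_merge_config(["table_1", "table_2"], [])
two_arrays = build_merge_config([], ["array_1", "array_2"])
print(json.dumps(table_and_array, indent=2))
```

Each generated dict describes exactly one concrete shape, so the number of inputs still has to be known before the operation is created, which is the limitation described above.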

From these, users can assemble any sort of table by chaining the operations. But that is not ideal, because it inflates the lineage with several steps when a single one would really be enough. And apart from interactive use cases, where we don't know in advance how many tables/arrays we have to deal with, we can just use the module directly (for example in declarative pipelines), so it's not all that pressing for now.
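The lineage cost of chaining can be made concrete with a toy model. In this plain-Python sketch (tables as dicts of equal-length column lists; merge_two, merge_chained and merge_all are hypothetical helpers, not kiara functions), merging four parts through a chain of pairwise operations records three lineage steps, while a single multi-input merge records one, even though both produce the same table.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical toy model of a table: column name -> list of row values.
Table = Dict[str, List[object]]


def row_count(table: Table) -> int:
    return len(next(iter(table.values()))) if table else 0


def merge_two(left: Table, right: Table) -> Table:
    """Column-wise merge of two tables with the same number of rows."""
    if row_count(left) != row_count(right):
        raise ValueError("row counts differ")
    overlap = set(left) & set(right)
    if overlap:
        raise ValueError(f"duplicate column names: {sorted(overlap)}")
    return {**left, **right}


def merge_chained(parts: List[Table]) -> Tuple[Table, List[str]]:
    """Chain pairwise merges; every intermediate merge is one lineage step."""
    lineage: List[str] = []

    def step(acc: Table, nxt: Table) -> Table:
        lineage.append(f"merge {list(acc)} + {list(nxt)}")
        return merge_two(acc, nxt)

    return reduce(step, parts), lineage


def merge_all(parts: List[Table]) -> Tuple[Table, List[str]]:
    """Internally still pairwise, but recorded as a single lineage step."""
    merged = reduce(merge_two, parts)
    return merged, [f"merge {len(parts)} inputs"]


parts: List[Table] = [{"id": [1, 2]}, {"a": [3, 4]}, {"b": [5, 6]}, {"c": [7, 8]}]
chained, chain_lineage = merge_chained(parts)
single, single_lineage = merge_all(parts)
print(len(chain_lineage), "steps vs", len(single_lineage), "step")
```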

Anyway, here's some example code that outlines how you would do it in Python; happy to answer follow-up questions:

```python
from kiara.api import KiaraAPI
from kiara.utils.cli import terminal_print
from kiara_plugin.tabular.models.table import KiaraTable

kiara = KiaraAPI.instance()

# load the table stored under the alias 'nodes'
nodes_table = kiara.get_value("nodes")

# extract the 'City' column as an array
pick_input = {
    "table": nodes_table,
    "column_name": "City"
}
pick_result = kiara.run_job("table.pick.column", pick_input)

# info for 'table.merge' module
merge_module_info = kiara.retrieve_module_type_info("table.merge")
print("The module info:")
terminal_print(merge_module_info)

# there is no pre-defined operation for 'table.merge', so configure the
# module directly, with one input field per table/array to merge
join_to_table_op = {
    "module_type": "table.merge",
    "module_config": {
        "inputs_schema": {
            "orig_table": {
                "type": "table",
                "doc": "The table to add the column to."
            },
            "processed_column": {
                "type": "array",
                "doc": "The array to add as a column to the table."
            }
        }
    }
}

op = kiara.get_operation(join_to_table_op)
print("The info for the dynamically created operation:")
terminal_print(op)

# input names match the fields declared in 'inputs_schema' above; here the
# picked column is re-added unchanged, in practice it would be the processed array
join_inputs = {
    "orig_table": nodes_table,
    "processed_column": pick_result["array"]
}
join_result = kiara.run_job(operation=op, inputs=join_inputs)
joined_table: KiaraTable = join_result["table"].data
print("The resulting table:")
print(joined_table.to_pandas_dataframe())
```

(There is also a 'column_map' config option that lets you control how the added columns are named, but using it raised an exception, so I'll need to look into it and fix it.)

makkus · 2023-11-23T11:11:27Z

(Also, come to think of it, it would probably be useful to let users choose the names of the newly added columns directly, as an option, in addition to hard-configuring them. This is another feature I still need to implement, and it might affect the overall design of the module.)
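One way the two naming options could coexist is a simple precedence rule. The following is a hypothetical sketch, not how table.merge or its 'column_map' option currently works: a name chosen by the user at run time overrides a hard-configured one, which in turn overrides the input field name, and clashes with existing or other new columns are rejected. The function resolve_column_names and its parameters are illustrative names only.

```python
from typing import Dict, Iterable, List, Optional


def resolve_column_names(
    input_fields: List[str],
    configured: Dict[str, str],
    user_choice: Optional[Dict[str, str]] = None,
    existing_columns: Iterable[str] = (),
) -> Dict[str, str]:
    """Map each array input field to the name of the column it becomes.

    Hypothetical precedence: run-time user choice > configured name > field name.
    """
    user_choice = user_choice or {}
    names = {
        field: user_choice.get(field, configured.get(field, field))
        for field in input_fields
    }
    new_names = list(names.values())
    # reject duplicates among the new columns and clashes with the original table
    clashes = {n for n in new_names if new_names.count(n) > 1}
    clashes |= set(new_names) & set(existing_columns)
    if clashes:
        raise ValueError(f"column name(s) already in use: {sorted(clashes)}")
    return names


print(resolve_column_names(["processed_column"], {"processed_column": "tokens"}))
print(
    resolve_column_names(
        ["processed_column"],
        {"processed_column": "tokens"},
        user_choice={"processed_column": "city_tokens"},
        existing_columns=["City"],
    )
)
```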

MariellaCC · 2023-11-23T11:20:01Z

Thank you, I will try like that.

In such a case, do you think it's best to pass an array (rather than a table) as input when the operation is only performed on a single column of a table? Bearing in mind that, for analysis, the need is often to be able to see and compare things in their context (here, the table)?

makkus · 2023-11-23T11:33:24Z

Not sure; I think it depends on the context and on what you are trying to achieve. I'd imagine most of the patterns I thought about would be frontend-dependent. You could compare by displaying the old and new values side by side as arrays, or in the same table.
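For a code-based, frontend-independent comparison, the "side by side as arrays" option can be as simple as printing both arrays row by row. This is a minimal sketch with a hypothetical side_by_side helper and made-up example values; it assumes the processed array still has one entry per original value, in the same order.

```python
from typing import List, Sequence, Tuple


def side_by_side(
    original: Sequence[object],
    processed: Sequence[object],
    headers: Tuple[str, str] = ("original", "processed"),
) -> List[str]:
    """Format two equal-length arrays as aligned text rows for comparison."""
    if len(original) != len(processed):
        raise ValueError("both arrays need the same length to be compared row by row")
    left = [headers[0]] + [str(v) for v in original]
    right = [headers[1]] + [str(v) for v in processed]
    width = max(len(s) for s in left)  # pad the left column to its widest entry
    return [f"{a.ljust(width)} | {b}" for a, b in zip(left, right)]


for line in side_by_side(["Berlin", "New York"], [["Berlin"], ["New", "York"]]):
    print(line)
```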

I haven't really given much thought to how to use any of this in an exploratory style, like with Jupyter, and the considerations there would be quite different, because UI frontends have very particular requirements. Using kiara exploratory-style via code is probably pretty clumsy and annoying, since you lose a lot of flexibility. So I reckon we'll have to gain some experience first and then arrive at recommendations for how to handle patterns like this.

MariellaCC · 2023-11-23T12:03:56Z

Alright, I understand that it will also depend on frontend requirements, so maybe @caro401 you have (or will have in the future) insights to share about this specific question (column-type versus table-type inputs/outputs in modules)?

MariellaCC added the how-to label (Request or outline for a how-to or tutorial type doc) Nov 23, 2023

makkus mentioned this issue Apr 9, 2024:

How-to create tests for module development purposes in plugin templates? #19 (Open)
