How to safely re-"paste" a column after using table.pick.column operation ? #18
Comments
Good question. Short answer: there is the 'table.merge' module.

Long answer: this is a bit more complicated than it would seem. The 'table.merge' module is currently not used in any operations, because I haven't thought through all the implications, and I was waiting for some use-cases before working on it properly. The main problem is that merging tables/arrays together does not have an obvious number of inputs. For each table/array you want to include, you need one input field for the operation. But since we don't know the number of tables/arrays in advance, we can't hard-code that in the
From that, users can assemble any sort of table by chaining the operations. But that is not ideal, because we blow up the lineage with a number of steps when really we would only need a single one. And except for some interactive use-cases where we don't know in advance how many tables/arrays we have to deal with, we can just use the module directly (for example in declarative pipelines), so it's not really all that pressing for now.

Anyway, here's some example code that should outline how you would do it in Python, happy to answer follow-up questions:

```python
from kiara.api import KiaraAPI
from kiara.utils.cli import terminal_print
from kiara_plugin.tabular.models.table import KiaraTable

kiara = KiaraAPI.instance()

# the table we want to pick a column from (and later merge back into)
nodes_table = kiara.get_value("nodes")

# pick the 'City' column as an array
pick_input = {
    "table": nodes_table,
    "column_name": "City"
}
pick_result = kiara.run_job("table.pick.column", pick_input)

# info for 'table.merge' module
merge_module_info = kiara.retrieve_module_type_info("table.merge")
print("The module info:")
terminal_print(merge_module_info)

# dynamically create an operation from the 'table.merge' module,
# with one input field per table/array we want to merge
join_to_table_op = {
    "module_type": "table.merge",
    "module_config": {
        "inputs_schema": {
            "orig_table": {
                "type": "table",
                "doc": "The table to add the column to."
            },
            "processed_column": {
                "type": "array",
                "doc": "The array to add as a column to the table."
            }
        }
    }
}
op = kiara.get_operation(join_to_table_op)
print("The info for the dynamically created operation:")
terminal_print(op)

# run the operation with the original table and the picked array
join_inputs = {
    "orig_table": nodes_table,
    "processed_column": pick_result["array"]
}
join_result = kiara.run_job(operation=op, inputs=join_inputs)

joined_table: KiaraTable = join_result["table"].data
print("The resulting table:")
print(joined_table.to_pandas_dataframe())
```

(There is a 'column_map' config that lets you control how to name the added columns, but that gave me an exception, so I'll need to look into it and fix it.)
(Also: come to think of it, it would probably be useful to let users choose the names of the newly added columns directly, as an option, in addition to hard-configuring them -- this is also a feature I'd still need to implement, and it might affect the overall design of the module.)
Thank you, I will try it like that. For such a case, do you think it's best to pass an array (versus a table) as an input when the operation is performed on only one column of a table? Keeping in mind that, in terms of analysis, users often need to see and compare things in their context (here the context is the table)?
Not sure, I think it depends on the context and what you are trying to achieve. I'd imagine most of the patterns I thought about would be frontend-dependent. You could compare by displaying the old/new values side-by-side as arrays, or in the same table. I haven't really given much thought to how to use any of this in an exploratory style, like with Jupyter, and the considerations would be quite different because UI frontends have very particular requirements. Using kiara exploratory-style via code is probably pretty clumsy and annoying, since you lose a lot of flexibility. So I reckon we'll have to get some experience and arrive at recommendations for how to do patterns like this.
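To make the "side-by-side as arrays" option a bit more concrete, here is a minimal plain-Python sketch. It does not use kiara at all: the arrays are modelled as plain lists, and `side_by_side` and `changed_rows` are hypothetical helper names introduced only for illustration. It assumes the processed array has exactly one value per original row, in the same order.

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[int, Any, Any]  # (row_index, old_value, new_value)


def side_by_side(old: Sequence[Any], new: Sequence[Any]) -> List[Row]:
    """Pair old and new values by row position."""
    if len(old) != len(new):
        raise ValueError("both arrays must have the same length")
    return [(i, o, n) for i, (o, n) in enumerate(zip(old, new))]


def changed_rows(pairs: List[Row]) -> List[Row]:
    """Keep only the rows where the processing step changed the value."""
    return [p for p in pairs if p[1] != p[2]]


# hypothetical example data; stripping whitespace stands in for the real processing
old_values = ["Berlin ", "Rome", " Oslo"]
new_values = [v.strip() for v in old_values]

pairs = side_by_side(old_values, new_values)
for row, old, new in pairs:
    print(f"{row}\t{old!r}\t{new!r}")
```

Filtering with `changed_rows(pairs)` would narrow the preview down to the rows that actually differ, which may be easier to scan than the full table for larger inputs.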
Alright, I understand that it will also depend on frontend requirements. So maybe @caro401, you have (or will have in the future) insights to share about this specific question (column type versus table type inputs/outputs in modules)?
@makkus
Is there a recommended way to safely re-add a column to a table after using the table.pick.column operation?
Example of why this may be needed:
For some operations (e.g. the current version of the tokenize.texts_array module in the Kiara language_processing plugin), the module expects an array as input. Consequently, the table containing the texts needs to be taken apart via a table.pick.column operation to get an array of texts before the tokenize.texts_array module can be used.
At a later stage in the pipeline, there will be a need to display a preview of the processed array in the context of the original table. Should the assemble.tables operation be used to re-assemble the table? Does this operation ensure the preservation of the correct assembling of the initial table and the column, or is there an alternate way to proceed?
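The alignment concern can be illustrated without kiara. Below is a minimal plain-Python sketch, not kiara's API: the table is modelled as a dict of equally long lists, and `pick_column` and `reattach_column` are hypothetical helpers. It assumes the processing step returns one item per input row and does not reorder them; under that assumption, re-adding the column by row position preserves the original pairing, and the length check catches the case where rows were dropped or added.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # hypothetical stand-in for a real table type


def pick_column(table: Table, column_name: str) -> List[Any]:
    """Return a copy of one column as a plain list (the 'array')."""
    return list(table[column_name])


def reattach_column(table: Table, column_name: str, values: List[Any]) -> Table:
    """Return a new table with `values` added as a column, matched by row position."""
    n_rows = len(next(iter(table.values()))) if table else 0
    if len(values) != n_rows:
        raise ValueError(f"expected {n_rows} values, got {len(values)}")
    if column_name in table:
        raise ValueError(f"column {column_name!r} already exists")
    # copy the columns so the original table is left untouched
    merged = {name: list(col) for name, col in table.items()}
    merged[column_name] = list(values)
    return merged


# hypothetical example data; lower-casing stands in for e.g. tokenization
nodes = {"Id": [1, 2, 3], "City": ["Berlin", "Rome", "Oslo"]}
cities = pick_column(nodes, "City")
processed = [c.lower() for c in cities]
result = reattach_column(nodes, "City_processed", processed)
```

If the processing step can filter or reorder rows, positional matching is no longer enough, and some row identifier would have to travel with the array.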