Autogen for data analysis including numbers and text? #4726
-
I would like to use Autogen to analyse data stored in a CSV file. When I set this up as a GroupChat, I can have, for example, (1) a DataLoader, (2) a Coder and (3) a CodeExecutor agent, which together write and execute Python code that creates a chart showing basic statistics of the data in the CSV file. However, each row of the CSV file also contains text, and I would like Autogen to use the LLM to summarise the text in each row briefly, or classify it into topics, then store the results back in a pandas dataframe and work with the new content of the dataframe to create graphs. Unfortunately, I have not been able to create an agent that uses the LLM for this task and applies it to filtered rows of the dataframe. Instead, the Coder agent mentioned above tries to install NLTK or scikit-learn and count the words in the text, rather than simply using the LLM to summarise it. How can I achieve this? This example of using RAG seems to accept a specific data file, but I would need the data to be preprocessed in interim steps by the other agents (as described above), and only the filtered pandas dataframe should then be used by the RetrieveUserProxyAgent.
Replies: 2 comments 12 replies
-
@analyticsinsights Prompting for coding tasks can be tricky. Unfortunately, without an example of your code to look at, I think it's hard to advise you on next steps.
-
I just wanted to share with the community that answering questions about the content of a CSV file is actually very straightforward, out of the box, with LangChain tools.
The AG2 release that integrates tools from third-party providers shows how important tools are for agentic workflows. It would be great to have this in Autogen as well.
Unless you want to convert the returned dataframe into a text message, I would use a separate variable to store the result, or save the dataframe to disk.