Reliable Agentic RAG with LLM Trustworthiness Estimates

This project explores RAG LLM agents and attempts to replicate the blog post "Reliable Agentic RAG with LLM Trustworthiness Estimates": https://pub.towardsai.net/reliable-agentic-rag-with-llm-trustworthiness-estimates-c488fb1bd116

Tested on a machine with the following configuration:

Python - 3.12
uv - 0.4.25
GPU - Nvidia GeForce RTX 3060 Mobile
OS - Ubuntu 22.04.5 LTS

Getting Started

Install uv.

Create a virtual environment using uv

uv venv --python 3.12
source .venv/bin/activate

Install the dependencies required to run this project inside the virtual environment.

uv sync

Download dataset

We will work with NVIDIA's documentation for Triton Inference Server: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

The script below uses wget to download all the linked pages in HTML format.

cd scripts
bash get_data.sh

Run data pipeline

cd ..

pwd
/home/username/reliable-agentic-rag

Note

Make sure you run the following command from the root of the project (inside the reliable-agentic-rag folder).

python agentic_rag/run.py data

Running this command creates a milvus.db file that acts as the knowledge base. To learn more about the data pipeline, refer to the data pipeline documentation.

Tip

The parameters that can be configured for the data pipeline are documented here.

Run query pipeline

python agentic_rag/run.py query --query-text "How to make custom layers of TensorRT work in Triton?"

Warning

One or two API keys must be added to the .env file:

An API key for the Trustworthy Language Model (TLM) by cleanlab.ai, which is used to estimate the trustworthiness score. Create an account, then get the API key from here: https://app.cleanlab.ai/account

An API key for the LLM, if the LLM is closed-source. If the LLM is hosted locally, no API key is required; configure only the LLM_MODEL and LLM_API_BASE parameters.

For more information on how to configure various LLM providers, refer to the documentation.
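As a rough illustration of the rules above, the following sketch checks that the required settings are present before the query pipeline runs. Only LLM_MODEL and LLM_API_BASE are names taken from this README; LLM_API_KEY, TLM_API_KEY and the function load_llm_settings are hypothetical names chosen for the example, so check the project's documentation for the real variable names.

```python
import os


def load_llm_settings(env=None):
    """Collect LLM settings from a mapping of environment-style variables."""
    env = os.environ if env is None else env
    settings = {
        "model": env.get("LLM_MODEL"),
        "api_base": env.get("LLM_API_BASE"),
        # Hypothetical variable names, not taken from the project.
        "llm_api_key": env.get("LLM_API_KEY"),
        "tlm_api_key": env.get("TLM_API_KEY"),
    }

    missing = []
    if not settings["model"]:
        missing.append("LLM_MODEL")
    # The TLM key is always needed to compute the trustworthiness score.
    if not settings["tlm_api_key"]:
        missing.append("TLM_API_KEY")
    # A locally hosted LLM needs an API base but no key;
    # a closed-source LLM needs an API key.
    if not settings["api_base"] and not settings["llm_api_key"]:
        missing.append("LLM_API_BASE or LLM_API_KEY")

    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return settings


# Example: a locally hosted LLM, so no LLM API key is supplied.
local_settings = load_llm_settings({
    "LLM_MODEL": "my-local-model",               # placeholder value
    "LLM_API_BASE": "http://localhost:8000/v1",  # placeholder value
    "TLM_API_KEY": "placeholder-key",
})
```

Passing a plain dictionary instead of reading os.environ directly keeps the check easy to try out without touching the real .env file.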

Tip

The parameters that can be configured for the query pipeline are documented here.

Documentation

Data pipeline

Documentation: docs

Query pipeline

Documentation: docs

Agentic RAG

Strictly speaking, the approach implemented in this project is not agentic RAG. We manually provide the list of retrieval strategies, call the functions for each strategy, and select the next strategy depending on the trustworthiness score. To implement a truly autonomous agentic RAG approach, one option outlined in this blog is to use JSON mode for structured responses; another is to create multiple agents that collaborate.
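The manual strategy selection described above can be pictured with the following minimal sketch. It is not the project's actual code: the function answer_with_fallback, its parameters, the strategy names and the 0.8 threshold are all assumptions made for illustration, and the retrieval, generation and scoring steps are passed in as plain callables so that stubs can stand in for the real retrievers, the LLM and TLM.

```python
def answer_with_fallback(query, strategies, generate, score, threshold=0.8):
    """Try retrieval strategies in order until an answer is trusted enough.

    strategies: ordered list of (name, retrieve) pairs, where retrieve(query)
                returns the context for that strategy.
    generate:   callable (query, context) -> answer text.
    score:      callable (query, context, answer) -> trust score in [0, 1].
    threshold:  assumed cut-off; the right value depends on the use case.
    Returns (strategy name, answer, trust score) for the best attempt.
    """
    if not strategies:
        raise ValueError("At least one retrieval strategy is required")

    best = None
    for name, retrieve in strategies:
        context = retrieve(query)
        answer = generate(query, context)
        trust = score(query, context, answer)
        # Remember the most trustworthy attempt seen so far.
        if best is None or trust > best[2]:
            best = (name, answer, trust)
        # Stop early once the answer is trusted enough.
        if trust >= threshold:
            break
    return best


# Stubs standing in for real retrievers, the LLM and the trust scorer.
demo_strategies = [
    ("semantic", lambda q: ["loosely related passage"]),
    ("hybrid", lambda q: ["directly relevant passage"]),
]
demo_generate = lambda q, ctx: "Answer based on: " + ctx[0]
demo_score = lambda q, ctx, ans: 0.9 if "directly" in ctx[0] else 0.3

result = answer_with_fallback(
    "How to make custom layers of TensorRT work in Triton?",
    demo_strategies,
    demo_generate,
    demo_score,
)
print(result)
```

Because the order of strategies and the stopping rule are fixed by the caller, this loop is semi-automated rather than agentic: the model never decides which strategy to try next.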

Some questions of my own

  • Will a fully autonomous agentic RAG approach outperform the current semi-automated RAG approach? What are the advantages of using one over the other?
  • In the current RAG approach, what alternatives could replace the uncertainty estimator component?
  • Is this RAG approach reliable and robust to all scenarios?

Recommended Readings

Further work

  • Implement Contextual Retrieval by Anthropic. It will require changes to the data pipeline logic.
  • Support GraphRAG, Query Rewriting or Multi-Hop RAG as additional retrieval strategies
  • Implement an agentic RAG processing loop (explained in the Agentic RAG processing loop section of the link)
  • Make configuration more intuitive (using pydantic)