From 9c8fb9617ef46739b171b92159f82cac3e35157e Mon Sep 17 00:00:00 2001
From: Harsh <65716674+Harsh-br0@users.noreply.github.com>
Date: Sun, 5 Jan 2025 20:53:32 +0530
Subject: [PATCH] Refine readme

Thanks to @aarya626 for some key suggestions
---
 README.md | 86 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 39712c1..8f187d9 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,56 @@
-### Setup
-- Install Dependencies with `pip install -r requirements.txt`.
-- Rename `config.env.sample` to `config.env` and fill the vars.
-- There's an unexpected issue with mongodb (check [this](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that wouldn't let us create index programmatically, so we need to create a vector search index manually through atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index for `vector_store` collection with this config below.
+# Simple RAG Server Chatbot
+
+A chatbot built on Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. It can process multiple document types, maintain conversation history, and provide context-aware responses.
+
+## Features
+
+- **Document Processing**
+  - Support for multiple file uploads
+  - Automatic document type detection
+  - Chunked text processing with configurable size
+  - Vector embeddings for efficient retrieval
+
+- **Chat Capabilities**
+  - Context-aware conversations
+  - Session management
+  - History-aware retrieval system
+  - Concise responses, capped at three sentences
+
+- **Technical Stack**
+  - FastAPI for the backend API
+  - MongoDB with vector search capabilities
+  - Hugging Face models for embeddings and chat
+  - LangChain for chain management
+
+## Models Used
+
+- **Embedding Model**: `sentence-transformers/all-mpnet-base-v2`
+  - Runs locally
+  - Used for generating document embeddings
+  - Downloads automatically into the `models` directory
+
+- **LLM**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+  - Accessed through the Hugging Face Hub Inference API
+  - Used for generating responses
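+
+A minimal sketch of how these two models can be wired up (illustrative only: it assumes the `langchain-huggingface` and `huggingface_hub` packages, and the variable names below are not taken from this repo's code):
+
+```python
+from huggingface_hub import InferenceClient
+from langchain_huggingface import HuggingFaceEmbeddings
+
+# Local embedding model; the first run downloads the weights into ./models
+embeddings = HuggingFaceEmbeddings(
+    model_name="sentence-transformers/all-mpnet-base-v2",
+    cache_folder="models",
+)
+
+# Remote LLM, reached through the Hugging Face Hub Inference API
+llm = InferenceClient(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", token="hf_...")
+
+vector = embeddings.embed_query("What is RAG?")  # 768-dimensional embedding
+reply = llm.text_generation("Explain RAG in one sentence.", max_new_tokens=64)
+```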
+
+## Setup Instructions
+
+1. **Install Dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+2. **Configuration**
+   - Rename `config.env.sample` to `config.env`
+   - Fill in the required environment variables
+
+3. **MongoDB Setup**
+   > There's a known issue with MongoDB (see [this thread](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that prevents creating the index programmatically, so the vector search index has to be created manually through the Atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index.
+
+   - Create a vector search index manually through the Atlas console
+   - Use the following configuration for the `vector_store` collection:
+
+   ```json
 {
     "type": "vector",
     "path": "embedding",
@@ -12,18 +59,31 @@
 }
 ```
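+
+   Once Atlas finishes building the index, you can optionally confirm it from Python. This is a sketch only: `pymongo>=4.5`, the `MONGODB_URI` environment variable, and the database name are assumptions, not taken from this repo's code.
+
+   ```python
+   import os
+
+   from pymongo import MongoClient
+
+   client = MongoClient(os.environ["MONGODB_URI"])  # your Atlas connection string
+   collection = client["your_db"]["vector_store"]   # "your_db" is a placeholder
+
+   # Atlas Vector Search indexes are listed separately from regular indexes
+   for index in collection.list_search_indexes():
+       print(index["name"], index.get("status"))
+   ```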
-### Usage
-- Run `python ./main.py`
-- Head over to `http://localhost:8080/docs`
+## Usage
+
+- Start the server:
+
+```bash
+python ./main.py
+```
+
+- Access the API by navigating to [http://localhost:8080/docs](http://localhost:8080/docs) for the Swagger documentation
+
+## Configuration
 
-### Models Used
-- `sentence-transformers/all-mpnet-base-v2`
+Key configurations in `defaults.py` (see the sketch at the end of this file):
 
-  It is used for embedding vectors and this will run locally. Initially it will download the model into `models` directory.
+- `SPLITTER_CHUNK_SIZE`: 400
+- `SPLITTER_CHUNK_OVERLAP`: 25
+- `RETRIEVER_K_PARAM`: 4
+- `MAX_READ_LINES_FOR_TEXT_FILE`: 40
 
-- `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+## Performance Note
 
-  It is the main LLM that being used through the HuggingFace Hub Inference API.
+The system may experience some latency due to:
+- Initial model download and loading
+- Document processing time
+- Inference API response time
 
-> Note: Since it is using Inference API and a model locally, the setup would be too slow. On my side, it took more than 2 mins exactly to add a document of 45+ pages to vector store and almost 1 min to process the messages with LLM.
+> On my machine, it took over 2 minutes to add a 45+ page document to the vector store and almost 1 minute to process the messages with the LLM.
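+
+For reference, the configuration values listed under Configuration would look roughly like this in `defaults.py` (a sketch: only the names and values come from the list above; the comments are interpretations of the names, not taken from the repo):
+
+```python
+# defaults.py -- illustrative sketch of the documented values
+
+# Text splitter settings: chunk size and overlap passed to the chunking step
+SPLITTER_CHUNK_SIZE = 400
+SPLITTER_CHUNK_OVERLAP = 25
+
+# Number of chunks the retriever returns per query (the "k" parameter)
+RETRIEVER_K_PARAM = 4
+
+# Limit on lines read at once when ingesting a plain text file
+MAX_READ_LINES_FOR_TEXT_FILE = 40
+```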