From 9c8fb9617ef46739b171b92159f82cac3e35157e Mon Sep 17 00:00:00 2001
From: Harsh <65716674+Harsh-br0@users.noreply.github.com>
Date: Sun, 5 Jan 2025 20:53:32 +0530
Subject: [PATCH] Refine readme

Thanks to @aarya626 for some key suggestions
---
 README.md | 86 ++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 39712c1..8f187d9 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,56 @@
-### Setup
-- Install Dependencies with `pip install -r requirements.txt`.
-- Rename `config.env.sample` to `config.env` and fill the vars.
-- There's an unexpected issue with mongodb (check [this](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that wouldn't let us create index programmatically, so we need to create a vector search index manually through atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index for `vector_store` collection with this config below.
+# Simple RAG Server Chatbot
+
+A chatbot built on Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. It can process multiple document types, maintain conversation history, and provide context-aware responses.
+
+## Features
+
+- **Document Processing**
+  - Support for multiple file uploads
+  - Automatic document type detection
+  - Chunked text processing with configurable size
+  - Vector embeddings for efficient retrieval
+
+- **Chat Capabilities**
+  - Context-aware conversations
+  - Session management
+  - History-aware retrieval system
+  - Concise responses, capped at three sentences
+
+- **Technical Stack**
+  - FastAPI for the backend API
+  - MongoDB with vector search capabilities
+  - Hugging Face models for embeddings and chat
+  - LangChain for chain management
+
+## Models Used
+
+- **Embedding Model**: `sentence-transformers/all-mpnet-base-v2`
+  - Runs locally
+  - Used for generating document embeddings
+  - Downloads automatically into the `models` directory
+
+- **LLM**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+  - Accessed through the Hugging Face Hub Inference API
+  - Used for generating responses
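+
+A minimal sketch of how these two models can be wired up (illustrative only: it assumes the `langchain-huggingface` and `huggingface_hub` packages, and the variable names below are not taken from this repo's code):
+
+```python
+from huggingface_hub import InferenceClient
+from langchain_huggingface import HuggingFaceEmbeddings
+
+# Local embedding model; the first run downloads the weights into ./models
+embeddings = HuggingFaceEmbeddings(
+    model_name="sentence-transformers/all-mpnet-base-v2",
+    cache_folder="models",
+)
+
+# Remote LLM, reached through the Hugging Face Hub Inference API
+llm = InferenceClient(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", token="hf_...")
+
+vector = embeddings.embed_query("What is RAG?")  # 768-dimensional embedding
+reply = llm.text_generation("Explain RAG in one sentence.", max_new_tokens=64)
+```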
+
+## Setup Instructions
+
+1. **Install Dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+2. **Configuration**
+   - Rename `config.env.sample` to `config.env`
+   - Fill in the required environment variables
+
+3. **MongoDB Setup**
+   > There's a known issue with MongoDB (see [this thread](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that prevents creating the index programmatically, so the vector search index has to be created manually through the Atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index.
+
+   - Create a vector search index manually through the Atlas console
+   - Use the following configuration for the `vector_store` collection:
+
+   ```json
 {
     "type": "vector",
     "path": "embedding",
@@ -12,18 +59,31 @@
 }
 ```
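+
+   Once Atlas finishes building the index, you can optionally confirm it from Python. This is a sketch only: `pymongo>=4.5`, the `MONGODB_URI` environment variable, and the database name are assumptions, not taken from this repo's code.
+
+   ```python
+   import os
+
+   from pymongo import MongoClient
+
+   client = MongoClient(os.environ["MONGODB_URI"])  # your Atlas connection string
+   collection = client["your_db"]["vector_store"]   # "your_db" is a placeholder
+
+   # Atlas Vector Search indexes are listed separately from regular indexes
+   for index in collection.list_search_indexes():
+       print(index["name"], index.get("status"))
+   ```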
-### Usage
-- Run `python ./main.py`
-- Head over to `http://localhost:8080/docs`
+## Usage
+
+- Start the server:
+
+```bash
+python ./main.py
+```
+
+- Access the API by navigating to [http://localhost:8080/docs](http://localhost:8080/docs) for the Swagger documentation
+
+## Configuration
 
-### Models Used
-- `sentence-transformers/all-mpnet-base-v2`
+Key configurations in `defaults.py` (see the sketch at the end of this file):
 
-  It is used for embedding vectors and this will run locally. Initially it will download the model into `models` directory.
+- `SPLITTER_CHUNK_SIZE`: 400
+- `SPLITTER_CHUNK_OVERLAP`: 25
+- `RETRIEVER_K_PARAM`: 4
+- `MAX_READ_LINES_FOR_TEXT_FILE`: 40
 
-- `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+## Performance Note
 
-  It is the main LLM that being used through the HuggingFace Hub Inference API.
+The system may experience some latency due to:
+- Initial model download and loading
+- Document processing time
+- Inference API response time
 
-> Note: Since it is using Inference API and a model locally, the setup would be too slow. On my side, it took more than 2 mins exactly to add a document of 45+ pages to vector store and almost 1 min to process the messages with LLM.
+> On my machine, it took over 2 minutes to add a 45+ page document to the vector store and almost 1 minute to process the messages with the LLM.
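+
+For reference, the configuration values listed under Configuration would look roughly like this in `defaults.py` (a sketch: only the names and values come from the list above; the comments are interpretations of the names, not taken from the repo):
+
+```python
+# defaults.py -- illustrative sketch of the documented values
+
+# Text splitter settings: chunk size and overlap passed to the chunking step
+SPLITTER_CHUNK_SIZE = 400
+SPLITTER_CHUNK_OVERLAP = 25
+
+# Number of chunks the retriever returns per query (the "k" parameter)
+RETRIEVER_K_PARAM = 4
+
+# Limit on lines read at once when ingesting a plain text file
+MAX_READ_LINES_FOR_TEXT_FILE = 40
+```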