From 9c8fb9617ef46739b171b92159f82cac3e35157e Mon Sep 17 00:00:00 2001
From: Harsh <65716674+Harsh-br0@users.noreply.github.com>
Date: Sun, 5 Jan 2025 20:53:32 +0530
Subject: [PATCH] Refine readme

Thanks to @aarya626 for some key suggestions.
---
README.md | 86 ++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 73 insertions(+), 13 deletions(-)
diff --git a/README.md b/README.md
index 39712c1..8f187d9 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,56 @@
-### Setup
-- Install Dependencies with `pip install -r requirements.txt`.
-- Rename `config.env.sample` to `config.env` and fill the vars.
-- There's an unexpected issue with mongodb (check [this](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that wouldn't let us create index programmatically, so we need to create a vector search index manually through atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index for `vector_store` collection with this config below.
+# Simple RAG Server Chatbot
+A chatbot built on Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. It can process multiple document types, maintain conversation history, and provide context-aware responses.
+
+## Features
+
+- **Document Processing**
+ - Support for multiple file uploads
+ - Automatic document type detection
+ - Chunked text processing with configurable size
+ - Vector embeddings for efficient retrieval
+
+- **Chat Capabilities**
+ - Context-aware conversations
+ - Session management
+  - History-aware retrieval system (see the sketch below)
+  - Concise responses (three sentences maximum)
+
+- **Technical Stack**
+ - FastAPI for the backend API
+ - MongoDB with vector search capabilities
+ - Hugging Face models for embeddings and chat
+ - LangChain for chain management
+
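+The history-aware retrieval called out above is easiest to see in code. Below is a minimal sketch using LangChain's `create_history_aware_retriever`; the variable names are illustrative and the project's actual chain setup may differ.
+
+```python
+# Sketch: rewrite a follow-up question into a standalone query before
+# retrieval, so context from earlier turns is not lost. Illustrative only.
+from langchain.chains import create_history_aware_retriever
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+
+rephrase_prompt = ChatPromptTemplate.from_messages([
+    MessagesPlaceholder("chat_history"),
+    ("user", "{input}"),
+    ("user", "Rephrase the question above as a standalone search query."),
+])
+
+# `llm` and `retriever` are assumed to be set up as described below
+chain = create_history_aware_retriever(llm, retriever, rephrase_prompt)
+docs = chain.invoke({"input": "What about its license?", "chat_history": []})
+```
+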
+## Models Used
+
+- **Embedding Model**: `sentence-transformers/all-mpnet-base-v2`
+ - Runs locally
+ - Used for generating document embeddings
+ - Downloads automatically to `models` directory
+
+- **LLM**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+ - Accessed through Hugging Face Hub Inference API
+ - Used for generating responses
+
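+As a rough sketch, the two models could be wired up through LangChain's Hugging Face integrations as below (assuming the `langchain-huggingface` package; the project's actual initialization may differ):
+
+```python
+# Sketch: a local embedding model plus a hosted chat model. Illustrative only.
+from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpoint
+
+embeddings = HuggingFaceEmbeddings(
+    model_name="sentence-transformers/all-mpnet-base-v2",
+    cache_folder="models",  # downloaded here on first run
+)
+
+# Needs a Hugging Face API token (e.g. HUGGINGFACEHUB_API_TOKEN) in the environment
+llm = HuggingFaceEndpoint(repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
+```
+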
+## Setup Instructions
+
+1. **Install Dependencies**
+```bash
+pip install -r requirements.txt
```
+
+2. **Configuration**
+ - Rename `config.env.sample` to `config.env`
+ - Fill in the required environment variables
+
+3. **MongoDB Setup**
+   > A known MongoDB issue (see [this thread](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) prevents creating the search index programmatically, so it has to be created manually. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index.
+
+   - In the Atlas console, create a vector search index for the `vector_store` collection
+   - Use the following configuration:
+
+```json
{
"type": "vector",
"path": "embedding",
@@ -12,18 +59,31 @@
}
```
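+
+Once the index exists, the collection can back a LangChain vector store. A minimal sketch, assuming the `langchain-mongodb` integration; the placeholders are values from your own deployment:
+
+```python
+# Sketch: attach a LangChain vector store to the indexed collection.
+# The connection string, database name, and index name are placeholders.
+from pymongo import MongoClient
+from langchain_mongodb import MongoDBAtlasVectorSearch
+
+client = MongoClient("<your Atlas connection string>")
+collection = client["<your database>"]["vector_store"]
+
+vector_store = MongoDBAtlasVectorSearch(
+    collection=collection,
+    embedding=embeddings,       # the embedding model from Models Used
+    index_name="<your index name>",
+    embedding_key="embedding",  # matches "path" in the index definition
+)
+```
+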
-### Usage
-- Run `python ./main.py`
-- Head over to `http://localhost:8080/docs`
+## Usage
+
+- Start the server:
+
+```bash
+python ./main.py
+```
+
+- Browse the interactive API documentation (Swagger UI) at [http://localhost:8080/docs](http://localhost:8080/docs)
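+
+The interactive docs list the real routes and payload shapes. Purely as an illustration, a chat request from Python might look like this; the endpoint path and JSON fields below are assumptions, so check `/docs` for the actual API:
+
+```python
+# Illustrative only: the path and payload are assumptions, not the project's
+# confirmed API. See http://localhost:8080/docs for the real routes.
+import requests
+
+resp = requests.post(
+    "http://localhost:8080/chat",  # hypothetical endpoint
+    json={"session_id": "demo", "message": "Summarize the uploaded document."},
+)
+print(resp.json())
+```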
+
+## Configuration
-### Models Used
-- `sentence-transformers/all-mpnet-base-v2`
+Key configurations in `defaults.py` (see the sketch after this list):
- It is used for embedding vectors and this will run locally. Initially it will download the model into `models` directory.
+- `SPLITTER_CHUNK_SIZE`: 400
+- `SPLITTER_CHUNK_OVERLAP`: 25
+- `RETRIEVER_K_PARAM`: 4
+- `MAX_READ_LINES_FOR_TEXT_FILE`: 40
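+
+These defaults would typically be consumed as sketched below; this is an assumption from the names, not the project's exact code, and `vector_store` / `document_text` are placeholders:
+
+```python
+# Sketch: plug the defaults into LangChain's splitter and retriever.
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+
+splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=25)
+chunks = splitter.split_text(document_text)  # SPLITTER_CHUNK_SIZE / _OVERLAP
+
+retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # RETRIEVER_K_PARAM
+```
+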
-- `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
+## Performance Note
- It is the main LLM that being used through the HuggingFace Hub Inference API.
+The system may experience some latency due to:
+- Initial model download and loading
+- Document processing time
+- Inference API response time
-> Note: Since it is using Inference API and a model locally, the setup would be too slow. On my side, it took more than 2 mins exactly to add a document of 45+ pages to vector store and almost 1 min to process the messages with LLM.
+> In my testing, it took over 2 minutes to add a 45+ page document to the vector store and almost 1 minute to process a message with the LLM.