Refine readme
Thanks to @aarya626 for some key suggestions
Harsh-br0 authored Jan 5, 2025
1 parent db8e355 commit 9c8fb96
# Simple RAG Server Chatbot

A sophisticated chatbot implementation using Retrieval-Augmented Generation (RAG) with FastAPI, MongoDB, and Hugging Face models. This chatbot can process multiple document types, maintain conversation history, and provide context-aware responses.

## Features

- **Document Processing**
- Support for multiple file uploads
- Automatic document type detection
- Chunked text processing with configurable size
- Vector embeddings for efficient retrieval
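
  The automatic document type detection above can be sketched as a simple extension lookup. This is a minimal illustration, not the repository's actual loader-selection code; the mapping and loader names here are hypothetical:

  ```python
  from pathlib import Path

  # Hypothetical extension-to-loader mapping; the real project may support
  # more types and use dedicated loader classes.
  LOADERS = {
      ".pdf": "pdf_loader",
      ".txt": "text_loader",
      ".md": "text_loader",
  }

  def detect_loader(filename: str) -> str:
      """Pick a loader name from the file extension, defaulting to plain text."""
      return LOADERS.get(Path(filename).suffix.lower(), "text_loader")
  ```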

- **Chat Capabilities**
- Context-aware conversations
- Session management
- History-aware retrieval system
- Concise, three-sentence maximum responses
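
  Session management and history-aware retrieval can be illustrated with an in-memory store. This is only a sketch under assumed names; the real application persists history rather than keeping it in a process-local dict:

  ```python
  from collections import defaultdict

  # In-memory session store: session_id -> list of (role, text) turns.
  # Illustrative only; a real deployment would persist this.
  _sessions: dict[str, list[tuple[str, str]]] = defaultdict(list)

  def add_turn(session_id: str, role: str, text: str, max_turns: int = 20) -> None:
      """Append a (role, text) turn and keep only the most recent max_turns."""
      history = _sessions[session_id]
      history.append((role, text))
      del history[:-max_turns]

  def get_history(session_id: str) -> list[tuple[str, str]]:
      """Return a copy of the session's stored turns."""
      return list(_sessions[session_id])
  ```

  Trimming the history bounds the context passed to the history-aware retriever, which keeps prompt sizes stable as a conversation grows.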

- **Technical Stack**
- FastAPI for the backend API
- MongoDB with vector search capabilities
- Hugging Face models for embeddings and chat
- LangChain for chain management

## Models Used

- **Embedding Model**: `sentence-transformers/all-mpnet-base-v2`
- Runs locally
- Used for generating document embeddings
- Downloads automatically to `models` directory

- **LLM**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
- Accessed through Hugging Face Hub Inference API
- Used for generating responses
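
Retrieval over embeddings boils down to nearest-neighbour search on vectors. A minimal pure-Python cosine-similarity version, standing in for what the Atlas vector index does server-side, looks like:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return ids of the k documents most similar to the query vector."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]
```

In production the same ranking is performed by the Atlas vector index; the `k` here corresponds to the `RETRIEVER_K_PARAM` default of 4 described below.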

## Setup Instructions

1. **Install Dependencies**
```bash
pip install -r requirements.txt
```

2. **Configuration**
- Rename `config.env.sample` to `config.env`
- Fill in the required environment variables
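
   A minimal reader for the `config.env` format (`KEY=value` lines with `#` comments) can be sketched as follows. The actual variable names live in `config.env.sample` and are not reproduced here; in practice such files are usually loaded with a library like python-dotenv:

   ```python
   def read_env(text: str) -> dict[str, str]:
       """Parse KEY=value lines, ignoring blank lines and # comments."""
       env = {}
       for line in text.splitlines():
           line = line.strip()
           if not line or line.startswith("#"):
               continue
           key, _, value = line.partition("=")
           env[key.strip()] = value.strip().strip('"')
       return env
   ```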

3. **MongoDB Setup**
> There's an unexpected issue with MongoDB (see [this thread](https://www.mongodb.com/community/forums/t/error-connecting-to-search-index-management-service/270272)) that prevents creating the index programmatically, so the vector search index must be created manually through the Atlas console. Follow [this guide](https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/) to create the index.

- Create a vector search index manually through the Atlas console
- Use the following configuration for the `vector_store` collection:

```json
{
  "type": "vector",
  "path": "embedding",
  "numDimensions": 768,
  "similarity": "cosine"
}
```
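
Once the index exists, queries run through MongoDB's `$vectorSearch` aggregation stage. A sketch of the pipeline it takes (the index name `vector_index` is an assumption; use whatever name the index was given in the Atlas console):

```python
def vector_search_pipeline(query_vector: list[float], k: int = 4) -> list[dict]:
    """Build an aggregation pipeline for an Atlas $vectorSearch query."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",   # assumed name; match your Atlas index
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 10 * k,   # oversample candidates, keep top k
                "limit": k,
            }
        }
    ]
```

Something like `db.vector_store.aggregate(vector_search_pipeline(vec))` would then execute the search against the collection above.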

## Usage

- Start the Server

```bash
python ./main.py
```

- Access the API by navigating to [http://localhost:8080/docs](http://localhost:8080/docs) for Swagger documentation

## Configuration

Key configurations in `defaults.py`:

- `SPLITTER_CHUNK_SIZE`: 400
- `SPLITTER_CHUNK_OVERLAP`: 25
- `RETRIEVER_K_PARAM`: 4
- `MAX_READ_LINES_FOR_TEXT_FILE`: 40
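
How the splitter defaults interact can be checked with a little arithmetic: with a chunk size of 400 and an overlap of 25, each new chunk advances 375 characters. A sketch of that behaviour (illustrative only, not the project's actual splitter, which uses LangChain):

```python
def split_chunks(text: str, size: int = 400, overlap: int = 25) -> list[str]:
    """Fixed-size character chunks where consecutive chunks share `overlap` chars."""
    step = size - overlap  # 375 with the defaults above
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

For example, a 1500-character document yields 4 chunks starting at offsets 0, 375, 750, and 1125.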

## Performance Note

The system may experience some latency due to:

- Initial model download and loading
- Document processing time
- Inference API response time

> Note: On my machine, it took over 2 minutes to add a 45+ page document to the vector store and almost 1 minute for the LLM to process messages.
