DocuChat

DocuChat is a document interaction platform that lets users upload PDF documents and query their content with language models served locally through Ollama. It is built with Python, FastAPI, LangChain, ChromaDB, and Ollama, and supports document summarization and question answering.

Key Features

Document Upload: Upload PDF documents for processing.
Content Interaction: Ask specific questions or request summaries based on the document content.
Advanced Retrieval: Uses a MultiQueryRetriever to generate multiple question perspectives for better document retrieval.
Local Processing: Ensures data privacy by processing everything locally.
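
The multi-query idea behind the Advanced Retrieval feature can be sketched in plain Python. This is an illustration only, not LangChain's MultiQueryRetriever implementation and not necessarily how DocuChat wires it up: `rephrase` and `keyword_retrieve` are hypothetical stand-ins for the language model that rewrites the question and for the vector store lookup.

```python
from typing import Callable, List


def multi_query_retrieve(
    question: str,
    rephrase: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve with several phrasings of a question and merge the results."""
    queries = [question] + rephrase(question)
    merged: List[str] = []
    seen = set()
    for query in queries:
        for chunk in retrieve(query):
            # Keep each chunk once, in the order it was first found.
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Hypothetical document chunks, used only for this illustration.
CHUNKS = [
    "The warranty lasts two years.",
    "Returns are accepted within 30 days.",
    "Shipping takes five business days.",
]


def rephrase(question: str) -> List[str]:
    # Stand-in for a language model that rewrites the question.
    return ["What is the warranty period?", "When are returns accepted?"]


def keyword_retrieve(query: str) -> List[str]:
    # Stand-in for a vector store: match chunks on longer query words.
    keywords = [w.strip("?.,").lower() for w in query.split()]
    keywords = [w for w in keywords if len(w) > 4]
    return [c for c in CHUNKS if any(k in c.lower() for k in keywords)]


question = "How long am I covered?"
single = keyword_retrieve(question)
combined = multi_query_retrieve(question, rephrase, keyword_retrieve)
print(single)
print(combined)
```

In this toy setup the original wording matches no chunk, while the rephrased queries surface the warranty and returns chunks, which is the effect the feature aims for.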

Technologies Used

Python
FastAPI: Web framework for building the backend API.
Uvicorn: ASGI server for running FastAPI applications.
PyMuPDF: Library for PDF text extraction.
LangChain: Framework for chaining natural language processing operations.
ChromaDB: Vector database for managing document embeddings.
Ollama: Local runtime that serves the language models used for generating embeddings and answering queries.
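
To make the role of the vector database concrete, here is a toy sketch of embedding-based lookup: chunks are stored with vectors, and a query vector is compared against them. Everything in it is an assumption for illustration: the three-dimensional vectors are hand-made rather than produced by an embedding model, and cosine similarity is used for simplicity. ChromaDB's own indexing and distance metric are not shown.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], store: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k stored chunks most similar to the query vector."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical chunks with hand-made vectors (not real embeddings).
STORE = {
    "chunk about pricing": [0.9, 0.1, 0.0],
    "chunk about shipping": [0.1, 0.9, 0.0],
    "chunk about returns": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], STORE, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```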

Setup Instructions

Prerequisites

Python 3.7+
Pip (Python package installer)
Ollama (download and install it from Ollama's website)

Installation

Clone the Repository

git clone https://github.com/yourusername/docuchat.git
cd docuchat

Install Dependencies
```
pip install -r requirements.txt
```

Run the Application

uvicorn main:app --host 127.0.0.1 --port 8000

Access the Application

Use Postman or a similar tool to interact with the API endpoints at http://127.0.0.1:8000.

API Endpoints

1. Upload PDF Document

Endpoint: /upload/
Method: POST
Description: Upload a PDF document to the server. The document is processed and stored for interaction.
Request:
- file (form-data): PDF file to upload

Response:

{
  "message": "PDF processed successfully. You can now ask questions."
}
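
As an alternative to Postman, the upload can be scripted. The sketch below uses only the Python standard library to build the multipart form-data request with the `file` field described above. The base URL assumes the uvicorn command from the setup section, and the file name and PDF bytes are placeholders; nothing is sent until `urlopen` is called.

```python
import urllib.request
import uuid


def build_upload_request(
    base_url: str, filename: str, pdf_bytes: bytes
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF in the 'file' field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/upload/",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder file name and bytes; replace with a real PDF read from disk.
request = build_upload_request("http://127.0.0.1:8000", "report.pdf", b"%PDF-1.4 placeholder")

# To actually send it while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```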

2. Ask a Question

Endpoint: /ask/
Method: POST
Description: Submit a question related to the uploaded document. The application generates an answer based on the document's content.
Request:
- question (JSON body): The question you want to ask

Response:

{
  "answer": "The answer to your question."
}
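
A question can be sent the same way. This sketch assumes the JSON body has the shape `{"question": "..."}`, which is inferred from the request description above rather than confirmed against the server code, and it assumes the base URL from the uvicorn command. `parse_answer` reads the `answer` field shown in the response example.

```python
import json
import urllib.request


def build_ask_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a JSON POST for the /ask/ endpoint (body shape is an assumption)."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(response_body: bytes) -> str:
    """Extract the 'answer' field from the JSON response."""
    return json.loads(response_body)["answer"]


request = build_ask_request("http://127.0.0.1:8000", "What is this document about?")

# To actually send it while the server is running:
# with urllib.request.urlopen(request) as response:
#     print(parse_answer(response.read()))
```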

Future Enhancements

Support for Different File Formats
- Add support for additional document formats like Word, Excel, and plain text.
Advanced User Interface
- Develop a web interface with drag-and-drop upload, real-time query results, and enhanced user experience.
Enhanced Retrieval Techniques
- Implement advanced retrieval techniques to improve answer accuracy and relevance.

Contributing

We welcome contributions to improve DocuChat! Please follow these steps to contribute:

Fork the repository.
Create a new branch for your changes.
Make your modifications and commit them.
Push your changes to your forked repository.
Open a pull request to the main repository.

Acknowledgments

Thanks to the creators of LangChain, ChromaDB, and Ollama for providing the libraries and tools this project depends on.
