Added example: meeting conversations extractor #1009

Open: wants to merge 11 commits into base: main
92 changes: 92 additions & 0 deletions examples/conversation_extraction/README.md
@@ -0,0 +1,92 @@
# Meeting Conversation Extractor with Indexify

This project demonstrates how to build a meeting conversation extraction pipeline using Indexify. The pipeline processes audio files, transcribes them, classifies the content, and generates structured summaries based on the meeting type.

## Features

- Speech-to-text transcription using Faster Whisper
- Meeting type classification using Llama.cpp
- Structured summaries for different meeting types:
  - Strategy meetings
  - Sales/Marketing/Product calls
  - R&D brainstorming sessions

## Prerequisites

- Python 3.9+
- Docker and Docker Compose (for containerized setup)

## Installation and Usage

### Option 1: Local Installation - In Process

1. Clone this repository:
```
git clone https://github.com/tensorlakeai/indexify
cd indexify/examples/conversation_extraction
```

2. Create a virtual environment and activate it:
```
python -m venv venv
source venv/bin/activate
```

3. Install the required dependencies:
```
pip install -r requirements.txt
```

4. Run the main script:
```
python main.py --mode in-process-run
```

### Option 2: Using Docker Compose - Deployed Graph

1. Clone this repository:
```
git clone https://github.com/tensorlakeai/indexify
cd indexify/examples/conversation_extraction
```

2. Ensure Docker and Docker Compose are installed on your system.

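3. Build the Docker images for each function in the pipeline. `docker-compose.yml` expects the images `tensorlake/audio-processor`, `tensorlake/transcriber`, `tensorlake/router`, and `tensorlake/llama-cpp`.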

4. Start the services:
```
docker-compose up --build
```

5. Deploy the graph:
```
python main.py --mode remote-deploy
```

6. Run the workflow:
```
python main.py --mode remote-run
```
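Both options drive `main.py` through a `--mode` flag. The following is a minimal sketch of how such a flag could be parsed and dispatched. Only the three mode names come from the commands above; the function names and descriptions are hypothetical and are not taken from the actual `main.py`.

```python
import argparse
from typing import List, Optional

# Mode names come from the commands in this README; everything else in
# this sketch is a hypothetical placeholder, not the real main.py.
MODES = ("in-process-run", "remote-deploy", "remote-run")


def build_parser() -> argparse.ArgumentParser:
    """Create a parser that accepts exactly the three documented modes."""
    parser = argparse.ArgumentParser(description="Meeting conversation extractor")
    parser.add_argument("--mode", choices=MODES, required=True,
                        help="how the graph should be executed")
    return parser


def describe(mode: str) -> str:
    """Return a short description of what each mode is assumed to do."""
    actions = {
        "in-process-run": "build the graph and run it in the current process",
        "remote-deploy": "register the graph with the Indexify server",
        "remote-run": "invoke the already deployed graph on the server",
    }
    return actions[mode]


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the arguments and report the selected action."""
    args = build_parser().parse_args(argv)
    return f"{args.mode}: {describe(args.mode)}"
```

Calling `main(["--mode", "remote-deploy"])` returns the description for that mode, and an unknown mode is rejected by `argparse` before any work starts.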

## Workflow

1. **Audio Processing:**
   - Transcription: Converts speech to text using Faster Whisper, including speaker diarization
   - Meeting Classification: Uses an LLM (via Llama.cpp) to determine the type of meeting

2. **Content Analysis:**
   Based on the meeting type classification, the system generates one of the following structured summaries:
   - Strategy Meetings: Key decisions, action items, and strategic initiatives
   - Sales/Marketing/Product Calls: Customer details, pain points, and next steps
   - R&D Brainstorms: Innovative ideas, technical challenges, resource requirements, and potential impacts
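The three summary shapes listed above could be modeled as plain data classes. This is an illustrative sketch only: the class and field names are assumptions derived from the bullet points, not the schema used in the project's code.

```python
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical schemas: field names mirror the bullets in this README,
# not the actual models defined in the example's source.
@dataclass
class StrategyMeetingSummary:
    key_decisions: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    strategic_initiatives: List[str] = field(default_factory=list)


@dataclass
class SalesCallSummary:
    customer_details: str = ""
    pain_points: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)


@dataclass
class RDBrainstormSummary:
    innovative_ideas: List[str] = field(default_factory=list)
    technical_challenges: List[str] = field(default_factory=list)
    resource_requirements: List[str] = field(default_factory=list)
    potential_impacts: List[str] = field(default_factory=list)


# Example: build a sales summary and convert it to a plain dict,
# which is convenient for JSON serialization.
example = asdict(SalesCallSummary(customer_details="Acme Corp",
                                  pain_points=["slow onboarding"]))
```

Keeping one schema per meeting type makes it easy to validate that each summarizer returns only the fields relevant to its meeting type.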

## Graph Structure

The project uses the following Indexify graph:

```
transcribe_audio -> classify_meeting_intent -> router -> summarize_strategy_meeting
                                                      -> summarize_sales_call
                                                      -> summarize_rd_brainstorm
```
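The branching in this graph can be illustrated in plain Python without Indexify. In the sketch below, the function names match the graph nodes, but the bodies are stand-ins: the keyword classifier replaces the LLM call, and the label strings (`"strategy"`, `"sales"`, `"rd"`) are assumptions made for illustration.

```python
from typing import Callable, Dict


def summarize_strategy_meeting(transcript: str) -> str:
    return f"[strategy] {transcript}"


def summarize_sales_call(transcript: str) -> str:
    return f"[sales] {transcript}"


def summarize_rd_brainstorm(transcript: str) -> str:
    return f"[rd] {transcript}"


# Hypothetical label-to-summarizer table; the real labels may differ.
ROUTES: Dict[str, Callable[[str], str]] = {
    "strategy": summarize_strategy_meeting,
    "sales": summarize_sales_call,
    "rd": summarize_rd_brainstorm,
}


def classify_meeting_intent(transcript: str) -> str:
    """Keyword stand-in for the LLM classifier used by the pipeline."""
    text = transcript.lower()
    if "customer" in text or "pricing" in text:
        return "sales"
    if "prototype" in text or "experiment" in text:
        return "rd"
    return "strategy"


def router(label: str) -> Callable[[str], str]:
    """Pick exactly one downstream summarizer for the given label."""
    return ROUTES[label]


def run_pipeline(transcript: str) -> str:
    """Classify the transcript, route it, and run the chosen summarizer."""
    label = classify_meeting_intent(transcript)
    return router(label)(transcript)
```

The point of the router node is that only one of the three summarizers runs per invocation, so each transcript produces a single summary of the matching type.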
48 changes: 48 additions & 0 deletions examples/conversation_extraction/docker-compose.yml
@@ -0,0 +1,48 @@
networks:
  server:
services:
  indexify:
    image: tensorlake/indexify-server
    ports:
      - 8900:8900
    networks:
      server:
        aliases:
          - indexify-server
    volumes:
      - data:/app

  audio-processor:
    image: tensorlake/audio-processor:latest
    command: ["indexify-cli", "executor", "--server-addr", "indexify:8900"]
    networks:
      server:
    volumes:
      - data:/app

  transcriber:
    image: tensorlake/transcriber:latest
    command: ["indexify-cli", "executor", "--server-addr", "indexify:8900"]
    networks:
      server:
    volumes:
      - data:/app

  router:
    image: tensorlake/router:latest
    command: ["indexify-cli", "executor", "--server-addr", "indexify:8900"]
    networks:
      server:
    volumes:
      - data:/app

  llama-cpp:
    image: tensorlake/llama-cpp:latest
    command: ["indexify-cli", "executor", "--server-addr", "indexify:8900"]
    networks:
      server:
    volumes:
      - data:/app

volumes:
  data: