Audio-To-Text Transcription Tool

This tool allows you to transcribe audio files to text using either the free Google Speech Recognition service or the paid Google Cloud Speech-to-Text API. It also offers post-processing capabilities using Claude Opus 3 for enhanced formatting and error correction.

Features

Support for multiple audio formats: MP3, WAV, M4A, AAC, FLAC, OGG, WMA, AIFF, AIF, AMR
Free version for transcribing short audio clips
Paid version for transcribing longer audio files with improved accuracy (requires Google Cloud API key)
Efficient processing of long audio files using chunking and parallel processing
User-friendly command-line interface with interactive prompts
Automatic audio format conversion when necessary
Display of word count, audio duration, and confidence score for transcriptions (where applicable)
Option to save transcriptions as text or markdown files
Intelligent file naming to avoid overwriting existing files
Real-time progress reporting for long audio files
Post-processing option using Claude Opus 3 for improved readability and error correction
Support for custom vocabularies in the paid version to enhance transcription accuracy
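To illustrate the supported-formats list above, here is a minimal sketch of a pre-flight extension check. It is not the project's actual code: `is_supported_audio` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the real tool may instead inspect the file contents or rely on FFmpeg to reject unsupported input.

```python
from pathlib import Path

# Hypothetical constant: extensions copied from the "Support for multiple audio formats" feature.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".aac", ".flac",
    ".ogg", ".wma", ".aiff", ".aif", ".amr",
}


def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check is cheap and catches obvious mistakes early, but it cannot detect a corrupted file; that still surfaces later as an "Invalid audio file" error.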

Notes

The free version is limited to short audio clips (typically 10-15 seconds) due to API constraints.
The paid version supports longer audio files with improved accuracy, using the latest Google Cloud Speech-to-Text API (v1p1beta1).
Long audio files are automatically split into chunks and processed in parallel for faster transcription.
Temporary files created during processing are automatically cleaned up.
When saving transcriptions, the tool checks for existing files and increments the filename (e.g., filename-01, filename-02) to avoid overwriting.
Claude post-processing is performed in chunks for long transcriptions to stay within API limits.
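The filename-incrementing behaviour described above can be sketched as follows. This is an illustrative version only: `next_available_path` is a hypothetical helper, and the assumption that the first save uses the bare name (with numbering starting at -01 only once that name is taken) is a guess at the tool's scheme.

```python
from pathlib import Path


def next_available_path(directory: str, stem: str, suffix: str) -> Path:
    """Return directory/stem+suffix, or stem-01, stem-02, ... if that name is taken.

    Assumption (hypothetical): the first save uses the bare name and
    numbering starts at -01 only when a file with that name already exists.
    """
    folder = Path(directory)
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    # Keep incrementing the two-digit counter until no file with that name exists.
    while candidate.exists():
        candidate = folder / f"{stem}-{counter:02d}{suffix}"
        counter += 1
    return candidate
```

Checking existence and then writing is not atomic, so two processes saving at the same moment could still collide; for a single interactive CLI session that is usually acceptable.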

Examples

~/dev-tests-examples/ contains short and long sample audio files, plus short and long transcription output examples with and without Claude Opus 3 post-processing. The AI voice in the samples is from https://www.hume.ai.

AI-Assisted Development Insights

The project was developed with Claude.ai and Claude Engineer CLI. The ~/ai-insights/ directory contains valuable information about this AI-assisted development approach:

cli-dump

Complete transcripts of Claude Engineer CLI sessions.

dev-summaries

Comprehensive project progress updates that allow any AI or developer to quickly understand the current state of the project. This serves as a continuity mechanism to preserve context beyond various model and system limitations, including token limits, context windows, session boundaries, rate limits, stateless API interactions, parsing complexities, and cross-model compatibility issues.

error-logs

CLI outputs for each encountered bug. These are submitted to Claude via Markdown files rather than direct paste to prevent formatting issues caused by multiple carriage returns. This method helps maintain Claude Engineer's stability when processing large blocks of text.

PRDs

Product requirement documents generated by both OpenAI and Anthropic models, outlining the specifications for this project, as well as PRDs for features we added during development.

prompts

Prompts designed to efficiently bring Claude Engineer CLI back up to speed after any connection or context loss, ensuring continuity in the development process.

steps

A continuously updated file containing Claude's recommendations for next steps in the project, providing a clear roadmap for development.

Requirements

Python 3.12
FFmpeg (for audio format conversion)
Google Cloud account and credentials (for paid version)
Anthropic API key (for Claude post-processing)

Installation

Clone this repository or download the project files.
Create and activate a virtual environment (venv).

Install the required Python libraries:

pip install -r requirements.txt

or, to bypass the pip cache and upgrade any already-installed packages:

pip install --no-cache-dir --upgrade -r requirements.txt

Install FFmpeg:
- On macOS (using Homebrew): brew install ffmpeg
- On Ubuntu or Debian: sudo apt-get install ffmpeg
- On Windows, download the FFmpeg binaries from the official website and add them to your system PATH.
For the paid version, follow the setup instructions in the google-cloud-api-setup.md file.
For Claude post-processing, create a .env file in the project root (use .env-example as a template) and add your Anthropic API key:
```
CLAUDE_API_KEY=your_api_key_here
```
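For readers curious how the key ends up available to the script, below is a minimal standard-library sketch of reading a .env file. It is an assumption, not the project's code: the tool may well use a library such as python-dotenv instead, and `load_env_file` and `get_claude_api_key` are hypothetical names. Only the `CLAUDE_API_KEY` variable name comes from this README.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ (hypothetical helper).

    Blank lines, comments and lines without '=' are skipped; surrounding quotes
    are stripped. Variables already set in the environment are not overwritten.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        values[key] = value
        os.environ.setdefault(key, value)
    return values


def get_claude_api_key() -> str:
    """Load .env from the current directory and return CLAUDE_API_KEY or fail loudly."""
    load_env_file()
    key = os.environ.get("CLAUDE_API_KEY", "")
    if not key:
        raise RuntimeError("CLAUDE_API_KEY is not set; add it to your .env file.")
    return key
```

Failing with an explicit message when the key is missing matches the Troubleshooting advice to check the .env file first.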

Usage

Run the main script from the command line:

python3 main.py

Follow the interactive prompts to:

Choose between the free version (for short audio clips) and the paid version (for longer audio files).
Provide the path to your audio file.
Save the transcription (optional) and choose the output format (text or markdown).
Optionally post-process the transcription using Claude Opus 3.
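Because Claude post-processing runs in chunks for long transcriptions, a splitting step is needed before each API call. The sketch below shows one way to do it under stated assumptions: the 8,000-character limit is an arbitrary placeholder (not Claude's actual limit), `split_for_llm` and `post_process` are hypothetical names, and the real API call is abstracted as the `process_chunk` callable rather than tied to a specific SDK method.

```python
import re
from typing import Callable


def split_for_llm(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    max_chars is an assumed placeholder limit. A single sentence longer than
    max_chars is hard-split so that every chunk respects the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that alone exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def post_process(text: str, process_chunk: Callable[[str], str], max_chars: int = 8000) -> str:
    """Apply process_chunk (for example, a Claude API call) to each chunk and rejoin."""
    return "\n\n".join(process_chunk(chunk) for chunk in split_for_llm(text, max_chars))
```

Splitting on sentence boundaries keeps each request grammatically self-contained, which tends to help the model fix punctuation and errors without cutting words in half.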

The script will:

Convert the audio file to the required format if necessary.
For longer files (paid version), split the audio into chunks for efficient processing.
Transcribe the audio using the selected service, with parallel processing for longer files.
Display the transcription, word count, audio duration, and confidence score (where applicable).
Show real-time progress for longer files being processed in chunks.
Optionally save the transcription to a file in the specified directory.
If chosen, post-process the transcription using Claude Opus 3 for improved formatting and error correction.

Troubleshooting

Ensure FFmpeg is correctly installed and accessible in your system PATH.
Verify that you have an active internet connection, as both versions use online speech recognition services.
For the paid version, make sure your Google Cloud credentials are correctly set up as described in the google-cloud-api-setup.md file.
For Claude post-processing, ensure your Anthropic API key is correctly set in the .env file.
If you encounter "Invalid audio file" errors, check that your audio file is not corrupted and is in a supported format.
For very large audio files, ensure you have sufficient disk space for temporary files and chunked processing.
If you experience API quota issues with the paid version or Claude post-processing, check your respective API consoles for any limits or billing issues.
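Several of the checks above can be automated. The sketch below verifies that FFmpeg is on PATH and that enough disk space is free, and builds an FFmpeg command for converting input to mono WAV. The 500 MB threshold and the 16 kHz mono 16-bit PCM target are assumptions (a common choice for speech recognition, not necessarily what this tool uses), and `check_environment`, `ffmpeg_to_wav_command` and `convert_to_wav` are hypothetical names.

```python
import shutil
import subprocess


def check_environment(work_dir: str = ".", min_free_bytes: int = 500 * 1024**2) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    min_free_bytes (500 MB) is an arbitrary, assumed threshold for temporary chunk files.
    """
    problems: list[str] = []
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH.")
    free = shutil.disk_usage(work_dir).free
    if free < min_free_bytes:
        problems.append(f"Only {free // 1024**2} MB free in {work_dir}; chunked processing may fail.")
    return problems


def ffmpeg_to_wav_command(src: str, dst: str, sample_rate: int = 16_000) -> list[str]:
    """Build an FFmpeg command converting src to mono 16-bit PCM WAV (assumed target format)."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), "-c:a", "pcm_s16le", dst]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(ffmpeg_to_wav_command(src, dst), check=True)
```

Running `check_environment()` before a long job and printing each returned problem surfaces the two most common setup failures before any API quota is spent.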

If you encounter any other issues or have questions, please open an issue in this repository.

Version History

v1.0.0
- Initial release

Future Improvements

Implement batch processing for multiple audio files
Develop a web-based user interface for easier use
Add support for more languages and dialects
Implement a caching system to avoid reprocessing of previously transcribed audio
Explore options for improving transcription accuracy through custom language models
Develop a comprehensive logging system for better debugging and performance analysis
Enhance the Claude post-processing feature with more advanced formatting options
Implement a progress bar for long-running processes

We welcome contributions and suggestions to improve this tool.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Repository: parkertoddbrooks/Audio-To-Text-Transcription