Commit 43cc67a

rename to speaches
Fedir Zadniprovskyi authored and fedirz committed Jan 12, 2025
1 parent 9922993 commit 43cc67a
Showing 45 changed files with 243 additions and 239 deletions.
8 changes: 4 additions & 4 deletions Dockerfile
@@ -1,7 +1,7 @@
ARG BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04
# hadolint ignore=DL3006
FROM ${BASE_IMAGE}
LABEL org.opencontainers.image.source="https://github.com/fedirz/faster-whisper-server"
LABEL org.opencontainers.image.source="https://github.com/speaches-ai/speaches"
LABEL org.opencontainers.image.licenses="MIT"
# `ffmpeg` is installed because without it `gradio` won't work with mp3 (and possibly other) files
# hadolint ignore=DL3008
@@ -15,7 +15,7 @@ RUN apt-get update && \
USER ubuntu
ENV HOME=/home/ubuntu \
PATH=/home/ubuntu/.local/bin:$PATH
WORKDIR $HOME/faster-whisper-server
WORKDIR $HOME/speaches
# https://docs.astral.sh/uv/guides/integration/docker/#installing-uv
COPY --chown=ubuntu --from=ghcr.io/astral-sh/uv:0.5.14 /uv /bin/uv
# https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
@@ -35,7 +35,7 @@ RUN mkdir -p $HOME/.cache/huggingface/hub
ENV WHISPER__MODEL=Systran/faster-whisper-large-v3
ENV UVICORN_HOST=0.0.0.0
ENV UVICORN_PORT=8000
ENV PATH="$HOME/faster-whisper-server/.venv/bin:$PATH"
ENV PATH="$HOME/speaches/.venv/bin:$PATH"
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubenablehftransfer
# NOTE: I've disabled this because it doesn't work inside of a Docker container. I couldn't pinpoint the exact reason. This doesn't happen when running the server locally.
# RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
@@ -44,4 +44,4 @@ ENV HF_HUB_ENABLE_HF_TRANSFER=0
# https://www.reddit.com/r/StableDiffusion/comments/1f6asvd/gradio_sends_ip_address_telemetry_by_default/
ENV DO_NOT_TRACK=1
EXPOSE 8000
CMD ["uvicorn", "--factory", "faster_whisper_server.main:create_app"]
CMD ["uvicorn", "--factory", "speaches.main:create_app"]
22 changes: 13 additions & 9 deletions README.md
@@ -1,11 +1,15 @@
# Faster Whisper Server
> [!NOTE]
> This project was previously named `faster-whisper-server`. It has been renamed to `speaches`, as the project has evolved to support more than just transcription.
# Speaches

`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.

`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
Features:

- GPU and CPU support.
- Easily deployable using Docker.
- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
- **Configurable through environment variables (see [config.py](./src/speaches/config.py))**.
- OpenAI API compatible.
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
- Live transcription support (audio is sent via websocket as it's generated).
@@ -18,7 +22,7 @@ Please create an issue if you find a bug, have a question, or a feature suggestion
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.

- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
- Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to streaming chat messages from an LLM (see the sketch after this list).
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
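
As a concrete sketch, a transcription request can be issued with plain `curl`. The multipart `file` and `model` fields follow OpenAI's audio API; the `stream` field in the second request is an assumption for the streaming behavior described above and is not shown in this diff:

```bash
# One-shot transcription (OpenAI-compatible multipart request).
curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small"

# Hypothetical: receive the transcription incrementally via SSE chunks.
curl --no-buffer http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small" \
  -F "stream=true"
```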
@@ -35,23 +39,23 @@ See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio)
NOTE: I'm using newer Docker Compose features. If you are using an older version of Docker Compose, you may need to update.

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml

# for GPU support
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
docker compose --file compose.cuda.yaml up --detach
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cpu.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
docker compose --file compose.cpu.yaml up --detach
```
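
Once the stack is up, you can verify it from the host. The service name comes from the compose files in this commit; the `/v1/models` route is an assumption based on the server's OpenAI API compatibility:

```bash
# Follow the logs until the model has loaded (swap in compose.cpu.yaml on CPU-only hosts).
docker compose --file compose.cuda.yaml logs --follow speaches

# Assumed OpenAI-style route: list the models the server can serve.
curl http://localhost:8000/v1/models
```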

### Using Docker

```bash
# for GPU support
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach fedirz/faster-whisper-server:latest-cuda
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach ghcr.io/speaches-ai/speaches:latest-cuda
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach ghcr.io/speaches-ai/speaches:latest-cpu
```

### Using Kubernetes
4 changes: 2 additions & 2 deletions Taskfile.yaml
@@ -2,8 +2,8 @@ version: "3"
tasks:
server:
cmds:
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app' || true
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app {{.CLI_ARGS}}
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0 speaches.main:create_app' || true
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0 speaches.main:create_app {{.CLI_ARGS}}
sources:
- src/**/*.py
test:
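
These tasks are invoked with [Task](https://taskfile.dev). A sketch of typical usage, assuming `task` is installed; anything after `--` reaches `uvicorn` through `{{.CLI_ARGS}}`:

```bash
# Restart the dev server (the pkill line kills any previous instance first).
task server

# Forward extra flags to uvicorn via {{.CLI_ARGS}}, e.g. a different port.
task server -- --port 8001
```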
6 changes: 3 additions & 3 deletions compose.cpu.yaml
@@ -1,11 +1,11 @@
# include:
# - compose.observability.yaml
services:
faster-whisper-server:
speaches:
extends:
file: compose.yaml
service: faster-whisper-server
image: fedirz/faster-whisper-server:latest-cpu
service: speaches
image: ghcr.io/speaches-ai/speaches:latest-cpu
build:
args:
BASE_IMAGE: ubuntu:24.04
4 changes: 2 additions & 2 deletions compose.cuda-cdi.yaml
@@ -4,10 +4,10 @@
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
# https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
services:
faster-whisper-server:
speaches:
extends:
file: compose.cuda.yaml
service: faster-whisper-server
service: speaches
volumes:
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
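
Per the linked NVIDIA and Docker documentation, CDI needs a generated device spec and a daemon feature flag before this override file is usable. A hedged host-setup sketch, assuming `nvidia-ctk` from the NVIDIA Container Toolkit is installed; merge the feature flag into any existing `daemon.json` rather than overwriting it:

```bash
# Generate a CDI spec describing the host's NVIDIA GPUs.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Enable the CDI feature in the Docker daemon (merge by hand if daemon.json already exists).
echo '{ "features": { "cdi": true } }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

docker compose --file compose.cuda-cdi.yaml up --detach
```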
6 changes: 3 additions & 3 deletions compose.cuda.yaml
@@ -1,11 +1,11 @@
# include:
# - compose.observability.yaml
services:
faster-whisper-server:
speaches:
extends:
file: compose.yaml
service: faster-whisper-server
image: fedirz/faster-whisper-server:latest-cuda
service: speaches
image: ghcr.io/speaches-ai/speaches:latest-cuda
build:
args:
BASE_IMAGE: nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04
2 changes: 1 addition & 1 deletion compose.observability.yaml
@@ -5,7 +5,7 @@ services:
volumes:
- ./configuration/opentelemetry-collector.yaml:/etc/opentelemetry-collector.yaml
ports:
# NOTE: when `faster-whisper-server` is also running as a Docker Compose service, this doesn't need to be exposed.
# NOTE: when `speaches` is also running as a Docker Compose service, this doesn't need to be exposed.
- 4317:4317 # OTLP gRPC receiver
# - 4318:4318 # OTLP HTTP receiver
# - 8888:8888 # Prometheus metrics exposed by the Collector
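
When the server runs outside of Compose (for example via the Taskfile's `opentelemetry-instrument` wrapper), it can be pointed at this collector using the standard OpenTelemetry environment variables; the endpoint below matches the OTLP gRPC port exposed above:

```bash
# Export telemetry to the collector's OTLP gRPC receiver.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=speaches
opentelemetry-instrument uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```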
4 changes: 2 additions & 2 deletions compose.yaml
@@ -1,7 +1,7 @@
# TODO: https://docs.astral.sh/uv/guides/integration/docker/#configuring-watch-with-docker-compose
services:
faster-whisper-server:
container_name: faster-whisper-server
speaches:
container_name: speaches
build:
dockerfile: Dockerfile
context: .
4 changes: 2 additions & 2 deletions docs/configuration.md
@@ -1,5 +1,5 @@
<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
::: faster_whisper_server.config.Config
::: speaches.config.Config
options:
show_bases: true
show_if_no_docstring: true
@@ -16,7 +16,7 @@
- "!speech_*"
- "!transcription_*"

::: faster_whisper_server.config.WhisperConfig
::: speaches.config.WhisperConfig

<!-- TODO: nested model `whisper` -->
<!-- TODO: Insert new lines for multi-line docstrings -->
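
Because the config is read from environment variables, nested models such as `WhisperConfig` are addressed with a double-underscore delimiter, as the Dockerfile's `WHISPER__MODEL` suggests. A minimal sketch; the first variable appears in this repo, the second is hypothetical and shown only to illustrate the pattern:

```bash
# Nested field: the `model` field of the `whisper` config becomes WHISPER__MODEL.
export WHISPER__MODEL=Systran/faster-whisper-large-v3

# Hypothetical nested field, included only to illustrate the `__` delimiter.
export WHISPER__SOME_SETTING=some-value

uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```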
38 changes: 19 additions & 19 deletions docs/installation.md
@@ -9,25 +9,25 @@ Download the necessary Docker Compose files
=== "CUDA"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
export COMPOSE_FILE=compose.cuda.yaml
```

=== "CUDA (with CDI feature enabled)"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda-cdi.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda-cdi.yaml
export COMPOSE_FILE=compose.cuda-cdi.yaml
```

=== "CPU"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cpu.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
export COMPOSE_FILE=compose.cpu.yaml
```

@@ -58,10 +58,10 @@ docker compose up --detach
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--gpus=all \
fedirz/faster-whisper-server:latest-cuda
ghcr.io/speaches-ai/speaches:latest-cuda
```

=== "CUDA (with CDI feature enabled)"
@@ -71,10 +71,10 @@ docker compose up --detach
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--device=nvidia.com/gpu=all \
fedirz/faster-whisper-server:latest-cuda
ghcr.io/speaches-ai/speaches:latest-cuda
```

=== "CPU"
@@ -84,31 +84,31 @@
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
fedirz/faster-whisper-server:latest-cpu
ghcr.io/speaches-ai/speaches:latest-cpu
```

??? note "Build from source"

```bash
docker build --tag faster-whisper-server .
docker build --tag speaches .

# NOTE: you need to install and enable [buildx](https://github.com/docker/buildx) for multi-platform builds
# Build image for both amd64 and arm64
docker buildx build --tag faster-whisper-server --platform linux/amd64,linux/arm64 .
docker buildx build --tag speaches --platform linux/amd64,linux/arm64 .

# Build image without CUDA support
docker build --tag faster-whisper-server --build-arg BASE_IMAGE=ubuntu:24.04 .
docker build --tag speaches --build-arg BASE_IMAGE=ubuntu:24.04 .
```

## Python (requires Python 3.12+ and `uv` package manager)

```bash
git clone https://github.com/fedirz/faster-whisper-server.git
cd faster-whisper-server
git clone https://github.com/speaches-ai/speaches.git
cd speaches
uv venv
source .venv/bin/activate
uv sync --all-extras
uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app
uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```
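
With the server running, one of the endpoints described earlier can be exercised directly; a minimal `curl` sketch reusing a model name that appears elsewhere in this repo:

```bash
# Translate spoken audio to English text via the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/audio/translations \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small"
```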
9 changes: 5 additions & 4 deletions docs/introduction.md
@@ -8,19 +8,20 @@

TODO: add HuggingFace Space URL

# Faster Whisper Server
# Speaches

`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.

## Features:

- GPU and CPU support.
- [Deployable via Docker Compose / Docker](./installation.md)
- [Highly configurable](./configuration.md)
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-speech (TTS) via `piper`.
- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
@@ -34,7 +35,7 @@ Please create an issue if you find a bug, have a question, or a feature suggestion
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.

- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
- Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to streaming chat messages from an LLM.
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
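
For the text-to-speech feature listed above, a hedged sketch assuming the server mirrors OpenAI's `POST /v1/audio/speech` route; the endpoint, model, and voice values here are assumptions, not confirmed by this diff:

```bash
# Hypothetical OpenAI-style speech request; model and voice are placeholders.
curl http://localhost:8000/v1/audio/speech \
  --header "Content-Type: application/json" \
  --data '{"model": "piper", "input": "Hello from speaches!", "voice": "some-voice"}' \
  --output speech.mp3
```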