Commit 43cc67a

rename to speaches
Fedir Zadniprovskyi authored and fedirz committed Jan 12, 2025
1 parent 9922993 commit 43cc67a
Showing 45 changed files with 243 additions and 239 deletions.
8 changes: 4 additions & 4 deletions Dockerfile
@@ -1,7 +1,7 @@
ARG BASE_IMAGE=nvidia/cuda:12.6.3-cudnn-runtime-ubuntu24.04
# hadolint ignore=DL3006
FROM ${BASE_IMAGE}
LABEL org.opencontainers.image.source="https://github.com/fedirz/faster-whisper-server"
LABEL org.opencontainers.image.source="https://github.com/speaches-ai/speaches"
LABEL org.opencontainers.image.licenses="MIT"
# `ffmpeg` is installed because without it `gradio` won't work with mp3 (and possibly other) files
# hadolint ignore=DL3008
@@ -15,7 +15,7 @@ RUN apt-get update && \
USER ubuntu
ENV HOME=/home/ubuntu \
PATH=/home/ubuntu/.local/bin:$PATH
WORKDIR $HOME/faster-whisper-server
WORKDIR $HOME/speaches
# https://docs.astral.sh/uv/guides/integration/docker/#installing-uv
COPY --chown=ubuntu --from=ghcr.io/astral-sh/uv:0.5.14 /uv /bin/uv
# https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers
@@ -35,7 +35,7 @@ RUN mkdir -p $HOME/.cache/huggingface/hub
ENV WHISPER__MODEL=Systran/faster-whisper-large-v3
ENV UVICORN_HOST=0.0.0.0
ENV UVICORN_PORT=8000
ENV PATH="$HOME/faster-whisper-server/.venv/bin:$PATH"
ENV PATH="$HOME/speaches/.venv/bin:$PATH"
# https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables#hfhubenablehftransfer
# NOTE: I've disabled this because it doesn't work inside of a Docker container. I couldn't pinpoint the exact reason. This doesn't happen when running the server locally.
# RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
@@ -44,4 +44,4 @@ ENV HF_HUB_ENABLE_HF_TRANSFER=0
# https://www.reddit.com/r/StableDiffusion/comments/1f6asvd/gradio_sends_ip_address_telemetry_by_default/
ENV DO_NOT_TRACK=1
EXPOSE 8000
CMD ["uvicorn", "--factory", "faster_whisper_server.main:create_app"]
CMD ["uvicorn", "--factory", "speaches.main:create_app"]
22 changes: 13 additions & 9 deletions README.md
@@ -1,11 +1,15 @@
# Faster Whisper Server
> [!NOTE]
> This project was previously named `faster-whisper-server`. It has been renamed to `speaches`, as the project has evolved to support more than just transcription.
# Speaches

`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.

`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
Features:

- GPU and CPU support.
- Easily deployable using Docker.
- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
- **Configurable through environment variables (see [config.py](./src/speaches/config.py))**.
- OpenAI API compatible.
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
- Live transcription support (audio is sent via websocket as it's generated).
@@ -18,7 +22,7 @@ Please create an issue if you find a bug, have a question, or a feature suggestion
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.

- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
- Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to streaming chat messages from an LLM (see the sketch after this list).
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
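
As a concrete sketch, a transcription request can be issued with plain `curl`. The multipart `file` and `model` fields follow OpenAI's audio API; the `stream` field in the second request is an assumption for the streaming behavior described above and is not shown in this diff:

```bash
# One-shot transcription (OpenAI-compatible multipart request).
curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small"

# Hypothetical: receive the transcription incrementally via SSE chunks.
curl --no-buffer http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small" \
  -F "stream=true"
```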
@@ -35,23 +39,23 @@ See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio)
NOTE: I'm using newer Docker Compose features. If you are using an older version of Docker Compose, you may need to update.

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml

# for GPU support
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
docker compose --file compose.cuda.yaml up --detach
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cpu.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
docker compose --file compose.cpu.yaml up --detach
```
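
Once the stack is up, you can verify it from the host. The service name comes from the compose files in this commit; the `/v1/models` route is an assumption based on the server's OpenAI API compatibility:

```bash
# Follow the logs until the model has loaded (swap in compose.cpu.yaml on CPU-only hosts).
docker compose --file compose.cuda.yaml logs --follow speaches

# Assumed OpenAI-style route: list the models the server can serve.
curl http://localhost:8000/v1/models
```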

### Using Docker

```bash
# for GPU support
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach fedirz/faster-whisper-server:latest-cuda
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach ghcr.io/speaches-ai/speaches:latest-cuda
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach ghcr.io/speaches-ai/speaches:latest-cpu
```

### Using Kubernetes
4 changes: 2 additions & 2 deletions Taskfile.yaml
@@ -2,8 +2,8 @@ version: "3"
tasks:
server:
cmds:
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app' || true
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app {{.CLI_ARGS}}
- pkill --signal SIGKILL --echo --full 'uvicorn --factory --host 0.0.0.0 speaches.main:create_app' || true
- opentelemetry-instrument uvicorn --factory --host 0.0.0.0 speaches.main:create_app {{.CLI_ARGS}}
sources:
- src/**/*.py
test:
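
These tasks are invoked with [Task](https://taskfile.dev). A sketch of typical usage, assuming `task` is installed; anything after `--` reaches `uvicorn` through `{{.CLI_ARGS}}`:

```bash
# Restart the dev server (the pkill line kills any previous instance first).
task server

# Forward extra flags to uvicorn via {{.CLI_ARGS}}, e.g. a different port.
task server -- --port 8001
```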
6 changes: 3 additions & 3 deletions compose.cpu.yaml
@@ -1,11 +1,11 @@
# include:
# - compose.observability.yaml
services:
faster-whisper-server:
speaches:
extends:
file: compose.yaml
service: faster-whisper-server
image: fedirz/faster-whisper-server:latest-cpu
service: speaches
image: ghcr.io/speaches-ai/speaches:latest-cpu
build:
args:
BASE_IMAGE: ubuntu:24.04
4 changes: 2 additions & 2 deletions compose.cuda-cdi.yaml
@@ -4,10 +4,10 @@
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
# https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
services:
faster-whisper-server:
speaches:
extends:
file: compose.cuda.yaml
service: faster-whisper-server
service: speaches
volumes:
- hf-hub-cache:/home/ubuntu/.cache/huggingface/hub
deploy:
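
Per the linked NVIDIA and Docker documentation, CDI needs a generated device spec and a daemon feature flag before this override file is usable. A hedged host-setup sketch, assuming `nvidia-ctk` from the NVIDIA Container Toolkit is installed; merge the feature flag into any existing `daemon.json` rather than overwriting it:

```bash
# Generate a CDI spec describing the host's NVIDIA GPUs.
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Enable the CDI feature in the Docker daemon (merge by hand if daemon.json already exists).
echo '{ "features": { "cdi": true } }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker

docker compose --file compose.cuda-cdi.yaml up --detach
```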
6 changes: 3 additions & 3 deletions compose.cuda.yaml
@@ -1,11 +1,11 @@
# include:
# - compose.observability.yaml
services:
faster-whisper-server:
speaches:
extends:
file: compose.yaml
service: faster-whisper-server
image: fedirz/faster-whisper-server:latest-cuda
service: speaches
image: ghcr.io/speaches-ai/speaches:latest-cuda
build:
args:
BASE_IMAGE: nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04
2 changes: 1 addition & 1 deletion compose.observability.yaml
@@ -5,7 +5,7 @@ services:
volumes:
- ./configuration/opentelemetry-collector.yaml:/etc/opentelemetry-collector.yaml
ports:
# NOTE: when `faster-whisper-server` is also running as a Docker Compose service, this doesn't need to be exposed.
# NOTE: when `speaches` is also running as a Docker Compose service, this doesn't need to be exposed.
- 4317:4317 # OTLP gRPC receiver
# - 4318:4318 # OTLP HTTP receiver
# - 8888:8888 # Prometheus metrics exposed by the Collector
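
When the server runs outside of Compose (for example via the Taskfile's `opentelemetry-instrument` wrapper), it can be pointed at this collector using the standard OpenTelemetry environment variables; the endpoint below matches the OTLP gRPC port exposed above:

```bash
# Export telemetry to the collector's OTLP gRPC receiver.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=speaches
opentelemetry-instrument uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```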
4 changes: 2 additions & 2 deletions compose.yaml
@@ -1,7 +1,7 @@
# TODO: https://docs.astral.sh/uv/guides/integration/docker/#configuring-watch-with-docker-compose
services:
faster-whisper-server:
container_name: faster-whisper-server
speaches:
container_name: speaches
build:
dockerfile: Dockerfile
context: .
4 changes: 2 additions & 2 deletions docs/configuration.md
@@ -1,5 +1,5 @@
<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
::: faster_whisper_server.config.Config
::: speaches.config.Config
options:
show_bases: true
show_if_no_docstring: true
@@ -16,7 +16,7 @@
- "!speech_*"
- "!transcription_*"

::: faster_whisper_server.config.WhisperConfig
::: speaches.config.WhisperConfig

<!-- TODO: nested model `whisper` -->
<!-- TODO: Insert new lines for multi-line docstrings -->
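
Because the config is read from environment variables, nested models such as `WhisperConfig` are addressed with a double-underscore delimiter, as the Dockerfile's `WHISPER__MODEL` suggests. A minimal sketch; the first variable appears in this repo, the second is hypothetical and shown only to illustrate the pattern:

```bash
# Nested field: the `model` field of the `whisper` config becomes WHISPER__MODEL.
export WHISPER__MODEL=Systran/faster-whisper-large-v3

# Hypothetical nested field, included only to illustrate the `__` delimiter.
export WHISPER__SOME_SETTING=some-value

uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```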
38 changes: 19 additions & 19 deletions docs/installation.md
@@ -9,25 +9,25 @@ Download the necessary Docker Compose files
=== "CUDA"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
export COMPOSE_FILE=compose.cuda.yaml
```

=== "CUDA (with CDI feature enabled)"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda-cdi.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda-cdi.yaml
export COMPOSE_FILE=compose.cuda-cdi.yaml
```

=== "CPU"

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cpu.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
export COMPOSE_FILE=compose.cpu.yaml
```

@@ -58,10 +58,10 @@ docker compose up --detach
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--gpus=all \
fedirz/faster-whisper-server:latest-cuda
ghcr.io/speaches-ai/speaches:latest-cuda
```

=== "CUDA (with CDI feature enabled)"
@@ -71,10 +71,10 @@ docker compose up --detach
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
--device=nvidia.com/gpu=all \
fedirz/faster-whisper-server:latest-cuda
ghcr.io/speaches-ai/speaches:latest-cuda
```

=== "CPU"
@@ -84,31 +84,31 @@
--rm \
--detach \
--publish 8000:8000 \
--name faster-whisper-server \
--name speaches \
--volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
fedirz/faster-whisper-server:latest-cpu
ghcr.io/speaches-ai/speaches:latest-cpu
```

??? note "Build from source"

```bash
docker build --tag faster-whisper-server .
docker build --tag speaches .

# NOTE: you need to install and enable [buildx](https://github.com/docker/buildx) for multi-platform builds
# Build image for both amd64 and arm64
docker buildx build --tag faster-whisper-server --platform linux/amd64,linux/arm64 .
docker buildx build --tag speaches --platform linux/amd64,linux/arm64 .

# Build image without CUDA support
docker build --tag faster-whisper-server --build-arg BASE_IMAGE=ubuntu:24.04 .
docker build --tag speaches --build-arg BASE_IMAGE=ubuntu:24.04 .
```

## Python (requires Python 3.12+ and `uv` package manager)

```bash
git clone https://github.com/fedirz/faster-whisper-server.git
cd faster-whisper-server
git clone https://github.com/speaches-ai/speaches.git
cd speaches
uv venv
source .venv/bin/activate
uv sync --all-extras
uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app
uvicorn --factory --host 0.0.0.0 speaches.main:create_app
```
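
With the server running, one of the endpoints described earlier can be exercised directly; a minimal `curl` sketch reusing a model name that appears elsewhere in this repo:

```bash
# Translate spoken audio to English text via the OpenAI-compatible endpoint.
curl http://localhost:8000/v1/audio/translations \
  -F "file=@audio.wav" \
  -F "model=Systran/faster-whisper-small"
```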
9 changes: 5 additions & 4 deletions docs/introduction.md
@@ -8,19 +8,20 @@

TODO: add HuggingFace Space URL

# Faster Whisper Server
# Speaches

`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
`speaches` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.

## Features:

- GPU and CPU support.
- [Deployable via Docker Compose / Docker](./installation.md)
- [Highly configurable](./configuration.md)
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `speaches`.
- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
- Text-to-speech (TTS) via `piper`.
- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
@@ -34,7 +35,7 @@ Please create an issue if you find a bug, have a question, or a feature suggestion
See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.

- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
- Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful for when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
- Unlike OpenAI's API, `speaches` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to streaming chat messages from an LLM.
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
- LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
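
For the text-to-speech feature listed above, a hedged sketch assuming the server mirrors OpenAI's `POST /v1/audio/speech` route; the endpoint, model, and voice values here are assumptions, not confirmed by this diff:

```bash
# Hypothetical OpenAI-style speech request; model and voice are placeholders.
curl http://localhost:8000/v1/audio/speech \
  --header "Content-Type: application/json" \
  --data '{"model": "piper", "input": "Hello from speaches!", "voice": "some-voice"}' \
  --output speech.mp3
```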