Fastembed API

Overview

Fastembed API provides a straightforward way to generate text embeddings for paragraphs, making it easier to retrieve vector embeddings for downstream tasks like semantic search or text similarity.

Context

It was developed as a companyon of Boc·ajarro which, as is written in PHP, needed some Python help in order to create text embeddings in order to feed a vector database.

As it's meant to be executed in a server without a graphic card, it doesn't rely on expensive (in terms of computation) Pytorch setups. Instead, it uses FastEmbed so it can be executed in the CPU's of a standard Virtual Private Server.

Limitations

Currently, the API supports multilingual embeddings using the model sentence-transformers/paraphrase-multilingual-mpnet-base-v2 and accepts text input one paragraph at a time.
As it's meant to be executed in a private network, it does not implement any authentication (or authorization) mechanism.

Installation

To get started, clone the repository:

git clone https://github.com/estudio-hawara/fastembed-api
cd fastembed-api

Local Environment

For a local setup using a virtual environment:

# 1. Create a Python virtual environment
python -m venv .venv

# 2. Activate the Python virtual environment
source .venv/bin/activate

# 3. Install the dependencies
pip install -r requirements.txt

# 4. Start the service
uvicorn main:app

The API should now be available at http://localhost:8000.

Docker Environment

To use Docker, follow these steps:

# 1. Build the Docker Image
docker compose build

# 2. Start the Service as a Background Container
docker compose up -d

The API should now be available at http://localhost:8000.

Endpoints

`/embed` Endpoint

Method: POST
Description: Accepts a paragraph of text and returns vector embeddings for that text.
Request:
- Content-Type: application/json
- JSON Body: { "paragraph": "Your paragraph here" }
Response:
- JSON Body: { "embeddings": [ ... ] }
- The embeddings attribute contains an array with the embedding vectors.

`/health` Endpoint

Method: GET
Description: Returns a 200 status if the service is up and ready.

Health Check in Docker Compose

The docker-compose.yml file includes the following healthcheck configuration to ensure the service is fully initialized:

healthcheck:
  test: curl --fail http://localhost:8000/health || exit 1
  interval: 2s
  timeout: 5s
  retries: 3
  start_period: 5s

This configuration ensures that Docker waits until the service is ready before marking it as "healthy."

Persistent Docker Volume

This project caches FastEmbed files in a temporary directory for quicker container restarts. The .onnx model and related files are stored in a Docker volume as specified in docker-compose.yml:

volumes:
  - fastembed_cache:/tmp/fastembed_cache

Using this volume prevents repeated downloads of model files, speeding up restarts and preserving cache data between runs.

Requirements

Python: 3.12 (>=3.12 and <3.13)
Docker: No specific version required.

Testing

To run unit tests:

pytest

These tests ensure basic functionality. Benchmark tests will be added in future updates.

License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Fastembed API

Overview

Context

Limitations

Installation

Local Environment

Docker Environment

Endpoints

`/embed` Endpoint

`/health` Endpoint

Health Check in Docker Compose

Persistent Docker Volume

Requirements

Testing

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

Fastembed API

Overview

Context

Limitations

Installation

Local Environment

Docker Environment

Endpoints

/embed Endpoint

/health Endpoint

Health Check in Docker Compose

Persistent Docker Volume

Requirements

Testing

License

`/embed` Endpoint

`/health` Endpoint