This repository contains scripts for benchmarking the performance of large language models (LLMs) served with vLLM. It is designed to test the scalability and performance of LLM deployments under various concurrency levels.
- Benchmark LLMs with different concurrency levels
- Measure key performance metrics:
  - Requests per second
  - Latency
  - Tokens per second
  - Time to first token
- Easy to run with customizable parameters
- Generates JSON output for further analysis or visualization
- Python 3.7+
- `openai` Python package
- `numpy` Python package

To get started:

1. Clone this repository:

       git clone https://github.com/yourusername/vllm-benchmark.git
       cd vllm-benchmark

2. Install the required packages:

       pip install openai numpy
To run a single benchmark:
    python vllm_benchmark.py --num_requests 100 --concurrency 10 --output_tokens 100 --vllm_url "http://localhost:8000/v1" --api_key "your-api-key"
Parameters:
- `num_requests`: Total number of requests to make
- `concurrency`: Number of concurrent requests
- `output_tokens`: Number of tokens to generate per request
- `vllm_url`: URL of the vLLM server
- `api_key`: API key for the vLLM server
- `request_timeout`: (Optional) Timeout for each request in seconds (default: 30)
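As a rough illustration of how time to first token and latency can be measured for one request (a hypothetical sketch, not the repository's actual implementation — `measure_stream` and `fake_stream` are names invented here), a streamed response can be timestamped as it is consumed:

```python
import time


def measure_stream(chunks):
    """Consume an iterable of streamed response chunks and return
    (time_to_first_token, total_latency, num_chunks) in seconds.

    `chunks` could be a real streaming response from the openai client
    (e.g. client.chat.completions.create(..., stream=True)) or any
    iterable standing in for one.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            # First chunk received: record time to first token.
            ttft = time.perf_counter() - start
        count += 1
    latency = time.perf_counter() - start
    return ttft, latency, count


# Synthetic stream standing in for a real server response:
def fake_stream(n, delay=0.001):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"


ttft, latency, n = measure_stream(fake_stream(5))
```

With a real endpoint, `fake_stream(5)` would be replaced by the streaming response object; the timing logic is unchanged.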
To run multiple benchmarks with different concurrency levels:
    python run_benchmarks.py --vllm_url "http://localhost:8000/v1" --api_key "your-api-key"
This script will run benchmarks with concurrency levels of 1, 10, 50, and 100, and save the results to `benchmark_results.json`.
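The sweep over concurrency levels can be sketched roughly as follows (a simplified stand-in for the real script, not its actual code — `send_request` here is a stub that would be replaced by a call to the vLLM server):

```python
import asyncio
import json
import time


async def send_request(i):
    # Stand-in for a real request to the vLLM server; in practice this
    # would be an openai.AsyncOpenAI chat-completion call.
    await asyncio.sleep(0.01)
    return {"request": i, "ok": True}


async def run_benchmark(num_requests, concurrency):
    sem = asyncio.Semaphore(concurrency)  # cap the number of in-flight requests

    async def bounded(i):
        async with sem:
            return await send_request(i)

    start = time.perf_counter()
    results = await asyncio.gather(*(bounded(i) for i in range(num_requests)))
    elapsed = time.perf_counter() - start
    return {
        "concurrency": concurrency,
        "total_requests": num_requests,
        "successful_requests": sum(r["ok"] for r in results),
        "requests_per_second": num_requests / elapsed,
    }


async def main():
    # Same concurrency levels the script uses: 1, 10, 50, 100.
    runs = [await run_benchmark(20, c) for c in (1, 10, 50, 100)]
    with open("benchmark_results.json", "w") as f:
        json.dump(runs, f, indent=2)
    return runs


runs = asyncio.run(main())
```

The semaphore is what turns a flat list of requests into a fixed-concurrency workload: at most `concurrency` coroutines hold it at once, and the rest queue.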
The benchmark results are saved in JSON format, containing detailed metrics for each run, including:
- Total requests and successful requests
- Requests per second
- Total output tokens
- Latency (average, p50, p95, p99)
- Tokens per second (average, p50, p95, p99)
- Time to first token (average, p50, p95, p99)
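The average/p50/p95/p99 summaries above can be reproduced from raw per-request samples with `numpy` (a minimal sketch; `summarize` is a name invented here, not a function from the scripts):

```python
import numpy as np


def summarize(samples):
    """Return average, p50, p95, and p99 for a list of per-request
    measurements (latencies in seconds, tokens per second, etc.)."""
    arr = np.asarray(samples, dtype=float)
    return {
        "average": float(arr.mean()),
        "p50": float(np.percentile(arr, 50)),
        "p95": float(np.percentile(arr, 95)),
        "p99": float(np.percentile(arr, 99)),
    }


# Example: latencies (seconds) collected from 100 requests.
latencies = [0.1 * (i % 10 + 1) for i in range(100)]
stats = summarize(latencies)
```

Applying the same helper to tokens-per-second or time-to-first-token samples yields the corresponding rows of the results JSON.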
See the `results` directory for benchmarks on Backprop instances.
Contributions to improve the benchmarking scripts or add new features are welcome! Please feel free to submit pull requests or open issues for any bugs or feature requests.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.