From 4e7b80bf3ecc48ee42a65d1c2a2013145f7d9413 Mon Sep 17 00:00:00 2001
From: Gordon Mickel
Date: Fri, 16 Aug 2024 09:55:34 +0200
Subject: [PATCH] docs: update readme

---
 README.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 0c5480a..47f3a2b 100644
--- a/README.md
+++ b/README.md
@@ -423,20 +423,23 @@ The benchmark generates a detailed Markdown report including:
 
 Reports are saved in `benchmark/reports/` with timestamped filenames.
 
-For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).
-
 ### Results
 
 CodeWhisper's performance has been evaluated across different models using the Exercism Python exercises. Below is a summary of the benchmark results:
 
-| Model                      | Tests Passed | Time (s) | Cost ($) | Command                                              |
-| -------------------------- | ------------ | -------- | -------- | ---------------------------------------------------- |
-| claude-3-5-sonnet-20240620 | 80.27%       | 1619.49  | 3.4000   | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
+| Model                      | Tests Passed | Time (s) | Cost ($) | Command                                                                        |
+| -------------------------- | ------------ | -------- | -------- | ------------------------------------------------------------------------------ |
+| claude-3-5-sonnet-20240620 | 80.27%       | 1619.49  | 3.4000   | `./benchmark/run_benchmark.sh --workers 5 --no-plan`                           |
+| gpt-4o-2024-08-06          | 81.51%       | 986.68   | 1.6800   | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model gpt-4o-2024-08-06` |
+
+The full reports used to generate these results are available in the `benchmark/reports/` directory.
 
 These results provide insights into the efficiency and accuracy of different models when used with CodeWhisper. The "Tests Passed" percentage indicates the proportion of Exercism tests successfully completed, while the time and cost metrics offer a view of the resource requirements for each model.
 
 As we continue to run benchmarks with various models and configurations, this table will be updated to provide a comprehensive comparison, helping users make informed decisions about which model might best suit their needs.
 
+For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).
+
 ## 🤝 Contributing
 
 We welcome contributions to CodeWhisper! Please read our [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.