docs: update readme

gmickel · Aug 16, 2024 · 4e7b80b
1 parent 3a7e7a0
Showing 1 changed file with 8 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -423,20 +423,23 @@ The benchmark generates a detailed Markdown report including:
 
 Reports are saved in `benchmark/reports/` with timestamped filenames.
 
-For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).
-
 ### Results
 
 CodeWhisper's performance has been evaluated across different models using the Exercism Python exercises. Below is a summary of the benchmark results:
 
-| Model                      | Tests Passed | Time (s) | Cost ($) | Command                                              |
-| -------------------------- | ------------ | -------- | -------- | ---------------------------------------------------- |
-| claude-3-5-sonnet-20240620 | 80.27%       | 1619.49  | 3.4000   | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
+| Model                      | Tests Passed | Time (s) | Cost ($) | Command                                                                        |
+| -------------------------- | ------------ | -------- | -------- | ------------------------------------------------------------------------------ |
+| claude-3-5-sonnet-20240620 | 80.27%       | 1619.49  | 3.4000   | `./benchmark/run_benchmark.sh --workers 5 --no-plan`                           |
+| gpt-4o-2024-08-06          | 81.51%       | 986.68   | 1.6800   | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model gpt-4o-2024-08-06` |
+
+The full reports used to generate these results are available in the `benchmark/reports/` directory.
 
 These results provide insights into the efficiency and accuracy of different models when used with CodeWhisper. The "Tests Passed" percentage indicates the proportion of Exercism tests successfully completed, while the time and cost metrics offer a view of the resource requirements for each model.
 
 As we continue to run benchmarks with various models and configurations, this table will be updated to provide a comprehensive comparison, helping users make informed decisions about which model might best suit their needs.
 
+For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).
+
 ## 🤝 Contributing
 
 We welcome contributions to CodeWhisper! Please read our [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.