Commit

docs: update readme
gmickel committed Aug 16, 2024
1 parent 3a7e7a0 commit 4e7b80b
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions README.md
@@ -423,20 +423,23 @@ The benchmark generates a detailed Markdown report including:

Reports are saved in `benchmark/reports/` with timestamped filenames.

For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).

### Results

CodeWhisper's performance has been evaluated across different models using the Exercism Python exercises. Below is a summary of the benchmark results:

-| Model | Tests Passed | Time (s) | Cost ($) | Command |
-| -------------------------- | ------------ | -------- | -------- | ---------------------------------------------------- |
-| claude-3-5-sonnet-20240620 | 80.27% | 1619.49 | 3.4000 | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
+| Model | Tests Passed | Time (s) | Cost ($) | Command |
+| -------------------------- | ------------ | -------- | -------- | ------------------------------------------------------------------------------ |
+| claude-3-5-sonnet-20240620 | 80.27% | 1619.49 | 3.4000 | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
+| gpt-4o-2024-08-06 | 81.51% | 986.68 | 1.6800 | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model gpt-4o-2024-08-06` |

The full reports used to generate these results are available in the `benchmark/reports/` directory.

These results provide insights into the efficiency and accuracy of different models when used with CodeWhisper. The "Tests Passed" percentage indicates the proportion of Exercism tests successfully completed, while the time and cost metrics offer a view of the resource requirements for each model.
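
As a rough illustration of how the time and cost columns can be weighed against the pass rate, the sketch below derives a cost per percentage point of tests passed for each row. The figures are copied from the results table; the `cost_per_point` metric and the function names are assumptions made for this example and are not part of CodeWhisper or its benchmark tooling.

```python
# Figures copied from the results table; nothing here is produced by the
# benchmark script itself.
results = {
    "claude-3-5-sonnet-20240620": {"passed_pct": 80.27, "time_s": 1619.49, "cost_usd": 3.40},
    "gpt-4o-2024-08-06": {"passed_pct": 81.51, "time_s": 986.68, "cost_usd": 1.68},
}


def cost_per_point(row):
    """Dollars spent per percentage point of tests passed (hypothetical metric)."""
    return row["cost_usd"] / row["passed_pct"]


def summarize(rows):
    """Return one summary line per model, cheapest per point first."""
    ranked = sorted(rows.items(), key=lambda item: cost_per_point(item[1]))
    return [
        f"{model}: {cost_per_point(row):.4f} $/point, {row['time_s'] / 60:.1f} min"
        for model, row in ranked
    ]


for line in summarize(results):
    print(line)
```

A single derived number like this hides trade-offs (for example, a run with more workers changes wall-clock time but not cost), so it is best read alongside the raw columns rather than in place of them.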

As we continue to run benchmarks with various models and configurations, this table will be updated to provide a comprehensive comparison, helping users make informed decisions about which model might best suit their needs.

For full details on running benchmarks, interpreting results, and available options, please refer to the [Benchmark README](./benchmark/README.md).

## 🤝 Contributing

We welcome contributions to CodeWhisper! Please read our [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
