Commit

docs: add deepseek-coder benchmark
gmickel committed Aug 16, 2024
1 parent 89e7681 commit e937cd0
Showing 4 changed files with 1,289 additions and 1 deletion.
5 changes: 5 additions & 0 deletions README.md
@@ -431,6 +431,11 @@ CodeWhisper's performance has been evaluated across different models using the E
| -------------------------- | ------------ | -------- | -------- | ------------------------------------------------------------------------------ |
| claude-3-5-sonnet-20240620 | 80.27% | 1619.49 | 3.4000 | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
| gpt-4o-2024-08-06 | 81.51% | 986.68 | 1.6800 | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model gpt-4o-2024-08-06` |
| deepseek-coder | 76.89% | 5850.58 | 0.0000\* | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model deepseek-coder` |

\*The cost calculation was not working properly for this benchmark run.

> **Note:** All benchmarks are one-shot only, unlike other benchmarks which use multiple generations that depend on the results of the test run.
The full reports used to generate these results are available in the `benchmark/reports/` directory.

3 changes: 3 additions & 0 deletions benchmark/README.md
@@ -15,6 +15,9 @@ CodeWhisper's performance has been evaluated across different models using the E
| -------------------------- | ------------ | -------- | -------- | ------------------------------------------------------------------------------ |
| claude-3-5-sonnet-20240620 | 80.27% | 1619.49 | 3.4000 | `./benchmark/run_benchmark.sh --workers 5 --no-plan` |
| gpt-4o-2024-08-06 | 81.51% | 986.68 | 1.6800 | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model gpt-4o-2024-08-06` |
| deepseek-coder | 76.89% | 5850.58 | 0.0000\* | `./benchmark/run_benchmark.sh --workers 5 --no-plan --model deepseek-coder` |

\*The cost calculation was not working properly for this benchmark run.

> **Note:** All benchmarks are one-shot only, unlike other benchmarks which use multiple generations that depend on the results of the test run.