From bb10d07b270b0b2c6260e0ed9542bd6e68edcb55 Mon Sep 17 00:00:00 2001
From: Garvit Singh Rathore <78960005+garvit000@users.noreply.github.com>
Date: Thu, 30 Jan 2025 23:12:58 +0530
Subject: [PATCH 1/2] Update README.md

Used more accurate words.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c6ea85a..3f93239 100644
--- a/README.md
+++ b/README.md
@@ -202,7 +202,7 @@ python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
 
 **We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**
 
-1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
+1. Set the temperature between 0.5 and 0.7 (with 0.6 recommended) to prevent endless repetition or incoherent outputs.
 2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
 3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
 4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.

From 1c10c9f6771fadcd5c8fe52cbac7075527184661 Mon Sep 17 00:00:00 2001
From: Garvit Singh Rathore <78960005+garvit000@users.noreply.github.com>
Date: Thu, 30 Jan 2025 23:17:27 +0530
Subject: [PATCH 2/2] Update README.md

Behaviors are exhibited rather than emerged.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3f93239..5664b98 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
 DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
-With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
+With RL, DeepSeek-R1-Zero naturally exhibited numerous powerful and interesting reasoning behaviors.
 However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
 To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
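The usage recommendations touched by the first patch (temperature 0.6, no system prompt, the step-by-step `\boxed{}` directive) can be sketched as an OpenAI-style chat request payload. This is a minimal illustration, not DeepSeek's API: `build_request` is a hypothetical helper, and the model name is taken from the `sglang.launch_server` context line in the hunk.

```python
def build_request(question: str,
                  model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B") -> dict:
    """Build a chat payload following the README's recommended settings."""
    # Recommendation 3: for math, ask for step-by-step reasoning and a
    # final answer inside \boxed{}.
    prompt = (f"{question}\n"
              "Please reason step by step, and put your final answer "
              "within \\boxed{}.")
    return {
        "model": model,
        # Recommendation 2: no system message; all instructions live in
        # the single user turn.
        "messages": [{"role": "user", "content": prompt}],
        # Recommendation 1: temperature between 0.5 and 0.7; 0.6 recommended.
        "temperature": 0.6,
    }

payload = build_request("What is 7 * 8?")
```

Per recommendation 4, an evaluation harness would send such a payload several times and average the results rather than score a single sample.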