Replies: 3 comments 3 replies
-
For this I don't think it can be too dense unless you have a bug somewhere, but you have to normalize the sum. Ideally you want your time-step-wise reward to be between -1 and +1.
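A minimal sketch of what this normalization might look like, assuming a hypothetical raw per-step reward (e.g. the number of checkpoints crossed since the last observation) and an assumed scaling constant; this is not tmrl's actual reward code:

```python
import numpy as np

# Assumed tuning constant: the largest raw reward you realistically expect
# in a single time step (e.g. checkpoints crossed per step at top speed).
MAX_EXPECTED_STEP_REWARD = 10.0

def normalize_step_reward(raw_step_reward: float) -> float:
    """Scale the raw per-step reward and clip it to [-1, +1]."""
    return float(np.clip(raw_step_reward / MAX_EXPECTED_STEP_REWARD, -1.0, 1.0))
```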
-
I did have a major bug, but even after fixing it, it's still not working well... grrrr, frustrating.
-
I think the donut is unintentional.
The car gets into a spin after hitting the wall, and it is hard to break out of the spin.
Also, my reward is negative and larger in magnitude when facing backwards (a rough sketch of this kind of reward is below).
I'm also now trying PPO with a CNN.
How long did it take to train tmrl with a CNN with 1 worker?
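For reference, a minimal sketch of what such a direction-aware reward might look like, assuming access to the car's velocity vector and a unit vector giving the local track direction (all names and the penalty factor are illustrative assumptions, not tmrl's actual reward):

```python
import numpy as np

def directional_speed_reward(velocity: np.ndarray, track_direction: np.ndarray,
                             backward_penalty: float = 2.0) -> float:
    """Reward signed speed along the track; penalize backward motion more heavily.

    velocity: car velocity vector (e.g. in m/s).
    track_direction: unit vector pointing forward along the track at the car's position.
    """
    # Signed speed along the track: positive going forward, negative going backward.
    along_track = float(np.dot(velocity, track_direction))
    if along_track >= 0.0:
        return along_track
    # Negative, and larger in magnitude, when moving backwards.
    return backward_penalty * along_track
```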
On Sun, 15 Oct 2023, 19:52, Yann Bouteiller wrote:
The donut part is strange, sounds like a weird attempt at reward hacking?
-
So I am trying to train my agent at drag racing (something I managed to do previously).
Ironically, the main difference is that I had previously only used a simple reward function:
reward = observed_speed * 1 (if going forward),
reward = observed_speed * -2 (if going backwards).
I have now generated equally spaced checkpoints along the track, but I am wondering if they are too sparse/dense (I almost always get a reward between observations as long as the car is moving forward, so I wonder if I am missing something else).
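For reference, here is a rough sketch of the kind of checkpoint-based reward I have in mind, assuming a precomputed array of checkpoint positions; the helper name, the capture radius, and the normalization constant are illustrative assumptions, not the actual tmrl reward function:

```python
import numpy as np

# Assumed constant: the most checkpoints the car can cross in one time step at top speed.
MAX_CHECKPOINTS_PER_STEP = 4

def checkpoint_reward(prev_index: int, car_position: np.ndarray,
                      checkpoints: np.ndarray, capture_radius: float = 3.0):
    """Return (normalized per-step reward, updated checkpoint index).

    checkpoints: array of shape (N, 3) with checkpoint positions along the track.
    prev_index: index of the last checkpoint the car had reached.
    """
    new_index = prev_index
    # Advance past every upcoming checkpoint that is within capture_radius of the car.
    while (new_index + 1 < len(checkpoints)
           and np.linalg.norm(checkpoints[new_index + 1] - car_position) < capture_radius):
        new_index += 1
    passed = new_index - prev_index
    # Normalize so the per-step reward stays within [-1, +1], as suggested above.
    reward = float(np.clip(passed / MAX_CHECKPOINTS_PER_STEP, -1.0, 1.0))
    return reward, new_index
```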
Beta Was this translation helpful? Give feedback.
All reactions