Replies: 3 comments 3 replies
-
For this I don't think it can be too dense unless you have a bug somewhere, but you have to normalize the sum. Ideally you want your time-step-wise reward to be between -1 and +1.
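A minimal sketch of what this normalization might look like, assuming a hypothetical raw per-step reward (e.g. the number of checkpoints crossed since the last observation) and an assumed scaling constant; this is not tmrl's actual reward code:

```python
import numpy as np

# Assumed tuning constant: the largest raw reward you realistically expect
# in a single time step (e.g. checkpoints crossed per step at top speed).
MAX_EXPECTED_STEP_REWARD = 10.0

def normalize_step_reward(raw_step_reward: float) -> float:
    """Scale the raw per-step reward and clip it to [-1, +1]."""
    return float(np.clip(raw_step_reward / MAX_EXPECTED_STEP_REWARD, -1.0, 1.0))
```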
-
I did have a major bug, but even after fixing it, it's still not working well... grrrr, frustrating.
-
I think the donut is unintentional.
The car gets into a spin after hitting the wall, and it is hard to break out of the spin.
Also, my reward is negative and larger in magnitude when facing backwards (a rough sketch of this kind of reward is below).
I'm also now trying PPO with a CNN.
How long did it take to train tmrl with a CNN with 1 worker?
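For reference, a minimal sketch of what such a direction-aware reward might look like, assuming access to the car's velocity vector and a unit vector giving the local track direction (all names and the penalty factor are illustrative assumptions, not tmrl's actual reward):

```python
import numpy as np

def directional_speed_reward(velocity: np.ndarray, track_direction: np.ndarray,
                             backward_penalty: float = 2.0) -> float:
    """Reward signed speed along the track; penalize backward motion more heavily.

    velocity: car velocity vector (e.g. in m/s).
    track_direction: unit vector pointing forward along the track at the car's position.
    """
    # Signed speed along the track: positive going forward, negative going backward.
    along_track = float(np.dot(velocity, track_direction))
    if along_track >= 0.0:
        return along_track
    # Negative, and larger in magnitude, when moving backwards.
    return backward_penalty * along_track
```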
On Sun, 15 Oct 2023, 19:52, Yann Bouteiller wrote:
The donut part is strange, sounds like a weird attempt at reward hacking?
-
So I am trying to train my agent at drag racing (something I managed to do previously).
Ironically, the main difference is that I had previously only used a simple reward function:
reward = observed_speed * 1 (if going forward),
reward = observed_speed * -2 (if going backwards).
I have now generated equally spaced checkpoints along the track, but I am wondering if they are too sparse/dense (I almost always get a reward between observations as long as the car is moving forward, so I wonder if I am missing something else).
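For reference, here is a rough sketch of the kind of checkpoint-based reward I have in mind, assuming a precomputed array of checkpoint positions; the helper name, the capture radius, and the normalization constant are illustrative assumptions, not the actual tmrl reward function:

```python
import numpy as np

# Assumed constant: the most checkpoints the car can cross in one time step at top speed.
MAX_CHECKPOINTS_PER_STEP = 4

def checkpoint_reward(prev_index: int, car_position: np.ndarray,
                      checkpoints: np.ndarray, capture_radius: float = 3.0):
    """Return (normalized per-step reward, updated checkpoint index).

    checkpoints: array of shape (N, 3) with checkpoint positions along the track.
    prev_index: index of the last checkpoint the car had reached.
    """
    new_index = prev_index
    # Advance past every upcoming checkpoint that is within capture_radius of the car.
    while (new_index + 1 < len(checkpoints)
           and np.linalg.norm(checkpoints[new_index + 1] - car_position) < capture_radius):
        new_index += 1
    passed = new_index - prev_index
    # Normalize so the per-step reward stays within [-1, +1], as suggested above.
    reward = float(np.clip(passed / MAX_CHECKPOINTS_PER_STEP, -1.0, 1.0))
    return reward, new_index
```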
Beta Was this translation helpful? Give feedback.
All reactions