PendulumRL

Pendulum has torque control at its pivot. The objective is to stabilize the pendulum in upright position.

The action space of pendulum is a continuous torque between (-2.0, 2.0)

The reward function for the Pendulum motion is: r = -(theta^2 + 0.1 * theta_dt^2 + 0.001 * torque^2)

The observation space of the Pendulum has following states: 0: x = cos(theta) 1: y = sin(theta) 2: Angular velocity

Q-Learning Algorithm:

For Q-Learning using Q-Table, the action space is discretized into 1000 torque steps.

The Q_table is initialized with random q-values.

The robot applies torque corresponding to its current state with maximum q-value.

The q-value is updated as based on discounted q-value of the next state.

The process continues for multiple episodes and the q-table gets updated to find minimum number of steps to reach the termination state.

Provide feedback