The Linear Quadratic Regulator (LQR) provides a framework for optimizing the behavior of a dynamical system over time, using a linear model of the dynamics and a quadratic cost.
The LQR problem is formulated as the minimization of a quadratic cost in the states and controls, subject to linear dynamics constraints.
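A standard finite-horizon, discrete-time statement, written here to be consistent with the $Q$, $R$, $A$, $B$ notation used later in these notes (the exact indexing convention is an assumption):

$$ \min_{x_{1:N},\, u_{1:N-1}} \; \frac{1}{2} x_N^T Q_N x_N + \sum_{k=1}^{N-1} \left( \frac{1}{2} x_k^T Q x_k + \frac{1}{2} u_k^T R u_k \right) $$

$$ \text{subject to} \quad x_{k+1} = A x_k + B u_k, \qquad Q, Q_N \succeq 0, \quad R \succ 0 $$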
Key Insights:
- Many non-linear problems can be (locally) approximated by LQR, making it a widely used approach.
- There are numerous extensions of LQR, such as the infinite horizon case and stochastic LQR.
Using the Hamiltonian definition, we can derive the first-order necessary conditions for optimality (the co-state equations).
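With the discrete-time formulation above, a standard form of these conditions (the indexing convention is again an assumption) is:

$$ H_k(x_k, u_k, \lambda_{k+1}) = \frac{1}{2} x_k^T Q x_k + \frac{1}{2} u_k^T R u_k + \lambda_{k+1}^T (A x_k + B u_k) $$

$$ \lambda_k = \frac{\partial H_k}{\partial x_k} = Q x_k + A^T \lambda_{k+1}, \qquad \lambda_N = Q_N x_N, \qquad \frac{\partial H_k}{\partial u_k} = R u_k + B^T \lambda_{k+1} = 0 $$

The last condition gives $u_k = -R^{-1} B^T \lambda_{k+1}$, which is the basis for the $\Delta u$ computed in the backward pass of the shooting procedure below.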
Procedure for LQR with Indirect Shooting:
- Start with an initial guess trajectory.
- Simulate (or "rollout") to obtain $x(t)$.
- Perform a backward pass to compute $\lambda(t)$ and $\Delta u(t)$.
- Rollout with a line search on $\Delta u$.
- Repeat the backward pass and line-search rollout until convergence.
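A minimal sketch of this loop in Python, assuming the linear dynamics and quadratic costs above; the specific matrices, horizon, and tolerances are illustrative placeholders, not values from these notes:

```python
# Indirect shooting for a discrete-time LQR problem (sketch).
# All numerical values below are illustrative placeholders.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
x0, N = np.array([1.0, 0.0]), 50

def rollout(u):
    """Simulate the dynamics forward from x0 under the control sequence u."""
    x = np.zeros((N, 2))
    x[0] = x0
    for k in range(N - 1):
        x[k + 1] = A @ x[k] + B @ u[k]
    return x

def cost(x, u):
    """Quadratic LQR cost of a state/control trajectory."""
    J = 0.5 * x[-1] @ QN @ x[-1]
    for k in range(N - 1):
        J += 0.5 * x[k] @ Q @ x[k] + 0.5 * u[k] @ R @ u[k]
    return J

u = np.zeros((N - 1, 1))              # (1) initial guess trajectory
x = rollout(u)                        # (2) forward rollout
for it in range(100):
    # (3) backward pass: co-states lambda and control correction du
    lam = QN @ x[-1]
    du = np.zeros_like(u)
    for k in reversed(range(N - 1)):
        du[k] = -np.linalg.solve(R, R @ u[k] + B.T @ lam)
        lam = Q @ x[k] + A.T @ lam
    if np.max(np.abs(du)) < 1e-6:     # converged: gradient is (nearly) zero
        break
    # (4) rollout with a backtracking line search on du
    alpha, J0 = 1.0, cost(x, u)
    while cost(rollout(u + alpha * du), u + alpha * du) > J0 and alpha > 1e-8:
        alpha *= 0.5
    u = u + alpha * du
    x = rollout(u)
```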
The LQR problem can also be posed as a standard equality-constrained QP. For the dynamic case, the controls and states over the horizon are stacked into a single decision vector with a block-diagonal quadratic cost, and the dynamics enter as linear equality constraints.
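One common way to write the stacked problem; the symbols $z$, $H$, $C$, $d$ and the stacking order are notational assumptions here:

$$ \min_z \; \frac{1}{2} z^T H z \quad \text{s.t.} \quad C z = d $$

$$ z = \begin{bmatrix} u_1 \\ x_2 \\ u_2 \\ x_3 \\ \vdots \\ x_N \end{bmatrix}, \qquad H = \mathrm{blkdiag}(R, Q, R, Q, \ldots, Q_N), \qquad C = \begin{bmatrix} B & -I & & & \\ & A & B & -I & \\ & & & \ddots & \end{bmatrix}, \qquad d = \begin{bmatrix} -A x_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} $$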
Forming the Lagrangian of this QP and applying the KKT conditions yields a single linear system in the decision variables and the multipliers.
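With that notation (again an assumption), the Lagrangian and the KKT system read:

$$ L(z, \lambda) = \frac{1}{2} z^T H z + \lambda^T (C z - d) $$

$$ \nabla_z L = H z + C^T \lambda = 0, \qquad \nabla_\lambda L = C z - d = 0 \;\;\Longrightarrow\;\; \begin{bmatrix} H & C^T \\ C & 0 \end{bmatrix} \begin{bmatrix} z \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ d \end{bmatrix} $$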
Solving this KKT system yields the optimal state/control trajectory along with the multipliers. The KKT system is notably sparse and structured; exploiting this structure leads to the Riccati equation/recursion, from which we can derive a feedback policy.
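Written out, the recursion and the resulting policy take the following standard form, consistent with the $K_{N-1}$ and $P_{N-1}$ expressions derived in the dynamic-programming steps later in these notes:

$$ P_N = Q_N, \qquad K_k = (R + B^T P_{k+1} B)^{-1} B^T P_{k+1} A $$

$$ P_k = Q + K_k^T R K_k + (A - B K_k)^T P_{k+1} (A - B K_k), \qquad u_k = -K_k x_k $$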
For time-invariant LQR problems:
- The $K$ matrices converge to constant values over an infinite horizon.
- For stabilization tasks, the constant $K$ is predominantly used.
- This can be viewed as a root-finding or fixed-point problem solvable with Newton's method, where $P_n = P_{n+1} = P_\infty$.
- This can be solved explicitly using tools like Julia or Matlab with the `dare` function.
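As a rough Python analogue to the `dare` function mentioned above (an assumption, not part of the original workflow), SciPy's `solve_discrete_are` computes the same fixed point $P_\infty$, from which the constant gain follows:

```python
# Infinite-horizon LQR gain from the discrete algebraic Riccati equation (sketch).
# The matrices below are illustrative placeholders.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

P_inf = solve_discrete_are(A, B, Q, R)                          # fixed point P_n = P_{n+1}
K_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)   # constant feedback gain
print(K_inf)  # u = -K_inf @ x stabilizes the system
```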
- 🌟 Principle of Time Dependency: Past control inputs can influence future states, but future control inputs cannot alter past states.
- 🌟 Bellman's Principle of Optimality: This principle is a direct consequence of the time-dependency principle. It states that sub-trajectories of optimal trajectories must also be optimal for the corresponding sub-problem.
- 🔄 Backward Recursive Programming: The optimal solution is determined by recursing backward in time from the final step.
- 📈 Optimal Cost-to-Go (Value Function): Denoted as $V_N(x)$ at time step $N$.
- Last Step Value: $$ V_N(x) = \frac{1}{2} x^T Q_N x = \frac{1}{2} x^T P_N x $$
- Backup One Step: $$ V_{N-1}(x_{N-1}) = \min_u \left[ \frac{1}{2} x_{N-1}^T Q_{N-1} x_{N-1} + \frac{1}{2} u^T R_{N-1} u + V_N (A_{N-1} x_{N-1} + B_{N-1} u) \right] $$
- Optimal Control: Setting the gradient of the minimized expression with respect to $u$ to zero gives $$ u_{N-1} = -K_{N-1} x_{N-1} $$ where $$ K_{N-1} = (R_{N-1} + B_{N-1}^T Q_N B_{N-1})^{-1} B_{N-1}^T Q_N A_{N-1} $$
- Define the Matrix $P_{N-1}$: $$ P_{N-1} = Q_{N-1} + K_{N-1}^T R_{N-1} K_{N-1} + (A_{N-1} - B_{N-1} K_{N-1})^T Q_N (A_{N-1} - B_{N-1} K_{N-1}) $$
- Value Function: $$ V_{N-1}(x) = \frac{1}{2} x^T P_{N-1} x $$
- 🔄 Recursive Solution: This process can be repeated backward in time to find the full solution recursively, as sketched below.
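A minimal sketch of that recursion in Python, assuming time-invariant $A$, $B$, $Q$, $R$; the matrices, horizon, and initial state are illustrative placeholders:

```python
# Finite-horizon LQR via the backward Riccati recursion (dynamic programming sketch).
# All numerical values are illustrative placeholders.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
N = 50

# Backward pass: start from P_N = Q_N and recurse K_k, P_k down to the first step.
P = QN
K = [None] * (N - 1)
for k in reversed(range(N - 1)):
    K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    Acl = A - B @ K[k]
    P = Q + K[k].T @ R @ K[k] + Acl.T @ P @ Acl

# Forward pass: apply the time-varying feedback policy u_k = -K_k x_k.
x = np.array([1.0, 0.0])
for k in range(N - 1):
    x = A @ x + B @ (-K[k] @ x)
print(x)  # the state is driven toward the origin
```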
- ✅ Global Optimum: Dynamic Programming (DP) guarantees a global optimum.
- ⚠️ Limitations: DP is only feasible for simpler problems, such as LQR problems or those with low state dimension.
- 📊 Value Function Complexity: For LQR problems, $V(x)$ remains quadratic. However, for even slightly non-linear problems, it becomes challenging to represent analytically.
- 🚫 Non-Convexity: Even if an analytical representation is possible, the minimization $\min_u S(x,u)$ can be non-convex, making it difficult to solve.
- 🚀 Deep Reinforcement Learning (Deep RL): The computational cost of DP increases with state dimension due to the complexity of representing $V(x)$. This is where Deep RL comes into play, approximating $V$ and $Q$ with function approximators and removing the need for an explicit system model.
- 🎲 Stochastic Form: DP can also be generalized to the stochastic setting, in which value-function approximation remains possible.
- 📌 Dynamic Lagrange Multipliers (Co-Trajectory): These represent the gradient of the cost-to-go and can be applied to non-linear cases as well.
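For the LQR case this connection can be written explicitly: with the quadratic cost-to-go $V_k(x) = \frac{1}{2} x^T P_k x$ derived above, the multipliers are its gradient,

$$ \lambda_k = \nabla_x V_k(x_k) = P_k x_k $$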