Feedback
- Feedback is a fundamental concept in control systems: the control input is adjusted based on the measured output to achieve the desired behavior.
- However, feedback alone might not be sufficient, especially when constraints are tight or when the system is complex.
Improving the Model
- Parameter Estimation:
- Methods: System Identification (SysID), Grey-Box Modeling.
- 🟢 Advantages: Sample efficient and generalizes well.
- 🔴 Disadvantages: Assumes a good model structure.
- Learning the Model:
- Methods: Fit the full dynamics function directly, or learn a black-box residual on top of the nominal model (see the sketch after this list).
- 🟢 Advantages: No assumption on model structure and generalizes well.
- 🔴 Disadvantages: Sample inefficient and challenging to implement in real-world scenarios.
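A minimal sketch of the residual (black-box) approach, assuming a placeholder `f_nominal` and a linear least-squares residual; these are illustrative choices, and in practice the residual is often a neural network or Gaussian process:

```python
import numpy as np

def f_nominal(x, u):
    """Placeholder nominal model x_{n+1} = f_nominal(x_n, u_n).
    Toy dynamics for illustration; assumes dim(x) == dim(u)."""
    return x + 0.1 * u

def fit_residual(X, U, X_next):
    """Fit a linear residual r(x, u) ~ W^T [x; u; 1] so that
    x_{n+1} ~ f_nominal(x_n, u_n) + r(x_n, u_n).

    X, U, X_next: arrays of shape (num_samples, nx), (num_samples, nu), (num_samples, nx)."""
    pred = np.array([f_nominal(x, u) for x, u in zip(X, U)])
    residuals = X_next - pred                            # model error on the data
    features = np.hstack([X, U, np.ones((len(X), 1))])   # affine features
    W, *_ = np.linalg.lstsq(features, residuals, rcond=None)
    return W

def f_learned(x, u, W):
    """Corrected dynamics: nominal model plus learned residual."""
    phi = np.concatenate([x, u, [1.0]])
    return f_nominal(x, u) + W.T @ phi
```

Parameter estimation follows the same pattern, except the fit is over the physical parameters inside ( f_{\text{nominal}} ) rather than a generic residual term.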
Improving the Controller
- Learning the Policy:
- 🟢 Advantages: No assumptions on dynamics.
- 🔴 Disadvantages: Suited for single tasks, doesn't generalize well, and is sample inefficient.
- Improving a Trajectory:
- Given a reference trajectory from a nominal model, feed the measured (real) trajectory back to the optimizer (a minimal outer-loop sketch follows this list).
- 🟢 Advantages: Makes few assumptions and is sample efficient.
- 🔴 Disadvantages: Assumes a decent model and doesn't generalize well.
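A minimal sketch of that outer loop; `rollout` and `optimizer_update` are hypothetical placeholders for executing the current plan on the real system and re-solving the tracking problem around the measured data:

```python
def improve_trajectory(x_bar, u_bar, rollout, optimizer_update, iterations=10):
    """Repeatedly execute the plan, then feed the measured trajectory
    back into the optimizer to correct the feedforward inputs."""
    for _ in range(iterations):
        x_meas, u_meas = rollout(x_bar, u_bar)                  # data from the real system
        u_bar = optimizer_update(x_meas, u_meas, x_bar, u_bar)  # e.g., the ILC/SQP step derived below
    return u_bar
```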
Iterative Learning Control (ILC)
Iterative Learning Control (ILC) is a fascinating area of optimal control that leverages iterative methods to improve control policies based on previous experiences. This article delves into the mathematical foundations of ILC and its practical implications.
Definition: ILC is essentially a special case of the policy gradient on a policy class. The control policy is represented as:
$$ u_k = \bar{u}_k - K_k(x_k - \bar{x}_k) $$ where the feedback gain ( K_k ) can come from any controller (e.g., TVLQR around the reference).
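A minimal sketch of this policy class; the gains ( K_k ) are assumed to come from some feedback design around the reference (e.g., TVLQR), which is an illustrative choice rather than part of the definition:

```python
def ilc_policy(k, x, x_bar, u_bar, K):
    """Time-varying affine policy: u_k = u_bar_k - K_k (x_k - x_bar_k).
    x_bar, u_bar: reference state/input sequences; K[k]: (nu x nx) gain array."""
    return u_bar[k] - K[k] @ (x - x_bar[k])
```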
Relation to SQP: ILC can also be seen as a special case of Sequential Quadratic Programming (SQP), in which the right-hand side of the KKT system is built from the real system rollout rather than from a model-based one.
Consider the following optimization problem for tracking:
$$
\begin{align*}
\min_{x_{1:N},\, u_{1:N-1}} \quad & J = \sum_{n=1}^{N-1} \left[ \frac{1}{2} (x_n - \bar{x}_n)^T Q (x_n - \bar{x}_n) + \frac{1}{2} (u_n - \bar{u}_n)^T R (u_n - \bar{u}_n) \right] + \frac{1}{2} (x_N - \bar{x}_N)^T Q_N (x_N - \bar{x}_N) \\
\text{s.t.} \quad & x_{n+1} = f_{\text{nominal}}(x_n, u_n), \quad n = 1, \dots, N-1
\end{align*}
$$
📝 Notes:
- In a standard formulation, ( f_{\text{nominal}} ) should be ( f_{\text{real}} ). However, we usually don't have access to ( f_{\text{real}} ), so the idea is to approximate the gradient of the dynamics.
- The underlying concept is to replace the model-based rollout trajectory with the real rollout trajectory. The limitation in real-world scenarios is that we cannot measure the true dynamics gradient, so the nominal model's gradient is used as an approximation (made explicit just below).
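Concretely, the trajectory ( (x_n, u_n) ) comes from the real system, while the dynamics Jacobians are taken from the nominal model evaluated along the reference; the shorthand ( A_n, B_n ) is introduced here just for notation:
$$ \frac{\partial f_{\text{real}}}{\partial x}\bigg|_{x_n, u_n} \approx \frac{\partial f_{\text{nominal}}}{\partial x}\bigg|_{\bar{x}_n, \bar{u}_n} =: A_n, \qquad \frac{\partial f_{\text{real}}}{\partial u}\bigg|_{x_n, u_n} \approx \frac{\partial f_{\text{nominal}}}{\partial u}\bigg|_{\bar{x}_n, \bar{u}_n} =: B_n $$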
The Lagrangian for the problem is:
$$ L(z, \lambda) = J(z) + \lambda^T c(z) $$
where ( z = (x_{1:N}, u_{1:N-1}) ) stacks the decision variables and ( c(z) ) stacks the dynamics constraints ( x_{n+1} - f_{\text{nominal}}(x_n, u_n) ).
The gradient of the Lagrangian with respect to ( z ) and ( \lambda ) can be approximated as:
$$ \nabla_z L = \nabla_z J + \left( \frac{\partial c(z)}{\partial z} \right)^T \lambda, \qquad \nabla_\lambda L = c(z) $$
where the constraint Jacobian ( \frac{\partial c(z)}{\partial z} ) is the part being approximated with the nominal model.
Taking a Newton step on these conditions gives the Karush-Kuhn-Tucker (KKT) system:
$$ \begin{bmatrix} H & \frac{\partial c(z)}{\partial z}^T \\ \frac{\partial c(z)}{\partial z} & 0 \end{bmatrix} \begin{bmatrix} \Delta z \\ \lambda \end{bmatrix} = \begin{bmatrix} -\nabla_z J \\ -c(z) \end{bmatrix} $$
Where ( H = \nabla^2_z J ) is the block-diagonal Hessian of the tracking cost (built from ( Q ), ( R ), and ( Q_N )), and ( \frac{\partial c(z)}{\partial z} ) is the constraint Jacobian assembled from the dynamics linearization ( A_n, B_n ).
📝 Key Observations:
- ( c(z) = 0 ) is always satisfied with real-world data: the measured rollout exactly obeys the true dynamics.
- Given the measured ( x_n, u_n ), we can still evaluate the gradient of ( J ), since it only involves ( Q ), ( R ), and the deviations from the reference.
- Together, these give us the entire right-hand side of the KKT system.
- In practice, since the measured ( x_n, u_n ) stay close to the reference trajectory (it was solved offline and is tracked with feedback), we can evaluate ( C = \frac{\partial c}{\partial z} \big|_{\bar{x}, \bar{u}} ) from the nominal model.
- We can then solve the KKT system for ( \Delta z ) and update ( \bar{u} ), exactly as in a Quadratic Programming (QP) step; a sketch of this update follows the note below.
📝 Note: During the rollout, we might use a Linear Quadratic Regulator (LQR) to track the planned open-loop trajectory.
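A minimal sketch of one such ILC update under the observations above: the cost gradient is evaluated on the measured rollout, the constraint Jacobian comes from nominal-model Jacobians `A[n]`, `B[n]` along the reference, and the measured data makes ( c(z) = 0 ). The variable ordering and matrix assembly below are one reasonable convention assumed for illustration, not the only one.

```python
import numpy as np
from scipy.linalg import block_diag

def ilc_update(x_meas, u_meas, x_bar, u_bar, A, B, Q, R, QN):
    """One ILC / SQP step: the cost gradient uses the *measured* rollout,
    the constraint Jacobian uses the *nominal* model, and c(z) = 0.

    Shapes (0-based, x[0] is the fixed initial state):
      x_meas, x_bar : (N, nx),  u_meas, u_bar : (N-1, nu)
      A[n], B[n]    : nominal Jacobians at (x_bar[n], u_bar[n]), n = 0..N-2
    """
    N, nx = x_bar.shape
    nu = u_bar.shape[1]
    nz = (N - 1) * (nu + nx)   # z = (u_0, x_1, u_1, x_2, ..., u_{N-2}, x_{N-1})
    nc = (N - 1) * nx          # one dynamics constraint per step

    # Block-diagonal Hessian of the quadratic tracking cost.
    blocks = []
    for n in range(N - 1):
        blocks += [R, QN if n == N - 2 else Q]
    H = block_diag(*blocks)

    # Cost gradient evaluated at the measured trajectory.
    g = np.zeros(nz)
    for n in range(N - 1):
        i = n * (nu + nx)
        Qn = QN if n == N - 2 else Q
        g[i:i + nu] = R @ (u_meas[n] - u_bar[n])
        g[i + nu:i + nu + nx] = Qn @ (x_meas[n + 1] - x_bar[n + 1])

    # Constraint Jacobian C = dc/dz from the nominal model, c_n = x_{n+1} - f(x_n, u_n).
    C = np.zeros((nc, nz))
    for n in range(N - 1):
        r, i = n * nx, n * (nu + nx)
        C[r:r + nx, i:i + nu] = -B[n]                 # d c_n / d u_n
        C[r:r + nx, i + nu:i + nu + nx] = np.eye(nx)  # d c_n / d x_{n+1}
        if n > 0:
            C[r:r + nx, i - nx:i] = -A[n]             # d c_n / d x_n

    # KKT system: c(z) = 0 because the measured rollout obeys the real dynamics.
    kkt = np.block([[H, C.T], [C, np.zeros((nc, nc))]])
    rhs = np.concatenate([-g, np.zeros(nc)])
    dz = np.linalg.solve(kkt, rhs)[:nz]

    # Apply only the input correction to the feedforward.
    du = np.stack([dz[n * (nu + nx): n * (nu + nx) + nu] for n in range(N - 1)])
    return u_bar + du
```

Each call returns an improved feedforward ( \bar{u} ); wrapping it in the rollout loop sketched earlier gives the full ILC iteration.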
ILC operates as an approximation of Newton's method; it is a form of inexact or quasi-Newton method. The beauty of ILC is that it allows Newton-style optimization even when we don't have the exact gradient of the dynamics. The method still converges, albeit potentially more slowly, as long as the approximation satisfies an inexact-Newton (forcing) condition of the form:
$$ \| r_{\text{approx}} \| \le \eta \, \| \nabla L \| $$
where ( r_{\text{approx}} ) is the residual of the approximate Newton step in the exact KKT system and ( 0 \le \eta < 1 ).
With the above insights, it's evident that Iterative Learning Control offers a robust framework for refining control policies iteratively, making it a valuable tool in the realm of optimal control.