The continuous-time optimal control problem is defined as:

$$\min_{x(t),\,u(t)} \; \int_{t_0}^{t_f} \ell\big(x(t),u(t)\big)\,dt + \ell_F\big(x(t_f)\big)$$

subject to:

$$\dot{x}(t) = f\big(x(t),u(t)\big), \qquad x(t_0) = x_0, \qquad u(t) \in \mathcal{U}$$
- 📏 This is an infinite-dimensional problem: the decision variable $u(t)$ is a function of time, so as the sample time shrinks to zero, $u$ has infinitely many dimensions.
- 🔄 Solutions are open-loop trajectories $u(t)$:
- 🎛 MPC executes only the first few steps of the open-loop $u(t)$, then re-solves (a fixed open-loop $u(t)$ does not apply in stochastic control).
- 🔄 Other methods compute solutions offline and wrap them with feedback.
- ❓ Why optimize $x$ and $u$ at the same time instead of just $u$?
- 🔄 When using forward simulation to get $x_n$ from $u$ alone, the condition number of $A^N$ can be very large, making the optimization problem ill-conditioned (see the numeric sketch below).
- 🤔 It's hard to supply a good initial guess for $u(t)$, whereas a good initial guess for $x(t)$ is often available and aids the optimization process.
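A quick numeric illustration of the conditioning issue, using a hypothetical, mildly unstable $A$ (all values here are made up for illustration):

```python
import numpy as np

# Hypothetical, mildly unstable linear dynamics x_{n+1} = A x_n + B u_n.
A = np.array([[1.1, 0.1],
              [0.0, 1.1]])

# If x is eliminated by forward simulation, the map from u to x_N involves
# powers of A, and cond(A^N) grows exponentially with the horizon N.
for N in (10, 50, 100):
    print(N, np.linalg.cond(np.linalg.matrix_power(A, N)))
```

Keeping both $x$ and $u$ as decision variables, with the dynamics as constraints, avoids chaining these powers together.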
The discrete-time optimal control problem is:

$$\min_{x_{1:N},\,u_{1:N-1}} \; \sum_{n=1}^{N-1} \ell(x_n,u_n) + \ell_F(x_N)$$

subject to:

$$x_{n+1} = f(x_n, u_n), \qquad u_n \in \mathcal{U}$$
- 📊 This is a finite-dimensional problem.
- 📍 $x_n, u_n$ are called knot points.
Pontryagin's minimum principle is also known as the maximum principle.
- 📚 Provides first-order necessary conditions for deterministic optimal control problems.
- 🕑 In discrete time, it's just a special case of the KKT conditions.
- 📈 Can be seen as a generalization of the Euler-Lagrange equation for systems with control inputs.
The Lagrangian is:

$$L = \sum_{n=1}^{N-1} \Big[ \ell(x_n,u_n) + \lambda_{n+1}^\top \big( f(x_n,u_n) - x_{n+1} \big) \Big] + \ell_F(x_N)$$

The Hamiltonian is:

$$H(x,u,\lambda) = \ell(x,u) + \lambda^\top f(x,u)$$

To solve it, set the derivatives of $L$ with respect to each variable to zero:

$$\frac{\partial L}{\partial \lambda_{n+1}} = f(x_n,u_n) - x_{n+1} = 0$$

$$\frac{\partial L}{\partial x_n} = \frac{\partial \ell}{\partial x_n} + \lambda_{n+1}^\top \frac{\partial f}{\partial x_n} - \lambda_n^\top = 0$$

For $u$, considering torque limits $u \in \mathcal{U}$, stationarity is replaced by a minimization over the feasible set:

$$u_n = \arg\min_{\tilde{u} \in \mathcal{U}} H(x_n, \tilde{u}, \lambda_{n+1})$$

The system dynamics and costate trajectory, written in terms of the Hamiltonian, are:

$$x_{n+1} = \nabla_\lambda H(x_n,u_n,\lambda_{n+1}), \qquad \lambda_n = \nabla_x H(x_n,u_n,\lambda_{n+1}), \qquad \lambda_N = \nabla \ell_F(x_N)$$
❓ Why is $u_n$ the argmin of $H$? When a constraint on $u$ is active (e.g., at a torque limit), the stationarity condition $\nabla_u H = 0$ need not hold; the minimum principle generalizes it to minimizing $H$ over all admissible inputs.
The system dynamics and costate equations in continuous time are:

$$\dot{x}(t) = \nabla_\lambda H = f(x,u), \qquad \dot{\lambda}(t) = -\nabla_x H, \qquad u(t) = \arg\min_{\tilde{u} \in \mathcal{U}} H(x,\tilde{u},\lambda), \qquad \lambda(t_f) = \nabla \ell_F\big(x(t_f)\big)$$
This yields a simple indirect shooting procedure:
1. 🚀 Start with an initial guess trajectory $u(t)$.
2. 🔄 Simulate (or "rollout") to get $x(t)$ using $f(x,u)$.
3. ⏪ Backward pass to get $\lambda(t)$ and $\Delta u(t)$ using the costate equation $\dot{\lambda} = -\nabla_x H$.
4. 🔄 Rollout with a line search on $\Delta u$, decreasing the Hamiltonian.
5. 🔁 Repeat from step 3 until convergence.

⚠️ Limitations: Not robust and can converge slowly. A minimal sketch of this loop follows below.
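Here is a minimal sketch of this indirect shooting loop on a discretized double integrator with quadratic cost. All problem data (dynamics, weights, horizon) are hypothetical; the update $\Delta u = -\nabla_u H$ comes from the backward costate recursion:

```python
import numpy as np

# Hypothetical problem: double integrator, quadratic cost, horizon N.
h, N = 0.05, 100
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10.0 * np.eye(2)
x0 = np.array([1.0, 0.0])

def rollout(u):                          # forward simulation with f(x, u)
    x = np.zeros((N, 2)); x[0] = x0
    for n in range(N - 1):
        x[n + 1] = A @ x[n] + B @ u[n]
    return x

def cost(u):
    x = rollout(u)
    return (sum(x[n] @ Q @ x[n] + u[n] @ R @ u[n] for n in range(N - 1))
            + x[-1] @ QN @ x[-1])

u = np.zeros((N - 1, 1))                 # 1. initial guess trajectory
for it in range(500):
    x = rollout(u)                       # 2. rollout to get x(t)
    lam = QN @ x[-1]                     # boundary condition: λ_N = ∇ℓ_F(x_N)
    du = np.zeros_like(u)
    for n in reversed(range(N - 1)):     # 3. backward pass for λ and Δu
        du[n] = -(R @ u[n] + B.T @ lam)  # Δu = -∇_u H
        lam = Q @ x[n] + A.T @ lam       # λ_n = ∇_x H
    if np.max(np.abs(du)) < 1e-6:        # converged: ∇_u H ≈ 0
        break
    alpha, J = 1.0, cost(u)              # 4. line search on Δu
    while cost(u + alpha * du) > J and alpha > 1e-8:
        alpha *= 0.5
    u = u + alpha * du                   # 5. repeat from the backward pass
```

This is plain gradient descent through the dynamics; second-order variants (DDP/iLQR, discussed below) converge much faster.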
- 📜 Historically, many algorithms were based on forward/backward integration of the continuous ODEs for ( x(t) ) and ( \lambda(t) ) to perform gradient descent on ( u(t) ).
- 📏 These are called indirect and/or shooting methods (both are iterative methods).
- 🕐 In continuous time, $\lambda(t)$ is called the "co-state" trajectory.
- 💻 These methods have largely fallen out of favor as computers and solvers have improved.
Here's a breakdown of the main families of optimal-control algorithms and where each applies:
Linear (or locally linearized) control problems can be categorized by the presence or absence of constraints.

For problems without constraints, the Linear Quadratic Regulator (LQR) is typically used:
- Time-varying (e.g., tracking): Time-Varying LQR (TVLQR), with Differential Dynamic Programming (DDP) or iterative LQR (iLQR) as the solution approach.
- Time-invariant (e.g., stabilization): Time-Invariant LQR (TILQR), solvable by Quadratic Programming (QP) or Dynamic Programming (DP); the two are equivalent in this context (a Riccati/DP sketch follows this list).
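For instance, the DP solution of a finite-horizon LQR problem is a backward Riccati recursion. A minimal sketch, assuming problem data `A, B, Q, R, QN, N` are given:

```python
import numpy as np

def lqr_gains(A, B, Q, R, QN, N):
    """Backward Riccati (DP) recursion for finite-horizon LQR."""
    P = QN                        # cost-to-go Hessian at the final knot point
    K = []
    for _ in range(N - 1):
        K_n = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K_n)
        K.append(K_n)
    return K[::-1]                # time-varying gains: u_n = -K[n] @ x_n
```

Solving the same problem as one big QP over all knot points yields the same optimal trajectory, which is the equivalence noted above.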
Model Predictive Control (MPC) is the go-to for problems with constraints.
- Linear Constraints: Quadratic Programming (QP) is the solver of choice (see the sketch after this list).
- Conic Constraints: Second Order Cone Programming (SOCP) is used.
- Non-linear Constraints: A non-linear optimizer is typically employed.
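A minimal linear-MPC sketch with box input limits, posed as a QP in cvxpy; the dynamics, weights, and limit here are hypothetical:

```python
import numpy as np
import cvxpy as cp

# Hypothetical double-integrator dynamics and weights.
h, N = 0.1, 20
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
Q, R = np.eye(2), 0.1 * np.eye(1)
u_max = 2.0
x0 = np.array([1.0, 0.0])

x = cp.Variable((N, 2))
u = cp.Variable((N - 1, 1))
cost = sum(cp.quad_form(x[n], Q) + cp.quad_form(u[n], R) for n in range(N - 1))
constraints = [x[0] == x0]
for n in range(N - 1):
    constraints += [x[n + 1] == A @ x[n] + B @ u[n],  # linear dynamics
                    cp.abs(u[n]) <= u_max]            # linear input limits
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()               # quadratic cost + linear constraints -> a QP
u_now = u.value[0]         # MPC applies the first input, then re-solves
```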
The question arises: can this be converted into TVLQR?
Two primary methods are used for non-linear trajectory optimization:
- Direct Collocation (DIRCOL): A direct method (a simplified sketch follows this list).
- Differential Dynamic Programming (DDP) or iterative LQR (iLQR): An indirect method.
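To make "direct" concrete, here is a minimal direct-transcription sketch, a simplified cousin of DIRCOL (which proper uses Hermite-Simpson collocation): all knot points $x_{1:N}, u_{1:N-1}$ are decision variables and the dynamics become equality constraints. The problem data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical double-integrator data.
h, N, nx, nu = 0.1, 20, 2, 1
A = np.array([[1.0, h], [0.0, 1.0]])
B = np.array([[0.5 * h**2], [h]])
x0 = np.array([1.0, 0.0])

def unpack(z):                       # z stacks all knot points
    x = z[:N * nx].reshape(N, nx)
    u = z[N * nx:].reshape(N - 1, nu)
    return x, u

def cost(z):
    x, u = unpack(z)
    return np.sum(x**2) + 0.1 * np.sum(u**2)

def defects(z):                      # dynamics as equality constraints
    x, u = unpack(z)
    d = [x[0] - x0]
    for n in range(N - 1):
        d.append(x[n + 1] - (A @ x[n] + B @ u[n]))
    return np.concatenate(d)

z0 = np.zeros(N * nx + (N - 1) * nu)     # note: dynamically infeasible guess
res = minimize(cost, z0, method="SLSQP",
               constraints={"type": "eq", "fun": defects})
x_opt, u_opt = unpack(res.x)
```

The defects are driven to zero only at convergence, which is exactly the "dynamic feasibility" row in the table below.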
Both methods are designed to tackle similar problems, but they have distinct characteristics:
| Terms | DIRCOL | DDP/iLQR |
|---|---|---|
| Dynamic feasibility | Only satisfied at convergence. | Always maintained via rollout, allowing early stopping and deployment. |
| Warm start | Can use an initial guess for the whole trajectory. | Cannot use a warm start. |
| Constraint handling | Can handle arbitrary constraints, depending on the backend solver. | Needs modifications: for $u$, use squashing or a constrained QP; for $x$, add an extra cost term or use an augmented Lagrangian. |
| Output | Only an open-loop trajectory; an additional tracking controller is required. | The converged feedback term provides a built-in controller. |
| Speed | Slower: solves the entire problem at once. | Faster: solves simplified subproblems. |
| Implementation difficulty | Harder to implement. | Easier to implement. |
| Numeric stability | Robust. | Long horizons can lead to ill-conditioning. |
When to use each:
- Online/real-time control: DDP is preferable, since speed is paramount and constraint tolerances can be slightly relaxed.
- Offline trajectory generation and long-horizon problems: DIRCOL is more suitable.
- Multiple-shooting approach: use DDP for subtrajectory rollouts (the reduced horizons improve conditioning) and combine them through constraints solved DIRCOL-style (simplifying the overall problem).
In conclusion, the choice of algorithm depends largely on the specific requirements and constraints of the problem at hand; understanding each method's strengths and weaknesses makes that choice an informed one.