Model Predictive Path Integral (MPPI) Control in C++
This article is a continuation of my previous discussion on Model Predictive Controllers (MPC).
You can find the source code on my GitHub.
1. Model Predictive Control (MPC) Background:
MPC (see my previous article) is a control strategy that computes control actions by solving an optimization problem at each time step. The optimization problem involves predicting the future behavior of the system over a finite horizon and finding the control sequence that minimizes a given cost function.
Recall that in MPC, at each time step, we solve an optimization problem over a finite prediction horizon to obtain a sequence of control inputs. However, only the first control input of this sequence is applied to the system. Then, at the next time step, the optimization is solved again (taking into account the new state of the system), and again, only the first control input is applied. This “receding horizon” approach is a fundamental characteristic of MPC.
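The receding-horizon idea can be sketched in a few lines of C++. This is only an illustration for a scalar system: `runRecedingHorizon`, `plan`, and `step` are hypothetical names, where `plan` stands in for whatever optimizer produces a control sequence and `step` for the real system.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical receding-horizon loop for a scalar system.
// plan(x): returns a control sequence over the horizon for the current state.
// step(x, u): advances the real system by one time step.
double runRecedingHorizon(double x0, int steps,
                          const std::function<std::vector<double>(double)>& plan,
                          const std::function<double(double, double)>& step) {
    double x = x0;
    for (int i = 0; i < steps; ++i) {
        const std::vector<double> u = plan(x);  // re-optimize from the new state
        assert(!u.empty());
        x = step(x, u.front());                 // apply only the first input
    }
    return x;
}
```

The rest of the planned sequence is thrown away (or, in practice, reused as a warm start for the next optimization).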
2. Model Predictive Path Integral (MPPI) Controller:
MPPI is a variant of MPC that uses stochastic optimization to compute the control actions. Instead of solving a deterministic optimization problem, it samples many candidate control sequences, evaluates their costs, and combines them into a single control update.
In MPPI, two primary concepts affect the controller's performance.
- A trajectory is the sequence of states that a system passes through over time, given a sequence of control inputs. In MPPI, multiple trajectories are sampled to explore the state and control space, that is, the different possible future behaviors of the system. The idea is to evaluate the cost associated with each trajectory and then use this information to determine the optimal control action.
- The horizon, often referred to as the prediction or planning horizon, is the number of time steps over which future trajectories are considered in the optimization problem. It defines how far into the future the controller looks when making decisions.
The optimization problem in MPPI aims to find the sequence of control actions over this horizon that minimizes the expected cost.
The choice of horizon T is crucial:
If the horizon is too short, the controller might not have enough foresight to make good decisions, especially in scenarios where actions have long-term consequences.
If the horizon is too long, the computational cost can become prohibitive, since every one of the sampled trajectories has to be simulated over the whole horizon. Additionally, predictions far into the future might be less accurate due to uncertainties in the system dynamics or disturbances.
MPPI samples multiple control trajectories and computes the cost of each trajectory. The control action is then computed as a cost-weighted average of these trajectories.
The main idea behind MPPI is to use the path integral formulation of stochastic optimal control, which relates the expected cost of a trajectory to its probability.
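Before going through the math, it may help to see the whole loop once. The following is a minimal closed-loop sketch on a deliberately simple toy problem, not the cart-pole: a 1-D integrator x_{t+1} = x_t + dt * u_t with running cost x^2 + 0.1 * u^2. The function name `runToyMppi`, the toy system, the warm-start shift, and all parameter values are assumptions made for illustration; the exponential weighting with temperature λ is the one described later in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Toy MPPI loop (illustrative assumptions throughout).
// Returns the state after `steps` closed-loop control steps.
double runToyMppi(double x0, int steps, unsigned seed) {
    const int K = 500, T = 20;                       // samples, horizon
    const double dt = 0.1, sigma = 1.0, lambda = 10.0;
    assert(steps > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> nominal(T, 0.0);             // current control plan
    std::vector<std::vector<double>> eps(K, std::vector<double>(T));
    std::vector<double> cost(K), w(K);
    double x = x0;
    for (int s = 0; s < steps; ++s) {
        // 1) Sample perturbed control sequences and roll each one out.
        for (int k = 0; k < K; ++k) {
            double xk = x;
            cost[k] = 0.0;
            for (int t = 0; t < T; ++t) {
                eps[k][t] = noise(rng);
                const double u = nominal[t] + eps[k][t];
                cost[k] += xk * xk + 0.1 * u * u;    // running cost
                xk += dt * u;                        // toy dynamics
            }
        }
        // 2) Exponential weights; subtracting the minimum cost avoids underflow.
        const double minCost = *std::min_element(cost.begin(), cost.end());
        double wSum = 0.0;
        for (int k = 0; k < K; ++k) {
            w[k] = std::exp(-(cost[k] - minCost) / lambda);
            wSum += w[k];
        }
        // 3) Move the plan toward the weighted average of the perturbations.
        for (int t = 0; t < T; ++t) {
            double du = 0.0;
            for (int k = 0; k < K; ++k) du += w[k] * eps[k][t];
            nominal[t] += du / wSum;
        }
        // 4) Apply the first control, then shift the plan (receding horizon).
        x += dt * nominal[0];
        std::rotate(nominal.begin(), nominal.begin() + 1, nominal.end());
        nominal.back() = 0.0;
    }
    return x;
}
```

Calling `runToyMppi(1.0, 100, 42u)` simulates 100 control steps starting from x = 1; with these settings the controller should drive the state toward zero.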
Mathematically, MPPI is built from the following pieces.
Objective Function: Given a system with state x and control u, the objective is to minimize the expected cost over a finite horizon T:
$$J = \mathbb{E}\left[\sum_{t=0}^{T-1} c(x_t, u_t)\right]$$
where c(x_t, u_t) is the instantaneous cost at time t.
Stochastic Dynamics: The system dynamics are given by:
$$x_{t+1} = f(x_t, u_t) + w_t$$
where f is the deterministic system model and w_t is a zero-mean Gaussian noise with covariance Σ.
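As a small illustration of dynamics with additive zero-mean Gaussian noise, here is a scalar sketch. The model `toyModel` (a discrete-time integrator) and the function name `stochasticStep` are placeholders chosen for this example, and the scalar `sigma` plays the role of the covariance Σ.

```cpp
#include <cassert>
#include <random>

// Placeholder deterministic model: a discrete-time integrator (assumption).
double toyModel(double x, double u, double dt) { return x + dt * u; }

// One noisy step: deterministic model plus w ~ N(0, sigma^2).
double stochasticStep(double x, double u, double dt, double sigma,
                      std::mt19937& rng) {
    assert(sigma > 0.0);  // std::normal_distribution requires stddev > 0
    std::normal_distribution<double> noise(0.0, sigma);
    return toyModel(x, u, dt) + noise(rng);
}
```

Because the noise has zero mean, averaging many noisy steps from the same state recovers the deterministic prediction.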
Sampling: At each time step, K control trajectories,
$$U^k = (u_0^k, u_1^k, \dots, u_{T-1}^k), \quad k = 1, \dots, K$$
are sampled from a Gaussian distribution.
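One way to draw the K control sequences is to perturb a nominal sequence with Gaussian noise. Centering the samples on a nominal sequence is an assumption of this sketch (the text only says the samples are Gaussian), and `sampleControls` and `ControlSequence` are illustrative names. The control is scalar here, as it is for the cart force.

```cpp
#include <cassert>
#include <random>
#include <vector>

using ControlSequence = std::vector<double>;

// Draw K control sequences around a nominal one:
// each entry is nominal[t] plus noise drawn from N(0, sigma^2).
std::vector<ControlSequence> sampleControls(const ControlSequence& nominal,
                                            int K, double sigma,
                                            std::mt19937& rng) {
    assert(K > 0 && sigma > 0.0);
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<ControlSequence> samples(K, nominal);  // K copies of the plan
    for (auto& seq : samples)
        for (double& u : seq) u += noise(rng);         // perturb every input
    return samples;
}
```

A larger `sigma` explores more aggressively but also produces more poor trajectories, so it is usually tuned together with the number of samples.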
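Cost Evaluation: The cost of each sampled trajectory is computed using the system dynamics and the cost function:
$$S_k = \sum_{t=0}^{T-1} c(x_t^k, u_t^k)$$
where u_t^k is the control at time t in the k-th sample and x_t^k is the state obtained by propagating the system dynamics from the current state under that control sequence.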
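Evaluating one sampled trajectory amounts to a rollout: propagate the model step by step and accumulate the instantaneous cost. The sketch below does this for a scalar state; `trajectoryCost` is an illustrative name, and the model `f` and cost `c` are passed in by the caller.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Accumulate c(x_t, u_t) along a rollout x_{t+1} = f(x_t, u_t).
double trajectoryCost(double x0, const std::vector<double>& controls,
                      const std::function<double(double, double)>& f,
                      const std::function<double(double, double)>& c) {
    assert(f && c);
    double x = x0;
    double total = 0.0;
    for (double u : controls) {
        total += c(x, u);  // cost at the current state and input
        x = f(x, u);       // advance the model one step
    }
    return total;
}
```

For example, with f(x, u) = x + u, c(x, u) = x^2 + u^2, x0 = 1 and controls {-1, 0}, the first step costs 2 and moves the state to 0, and the second step costs 0, so the total is 2.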
The cost function c(x_t, u_t) can be chosen based on the desired behavior. For example, one might choose a quadratic cost function that penalizes deviations from the upright position and large control inputs:
$$c(x_t, u_t) = x_t^\top Q x_t + u_t^\top R u_t$$
where Q and R are positive definite matrices.
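For a four-dimensional state and a scalar input, a quadratic cost with a diagonal Q reduces to a weighted sum of squares. The sketch below assumes exactly that simplification (diagonal Q, scalar R); `quadraticCost` and `State` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using State = std::array<double, 4>;

// Quadratic cost with diagonal Q (entries qDiag) and scalar R (value r):
// sum_i qDiag[i] * x[i]^2 + r * u^2.
double quadraticCost(const State& x, double u, const State& qDiag, double r) {
    assert(r > 0.0);
    double c = r * u * u;                       // control effort term
    for (std::size_t i = 0; i < x.size(); ++i)
        c += qDiag[i] * x[i] * x[i];            // state deviation terms
    return c;
}
```

Raising the entry of `qDiag` that corresponds to the pole angle makes the controller care more about keeping the pole upright than about where the cart is.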
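Weighted Combination: The optimal control input is computed as a weighted average of the sampled control trajectories:
$$u_t^* = \sum_{k=1}^{K} \omega_k \, u_t^k$$
where u_t^k is the control at time t in the k-th sample and the weights ω_k are given by:
$$\omega_k = \frac{\exp(-S_k / \lambda)}{\sum_{j=1}^{K} \exp(-S_j / \lambda)}$$
Here, S_k is the total cost of the k-th sampled trajectory and λ is a temperature parameter that determines the sharpness of the weighting: a small λ concentrates the weight on the lowest-cost trajectories, while a large λ approaches a plain average.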
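The weighting step can be sketched as follows. Subtracting the minimum cost before exponentiating is a numerical-stability choice added in this sketch; it cancels in the normalization and leaves the weights unchanged. The names `mppiWeights` and `weightedControls` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized exponential weights from trajectory costs and temperature lambda.
std::vector<double> mppiWeights(const std::vector<double>& costs, double lambda) {
    assert(!costs.empty() && lambda > 0.0);
    const double minCost = *std::min_element(costs.begin(), costs.end());
    std::vector<double> w(costs.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < costs.size(); ++k) {
        w[k] = std::exp(-(costs[k] - minCost) / lambda);  // shifted for stability
        sum += w[k];
    }
    for (double& wk : w) wk /= sum;                       // weights sum to 1
    return w;
}

// Weighted average of the sampled control sequences, one value per time step.
std::vector<double> weightedControls(const std::vector<std::vector<double>>& samples,
                                     const std::vector<double>& w) {
    assert(!samples.empty() && samples.size() == w.size());
    std::vector<double> u(samples.front().size(), 0.0);
    for (std::size_t k = 0; k < samples.size(); ++k)
        for (std::size_t t = 0; t < u.size(); ++t)
            u[t] += w[k] * samples[k][t];
    return u;
}
```

Two trajectories with equal cost get weight 0.5 each, whereas a trajectory whose cost is far above the best one (relative to λ) contributes essentially nothing.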
Cart-Pole System:
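For the cart-pole system, the state x is defined as:
$$\mathbf{x} = [x, \dot{x}, \theta, \dot{\theta}]^\top$$
where x is the cart position, θ is the pole angle, and the dots denote their time derivatives.
The control input u is the force applied to the cart.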
The dynamics of the cart-pole system can be derived using Newton’s laws and are given by:
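As a concrete reference, the sketch below implements one commonly used frictionless form of the cart-pole equations, with the pole modelled as a uniform rod and θ measured from the upright position, integrated with an explicit Euler step. This particular form, the struct and function names, and the default parameter values are assumptions of this example and may differ from the model in the accompanying source code.

```cpp
#include <cassert>
#include <cmath>

struct CartPoleState { double x, xDot, theta, thetaDot; };

struct CartPoleParams {          // illustrative default values
    double cartMass = 1.0;       // kg
    double poleMass = 0.1;       // kg
    double halfLength = 0.5;     // m, pivot to pole centre of mass
    double gravity = 9.81;       // m/s^2
};

// One explicit Euler step of the frictionless cart-pole (theta = 0 is upright).
CartPoleState cartPoleStep(const CartPoleState& s, double force, double dt,
                           const CartPoleParams& p) {
    assert(dt > 0.0);
    const double total = p.cartMass + p.poleMass;
    const double sinT = std::sin(s.theta);
    const double cosT = std::cos(s.theta);
    const double temp =
        (force + p.poleMass * p.halfLength * s.thetaDot * s.thetaDot * sinT) / total;
    // Angular acceleration of the pole.
    const double thetaAcc =
        (p.gravity * sinT - cosT * temp) /
        (p.halfLength * (4.0 / 3.0 - p.poleMass * cosT * cosT / total));
    // Linear acceleration of the cart.
    const double xAcc = temp - p.poleMass * p.halfLength * thetaAcc * cosT / total;
    return {s.x + dt * s.xDot, s.xDot + dt * xAcc,
            s.theta + dt * s.thetaDot, s.thetaDot + dt * thetaAcc};
}
```

In this convention, pushing the cart to the right from rest accelerates the cart in the positive direction and rotates the pole in the negative direction, while a tilted pole with no force applied falls further away from upright.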
Here is a simple simulation of the MPPI controller, with initial values that you can change.
Thank you for reading