Imperative MPC: An End-to-End Self-Supervised Learning with Differentiable MPC for UAV Attitude Control

Haonan He, Yuheng Qiu, Junyi Geng

TL;DR

This work presents a self-supervised learning framework that combines a learning-based inertial odometry module with differentiable model predictive control (d-MPC) for Unmanned Aerial Vehicle (UAV) attitude control. The framework simultaneously improves both MPC parameter learning and IMU prediction performance.

Abstract

Modeling and control of nonlinear dynamics are critical in robotics, especially in scenarios with unpredictable external influences and complex dynamics. Traditional cascaded modular control pipelines often yield suboptimal performance due to conservative assumptions and tedious parameter tuning. Pure data-driven approaches promise robust performance but suffer from low sample efficiency, sim-to-real gaps, and reliance on extensive datasets. Hybrid methods that combine learning-based and traditional model-based control in an end-to-end manner offer a promising alternative. This work presents a self-supervised learning framework that combines a learning-based inertial odometry (IO) module with differentiable model predictive control (d-MPC) for Unmanned Aerial Vehicle (UAV) attitude control. The IO module denoises raw IMU measurements and predicts UAV attitudes, which the MPC then uses to optimize control actions in a bi-level optimization (BLO) setup: the lower-level MPC optimizes control actions, and the upper level minimizes the discrepancy between real-world and predicted performance. The framework is thus end-to-end and can be trained in a self-supervised manner. This approach combines the strengths of learning-based perception with interpretable model-based control. Results show that the framework remains effective even under strong wind and that it simultaneously improves both MPC parameter learning and IMU prediction performance.
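
The bi-level structure described in the abstract can be written schematically as follows. This is a generic sketch, not the paper's own equation: $U$, $L$, $\bm{x}^I_k$, $\bm{x}_{k+1}$, and $\bm{u}_k$ follow the Figure 1 caption, while $\bm{\theta}$ (the learnable parameters of the IO module and the MPC) and the argument lists of $U$ and $L$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{\bm{\theta}} \quad & U\!\left(\bm{x}_{k+1}(\bm{u}_k^{*}),\ \bm{x}^{I}_{k+1}\right) \\
\text{s.t.} \quad & \bm{u}_k^{*} = \arg\min_{\bm{u}_k}\ L\!\left(\bm{x}^{I}_{k}(\bm{\theta}),\ \bm{u}_k\right)
\end{aligned}
```

Read this way, the lower level is the MPC problem solved for the action $\bm{u}_k^{*}$, and the upper level adjusts $\bm{\theta}$ so that the state predicted by the dynamics model agrees with the state later measured by the IMU. No ground-truth labels appear in either level, which is consistent with the self-supervised training described above.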

Paper Structure

This paper contains 13 sections, 10 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The proposed framework. The IMU model predicts the current state $\bm{x}_k^I$. The d-MPC solves for the optimal action $\bm{u}_k$ under the lower-level objective $L$; this action drives the dynamics model to the next state ($\bm{x}_{k+1}$) and actuates the real system to its next state, as measured by the same IMU ($\bm{x}^I_{k+1}$). The upper-level objective $U$ minimizes the discrepancy between $\bm{x}_{k+1}$ and $\bm{x}^I_{k+1}$.
  • Figure 2: UAV performance. (a) Using iMPC, the UAV attitude quickly returns to a stable hover from an initial condition of 20°. (b) Snapshots of iMPC under a 20 m/s wind disturbance in Gazebo, showing takeoff, hover, disturbance by the wind, and return to hover.
  • Figure 3: Control performance of iMPC and RL (PPO) under different levels of wind disturbance.
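
To make the lower-level step in Figure 1 more concrete, the sketch below runs a receding-horizon controller on a single attitude axis starting from a 20° offset, loosely mirroring the scenario in Figure 2(a). It is a minimal illustration under strong assumptions and is not the paper's method: the plant is a linear double integrator (angle and rate) rather than a full quadrotor model, the problem is unconstrained so each horizon is solved in closed form by a backward Riccati recursion, and the weights `Q` and `R`, the time step, and the function names are all hypothetical. It includes neither the IMU network nor the upper-level training.

```python
import numpy as np


def first_step_gain(A, B, Q, R, horizon):
    """Backward Riccati recursion over a finite horizon.

    Returns the feedback gain for the first step of the horizon,
    which is the only one a receding-horizon controller applies.
    """
    P = Q.copy()  # terminal cost (assumed equal to the stage cost Q)
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def simulate_hover_recovery(angle0_deg=20.0, dt=0.02, steps=200, horizon=25):
    """Simulate one attitude axis returning to hover; returns angles in degrees."""
    # Assumed model: state = [angle, rate], input = angular acceleration.
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    Q = np.diag([10.0, 1.0])  # illustrative state weights
    R = np.array([[0.1]])     # illustrative input weight

    # The model and horizon do not change, so the first-step gain is constant.
    K = first_step_gain(A, B, Q, R, horizon)

    x = np.array([np.deg2rad(angle0_deg), 0.0])
    angles = [x[0]]
    for _ in range(steps):
        u = -K @ x         # optimal first action of the current horizon
        x = A @ x + B @ u  # apply it, then re-plan from the new state
        angles.append(x[0])
    return np.rad2deg(np.array(angles))


if __name__ == "__main__":
    angles = simulate_hover_recovery()
    print(f"initial angle: {angles[0]:.2f} deg, final angle: {angles[-1]:.4f} deg")
```

In a differentiable-MPC setting, quantities such as `Q`, `R`, or the model matrices would be candidates for the learnable parameters, with gradients of the upper-level loss propagated through the solution of this inner problem. That step is omitted here.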