Agile Robotics: Optimal Control, Reinforcement Learning, and Differentiable Simulation

Yunlong Song; Davide Scaramuzza

TL;DR

The paper compares optimal control (OC), model predictive control (MPC), reinforcement learning (RL), and differentiable simulation as pathways to agile robot control. It formalizes continuous-time optimal control as $\min_{x(\cdot), u(\cdot)} \int_{0}^{T} \ell(x(t),u(t),t)\,dt + \ell(x(T))$, MPC through the discrete objective $J(x,u) = \sum_{k=0}^{N-1} \ell(x_k,u_k) + \ell(x_N)$, and policy learning through $J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}} \left[\sum_{k} r_k\right]$, which makes explicit that the approaches optimize different targets. In challenging drone-racing scenarios RL outperforms OC, and the paper attributes RL's robustness to optimizing task-level rewards directly and to domain randomization. The paper also presents a policy-search-for-MPC approach, in which high-level decision variables are learned offline for use by a real-time MPC, and a differentiable-simulation framework for rapid, gradient-based learning of legged locomotion with zero-shot real-world transfer. As future work, it proposes integrating structured dynamics and OC constraints into RL to reduce sample complexity, and extending the approach to vision-based humanoid locomotion.
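To make the difference between the discrete MPC objective and the policy-learning objective concrete, the following is a minimal sketch on a toy scalar system. Everything specific in it is an assumption for illustration and is not taken from the paper: the single-integrator dynamics, the quadratic costs, the linear policy $u = -\theta x$ with Gaussian noise, and the choice of reward $r_k = -\ell(x_k, u_k)$. The first function evaluates $J(x,u)$ for one given control sequence; the second estimates $J(\theta)$ by averaging returns over sampled rollouts.

```python
import random


def stage_cost(x, u):
    """Hypothetical quadratic stage cost l(x_k, u_k)."""
    return x * x + 0.1 * u * u


def terminal_cost(x):
    """Hypothetical terminal cost l(x_N)."""
    return x * x


def step(x, u, dt=0.1):
    """Toy single-integrator dynamics x_{k+1} = x_k + dt * u_k (an assumption)."""
    return x + dt * u


def mpc_objective(x0, controls):
    """J(x, u) = sum_{k=0}^{N-1} l(x_k, u_k) + l(x_N) for one control sequence."""
    x, total = x0, 0.0
    for u in controls:
        total += stage_cost(x, u)
        x = step(x, u)
    return total + terminal_cost(x)


def policy_objective(theta, x0, horizon, episodes=200, noise_std=0.1, seed=0):
    """Monte Carlo estimate of J(theta) = E_{tau ~ pi_theta}[sum_k r_k]."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    returns = []
    for _ in range(episodes):
        x, ret = x0, 0.0
        for _ in range(horizon):
            # Stochastic linear policy: u = -theta * x + Gaussian noise.
            u = -theta * x + rng.gauss(0.0, noise_std)
            # Assumed reward: negative stage cost.
            ret += -stage_cost(x, u)
            x = step(x, u)
        returns.append(ret)
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print("J(x, u) for zero controls:", mpc_objective(1.0, [0.0] * 20))
    print("J(theta) for theta = 0.0: ", policy_objective(0.0, 1.0, 20))
    print("J(theta) for theta = 1.0: ", policy_objective(1.0, 1.0, 20))
```

The contrast is in what gets optimized: `mpc_objective` scores a specific trajectory, so an optimizer would search over the control sequence itself, whereas `policy_objective` scores a parameter vector through an expectation over rollouts, so an optimizer would search over `theta`.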

Abstract

Control systems are at the core of every real-world robot. They are deployed in an ever-increasing number of applications, ranging from autonomous racing and search-and-rescue missions to industrial inspections and space exploration. To achieve peak performance, certain tasks require pushing the robot to its maximum agility. How can we design control algorithms that enhance the agility of autonomous robots and maintain robustness against unforeseen disturbances? This paper addresses this question by leveraging fundamental principles in optimal control, reinforcement learning, and differentiable simulation.

Paper Structure (4 sections, 3 figures, 1 table)

Reinforcement Learning versus Optimal Control
Policy Search for Model Predictive Control
Policy Learning via Differentiable Simulation
Future Work

Figures (3)

Figure 1: RL outperforms optimal control in drone racing [song2023reaching].
Figure 2: Graphical model of policy search for MPC.
Figure 3: Graphical model of Differentiable Simulation.
