OCCAM: Online Continuous Controller Adaptation with Meta-Learned Models

Hersh Sanghvi, Spencer Folk, Camillo Jose Taylor

TL;DR

This work combines meta-learning with Bayesian recursive estimation to learn prior predictive models of system performance that quickly adapt to online data, even when there is significant domain shift.

Abstract

Control tuning and adaptation present a significant challenge to the use of robots in diverse environments. It is often nontrivial to find by hand a single set of control parameters that works well across the broad array of environments and conditions that a robot might encounter. Automated adaptation approaches must utilize prior knowledge about the system while adapting to significant domain shifts to find new control parameters quickly. In this work, we present a general framework for online controller adaptation that deals with these challenges. We combine meta-learning with Bayesian recursive estimation to learn prior predictive models of system performance that quickly adapt to online data, even when there is significant domain shift. These predictive models can be used as cost functions within efficient sampling-based optimization routines to find new control parameters online that maximize system performance. Our framework is powerful and flexible enough to adapt controllers for four diverse systems: a simulated race car, a simulated quadrupedal robot, and a simulated and physical quadrotor. The video and code can be found at https://hersh500.github.io/occam.
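The abstract describes using the learned predictive model as a cost function inside a sampling-based optimizer over controller gains. The sketch below illustrates that idea in its simplest form (random-search sampling); the function and parameter names (`predict_reward`, `optimize_gains`, the gain bounds) are illustrative stand-ins, not the paper's actual interface, and `predict_reward` here is a toy surrogate for the meta-learned model.

```python
import numpy as np

def optimize_gains(predict_reward, gain_low, gain_high, n_samples=256, rng=None):
    """Sampling-based optimization: sample candidate gain vectors uniformly
    within bounds, score each with the predictive model, return the best.
    `predict_reward` is a hypothetical callable: gain vector -> scalar reward.
    """
    rng = np.random.default_rng(rng)
    candidates = rng.uniform(gain_low, gain_high,
                             size=(n_samples, len(gain_low)))
    scores = np.array([predict_reward(g) for g in candidates])
    return candidates[np.argmax(scores)]

# Toy stand-in model whose reward peaks at gains (0.5, 0.5).
best = optimize_gains(lambda g: -np.sum((g - 0.5) ** 2),
                      gain_low=np.zeros(2), gain_high=np.ones(2), rng=0)
```

In the full framework, a more sample-efficient optimizer (e.g. the cross-entropy method) could replace the uniform sampler without changing the interface: the model is only ever queried as a black-box cost.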

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: We demonstrate a flexible method for controller optimization based on online adaptation of meta-learned models on diverse robots.
  • Figure 2: An overview of our method and predictive model $\hat{f}$. Given previous sensor measurements and inputs from the system, the Optimization phase uses the prediction model to search for the best gain to try next. This gain is sent to the real system for the Evaluation phase, during which sensor measurements and performance measures are collected. The Adaptation phase computes new weights to update the prediction model with the information gathered during the evaluation. Arrows in red indicate predicted and estimated quantities, while arrows in black indicate signals from the actual system.
  • Figure 3: Time vs. Reward curves on all test systems. Our method shows robust prior performance and adaptation across all tests. For the robotic systems, normalized reward is computed by standardizing rewards according to the training dataset statistics for that system and subsequently subtracting the nominal controller's reward.
  • Figure 4: Positional tracking error results on physical Crazyflie quadrotor following a 3-dimensional ellipsoidal reference trajectory.
  • Figure 5: Meta-Learning with Kalman Filter Base Learner
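The final figure names a Kalman filter as the base learner for the meta-learned model. A common way to realize this, and the pattern sketched below under that assumption, is to treat the prediction as linear in meta-learned features, y ≈ φ(x)·w, and recursively update a Gaussian posterior over the last-layer weights w with standard Kalman filter equations. This is a generic sketch of that technique, not the paper's exact formulation; the class name and hyperparameters are illustrative.

```python
import numpy as np

class KalmanLinearAdapter:
    """Recursive Bayesian (Kalman filter) update of linear weights w in
    y ≈ phi(x) · w, where phi is assumed to be a frozen meta-learned
    feature map. Maintains the posterior mean and covariance of w.
    """
    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.w = np.zeros(dim)            # posterior mean of weights
        self.P = prior_var * np.eye(dim)  # posterior covariance of weights
        self.noise_var = noise_var        # assumed observation noise variance

    def predict(self, phi):
        # Predicted performance for feature vector phi.
        return phi @ self.w

    def update(self, phi, y):
        # Scalar-observation Kalman update: gain, mean, covariance.
        s = phi @ self.P @ phi + self.noise_var   # innovation variance
        k = self.P @ phi / s                      # Kalman gain
        self.w = self.w + k * (y - phi @ self.w)  # correct the mean
        self.P = self.P - np.outer(k, phi @ self.P)  # shrink the covariance
```

Because each update is a few matrix-vector products, the posterior adapts in closed form from every evaluation on the real system, which is what makes the online adaptation phase cheap compared to gradient-based fine-tuning.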