MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models

Elena Sorina Lupu; Fengze Xie; James A. Preiss; Jedidiah Alindogan; Matthew Anderson; Soon-Jo Chung

TL;DR

This work presents an offline meta-learning algorithm to construct a rapidly-tunable model of residual dynamics and disturbances of off-road vehicles, and provides mathematical guarantees of stability and robustness for the controller.

Abstract

Control of off-road vehicles is challenging due to the complex dynamic interactions with the terrain. Accurate modeling of these interactions is important to optimize driving performance, but the relevant physical phenomena are too complex to model from first principles. Therefore, we present an offline meta-learning algorithm to construct a rapidly-tunable model of residual dynamics and disturbances. Our model processes terrain images into features using a visual foundation model (VFM), then maps these features and the vehicle state to an estimate of the current actuation matrix using a deep neural network (DNN). We then combine this model with composite adaptive control to modify the last layer of the DNN in real time, accounting for the remaining terrain interactions not captured during offline training. We provide mathematical guarantees of stability and robustness for our controller and demonstrate the effectiveness of our method through simulations and hardware experiments with a tracked vehicle and a car-like robot. We evaluate our method outdoors on different slopes with varying slippage and actuator degradation disturbances, and compare against an adaptive controller that does not use the VFM terrain features. We show significant improvement over the baseline in both hardware experimentation and simulation.
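The idea of adapting only the last layer of a frozen network can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the fixed weight matrix `FROZEN_W` stands in for the offline-trained DNN body, the two "terrain features" stand in for VFM outputs, the residual is a scalar rather than an actuation matrix, and the update is a plain prediction-error gradient step, not the composite adaptation law analyzed in the paper.

```python
import math
import random

# Hypothetical frozen weights standing in for the offline-trained DNN body.
# Each row maps [3 state values, 2 terrain features] to one basis output.
FROZEN_W = [
    [1.0, 0.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.0, 1.0],
]


def basis(state, terrain_feat):
    """Evaluate the frozen basis phi(state, terrain features)."""
    inp = list(state) + list(terrain_feat)
    return [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in FROZEN_W]


def adapt_step(theta, phi, measured_residual, step=0.2):
    """One gradient step on the last-layer weights using the prediction error."""
    predicted = sum(t * p for t, p in zip(theta, phi))
    error = measured_residual - predicted
    return [t + step * p * error for t, p in zip(theta, phi)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def run_demo(steps=2000, seed=0):
    """Adapt theta toward an invented 'true' parameter vector; return errors."""
    rng = random.Random(seed)
    theta_true = [0.8, -0.3, 0.5, 0.1]  # made up for the demo
    theta = [0.0] * 4
    initial_err = distance(theta, theta_true)
    for _ in range(steps):
        state = [rng.uniform(-1, 1) for _ in range(3)]
        terrain_feat = [rng.uniform(-1, 1) for _ in range(2)]
        phi = basis(state, terrain_feat)
        # Noise-free synthetic measurement of the residual.
        residual = sum(t * p for t, p in zip(theta_true, phi))
        theta = adapt_step(theta, phi, residual)
    return initial_err, distance(theta, theta_true)
```

Calling `run_demo()` returns the parameter error before and after adaptation. The step size of 0.2 is chosen so that the step times the squared norm of the basis vector (at most 4 here, since each `tanh` output lies in [-1, 1]) stays below 2, which keeps this simple update from increasing the parameter error on noise-free data.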

Paper Structure (46 sections, 4 theorems, 53 equations, 24 figures, 7 tables, 2 algorithms)

Introduction
Contributions
Paper Organization
Notation
Related Work
Embedding Visual Information in Classical Control and Reinforcement Learning
Visual Foundation Models in Robotics
Adaptation to Ground Disturbances
Methods
Residual Dynamics Representation using VFM
Offline Meta-Learning Phase
Dataset
Model Architecture
Optimization
Online $\boldsymbol{\theta}$ Adaptation and Tracking Control
...and 31 more sections

Key Result

Theorem 1

Applying the controller in eq:main_controller to the dynamics that evolve according to eq:dynamicsAB, together with the composite adaptation law with $\gamma_i > 0$ and $q_i > 0$ for each $i \in [1, n_\theta]$, makes the tracking error $\mathbf{s}$ and the parameter error $\tilde{\boldsymbol{\theta}}$ converge exponentially to a bounded error ball.
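
To make the phrase "exponentially converge to a bounded error ball" concrete, the following is the standard comparison-lemma form that results of this kind typically take. The Lyapunov-like function $V$, the rate $\alpha > 0$, and the disturbance bound $c \ge 0$ are illustrative symbols assumed for this sketch; they are not quantities taken from the theorem statement or its proof.

$$
\dot{V}(t) \le -2\alpha V(t) + c
\quad\Longrightarrow\quad
V(t) \le e^{-2\alpha t}\, V(0) + \frac{c}{2\alpha}\left(1 - e^{-2\alpha t}\right).
$$

The first term decays exponentially while the second never exceeds $c/(2\alpha)$, so $V$ settles into a ball whose size shrinks as the disturbance bound decreases or the convergence rate increases.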

Figures (24)

Figure 1: MAGIC-VFM: An offline meta-learning algorithm to build a residual dynamics and disturbance model using both Visual Foundation Models (VFM) and vehicle states. This model is integrated with composite adaptive control to adapt to changes in both the terrain and vehicle dynamics conditions in real time. See https://youtu.be/sxM73ryweRA
Figure 2: Terrain-aware Architecture: offline data collection and training (Algorithm \ref{alg:training-v2}), followed by real-time adaptive control (Algorithm \ref{alg:adaptation_at_execution_time}) running onboard the robot.
Figure 3: The structure of the DNN used for the basis function $\boldsymbol{\Phi}^\mathbf{w}$ in the controller synthesis from \ref{eq:main_controller}, \ref{eq:adapt_law} applied to a tracked vehicle.
Figure 4: The frames of reference for the tracked vehicle, its corresponding velocities, and the main driving components (left), a velocity vector diagram used for the proof of Theorem \ref{theorem2} (middle), and the car model notations (right). For both vehicles, we assume the center of mass and the body frame are at the same location.
Figure 5: DINO VFM discriminative ability for different terrains. We show histograms of the projection values onto the separating hyperplane normal, computed using a Support Vector Classifier, for 3 sets of classes with 5 images each (each row shows the separation margin between one class and the other 2 classes). Note that the spikes at -1 and 1 are an artifact of the high dimensionality and the small dataset used.
...and 19 more figures

Theorems & Definitions (8)

Theorem 1
proof
Proposition 1
proof
Theorem 2
proof
Theorem 3
proof
