Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy

Bram Grooten, Patrick MacAlpine, Kaushik Subramanian, Peter Stone, Peter R. Wurman

TL;DR

This work introduces SPARC: single-phase adaptation for robust control, and tests it on varying contexts within the high-fidelity racing simulator Gran Turismo 7 and wind-perturbed MuJoCo environments, finding that it achieves reliable and robust OOD generalization.

Abstract

Generalization to unseen environments is a significant challenge in the field of robotics and control. In this work, we focus on contextual reinforcement learning, where agents act within environments with varying contexts, such as self-driving cars or quadrupedal robots that need to operate in different terrains or weather conditions than they were trained for. We tackle the critical task of generalizing to out-of-distribution (OOD) settings, without access to explicit context information at test time. Recent work has addressed this problem by training a context encoder and a history adaptation module in separate stages. While promising, this two-phase approach is cumbersome to implement and train. We simplify the methodology and introduce SPARC: single-phase adaptation for robust control. We test SPARC on varying contexts within the high-fidelity racing simulator Gran Turismo 7 and wind-perturbed MuJoCo environments, and find that it achieves reliable and robust OOD generalization.

Paper Structure

This paper contains 38 sections, 5 equations, 17 figures, 12 tables, 1 algorithm.

Figures (17)

  • Figure 1: Overview of our algorithm SPARC (top) and the problem setting in Gran Turismo 7 (bottom). SPARC trains an expert policy $\pi^{ex}$ and an adapter policy $\pi^{ad}$ simultaneously in a single phase. The adapter policy does not require access to privileged contextual information, facilitating deployment to OOD real-world scenarios. Observations $o$, contextual information $c$, and a history of recent observation-action pairs $h$ are passed into the networks. Latent encodings $\ell$ and $z$ are concatenated and passed to the final layers, producing action $a$. Similar to RMA (Kumar et al., 2021), $\pi^{ex}$ is trained with reinforcement learning, while the History Adapter $\phi$ of $\pi^{ad}$ is trained with supervised learning to regress its encoding $\phi(h)=\hat{z}$ to the Context Encoder's output $\psi(c)=z$. Note that since SPARC trains in one phase, the context encoding $z$ is a moving target, rather than the fixed target used in RMA. Trainable modules are in green. The black modules regularly copy weights from their counterpart in $\pi^{ex}$.
  • Figure 2: Results on Grand Valley averaged over three seeds. For each algorithm, we plot the percentage of cars that successfully completed laps, and the lap time expressed as a ratio to the built-in AI's lap time. SPARC completes the most laps and the fastest laps on out-of-distribution cars.
  • Figure 3: Difference in average return of SPARC versus RMA with varying wind perturbations over 5 seeds. Green marks wind settings where SPARC scores higher; purple marks settings where RMA scores higher. Our method outperforms the two-phase baseline across many IND and OOD contextual settings.
  • Figure 4: (a) and (b): Lap times on the Power & Mass experiment. Colours denote average lap time over 3 seeds (red = fast, blue = slow); black squares indicate at least one unfinished lap. Even though both algorithms are trained only on settings within the IND region (dashed box), SPARC is able to handle challenging OOD settings in the bottom-right corner (high power and low mass). (c): Performance difference between old and new game dynamics. These algorithms have only been trained on old physics settings, and are tested zero-shot on the new physics after a game update of Gran Turismo. SPARC shows the best OOD generalization, with only slightly slower lap times on new dynamics, while other methods degrade significantly.
  • Figure 5: Illustrative in‑distribution (IND) versus out‑of‑distribution (OOD) vehicles used in our Gran Turismo 7 experiments.
  • ...and 12 more figures
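
The single-phase training loop described in the Figure 1 caption can be illustrated with a toy sketch. This is not the authors' implementation: the encoders are plain linear maps, the reinforcement-learning update of the expert is replaced by a small random drift (a placeholder that only serves to make $z$ a moving target), and every name here (`train_single_phase`, `adapter_step`, `copy_every`, the matrix `M` that maps context to history) is a hypothetical choice made for illustration. What the sketch does show is the structure of the idea: at each step the expert changes, the History Adapter $\phi$ takes one supervised step toward the current $\psi(c)=z$ (treated as a constant target), and the adapter's remaining weights are periodically copied from the expert.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def adapter_step(W_phi, h, z_target, lr):
    """One SGD step on MSE(phi(h), z_target), with z_target held constant.

    phi is the linear map W_phi; the weights are updated in place.
    Returns the loss measured before the update.
    """
    z_hat = matvec(W_phi, h)
    n = len(z_target)
    for i in range(n):
        grad_out = 2.0 * (z_hat[i] - z_target[i]) / n
        for j in range(len(h)):
            W_phi[i][j] -= lr * grad_out * h[j]
    return mse(z_hat, z_target)


def train_single_phase(steps=2000, copy_every=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Context encoder psi (expert side): latent_dim x context_dim.
    W_psi = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    # History adapter phi (adapter side): latent_dim x history_dim.
    W_phi = [[0.0] * 3 for _ in range(2)]
    # Stand-in for the layers shared by both policies.
    expert_head = [rng.uniform(-1, 1) for _ in range(4)]
    adapter_head = list(expert_head)
    # Assumption: the history is a fixed linear function of the context,
    # so the context can be recovered from the history.
    M = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]

    losses = []
    for t in range(steps):
        c = [rng.uniform(-1, 1) for _ in range(2)]
        h = matvec(M, c)

        # Placeholder for the RL update of the expert policy: random drift.
        for row in W_psi:
            for j in range(len(row)):
                row[j] += 1e-3 * rng.gauss(0, 1)
        expert_head = [w + 1e-3 * rng.gauss(0, 1) for w in expert_head]

        # Supervised regression of phi(h) toward the moving target psi(c).
        z = matvec(W_psi, c)
        losses.append(adapter_step(W_phi, h, z, lr))

        # Periodically copy the shared weights from expert to adapter.
        if (t + 1) % copy_every == 0:
            adapter_head = list(expert_head)

    return losses, expert_head, adapter_head


if __name__ == "__main__":
    losses, _, _ = train_single_phase()
    print("mean loss, first 100 steps:", sum(losses[:100]) / 100)
    print("mean loss, last 100 steps:", sum(losses[-100:]) / 100)
```

In this toy setting the regression loss falls even though the target keeps drifting, which is the property a single-phase scheme relies on. Whether the same holds with deep networks and a real RL update is an empirical question that the paper's experiments address.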