DR-PETS: Learning-Based Control With Planning in Adversarial Environments

Hozefa Jesawada, Antonio Acernese, Giovanni Russo, Carmen Del Vecchio

TL;DR

This work addresses the robustness of model-based RL to epistemic and adversarial perturbations by introducing DR-PETS, a distributionally robust extension of PETS that uses a $p$-Wasserstein ambiguity set to anticipate worst-case dynamics during MPC planning. A reformulation based on Wasserstein duality yields a tractable, regularized MPC objective that combines the empirical ensemble performance with a gradient-based robustness term, preserving the data efficiency of PETS. Theoretical results provide a closed-form expression for the robust objective, which simplifies further for $p=2$, enabling efficient planning with particle-based state propagation and CEM-based action selection. Empirical evaluations on pendulum and cart-pole tasks show that DR-PETS improves worst-case performance under adversarial perturbations while remaining close to PETS in nominal settings. An open-source implementation is provided for reproducibility.
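
To make the idea of a regularized planning objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical scalar toy system, uses finite differences in place of analytic gradients, and uses the standard first-order 2-Wasserstein penalty (radius times the root-mean-square gradient norm over particles) as a stand-in for the paper's closed-form expression, which is not reproduced in this summary. All function names and parameters (`rollout_return`, `robust_score`, `cem_plan`, `eps`) are illustrative.

```python
import numpy as np

def rollout_return(actions, w, x0=1.0):
    """Toy scalar system x' = x + a + w_t with reward -(x^2 + 0.1 a^2) (hypothetical)."""
    x, total = x0, 0.0
    for a, w_t in zip(actions, w):
        x = x + a + w_t
        total -= x**2 + 0.1 * a**2
    return total

def grad_wrt_disturbance(actions, w, h=1e-5):
    """Central finite-difference gradient of the return w.r.t. the disturbance sample."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (rollout_return(actions, w + e) - rollout_return(actions, w - e)) / (2 * h)
    return g

def robust_score(actions, particles, eps):
    """Mean particle return minus eps * RMS gradient norm (assumed p=2 style penalty)."""
    returns = np.array([rollout_return(actions, w) for w in particles])
    grads = np.array([grad_wrt_disturbance(actions, w) for w in particles])
    penalty = np.sqrt(np.mean(np.sum(grads**2, axis=1)))
    return returns.mean() - eps * penalty

def cem_plan(particles, eps, horizon, iters=20, pop=64, n_elite=8, seed=0):
    """Cross-entropy method: refit a diagonal Gaussian to the top-scoring action sequences."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        cands = rng.normal(mu, sigma, size=(pop, horizon))
        scores = np.array([robust_score(c, particles, eps) for c in cands])
        elite = cands[np.argsort(scores)[-n_elite:]]  # highest robust scores
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    particles = rng.normal(0.0, 0.1, size=(8, 4))  # sampled disturbance trajectories
    plan = cem_plan(particles, eps=0.5, horizon=4)
    print("planned actions:", plan)
    print("robust score:", robust_score(plan, particles, eps=0.5))
```

Setting `eps=0` reduces the score to the plain particle average, which mirrors how a nominal sampling-based planner would rank action sequences; increasing `eps` penalizes plans whose return is sensitive to the sampled disturbances.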

Abstract

Ensuring robustness against epistemic, possibly adversarial, perturbations is essential for reliable real-world decision-making. While the Probabilistic Ensembles with Trajectory Sampling (PETS) algorithm inherently handles uncertainty via ensemble-based probabilistic models, it lacks guarantees against structured adversarial or worst-case uncertainty distributions. To address this, we propose DR-PETS, a distributionally robust extension of PETS that certifies robustness against adversarial perturbations. We formalize uncertainty via a $p$-Wasserstein ambiguity set, enabling worst-case-aware planning through a min-max optimization framework. While PETS passively accounts for stochasticity, DR-PETS actively optimizes robustness via a tractable convex approximation integrated into the PETS planning loop. Experiments on pendulum stabilization and cart-pole balancing show that DR-PETS certifies robustness against adversarial parameter perturbations, achieving consistent performance in worst-case scenarios where PETS deteriorates.
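
For readers unfamiliar with the terminology, the ambiguity set and min-max problem mentioned in the abstract can be written in the standard form below. This is the textbook definition of a $p$-Wasserstein ball, not the paper's own statement: the symbols $\hat{P}$ (empirical distribution induced by the ensemble), $\varepsilon$ (ambiguity radius), $\xi$ (uncertain quantity), $H$ (planning horizon) and $J$ (return of an action sequence) are notation assumed here for illustration.

```latex
% Illustrative notation, assumed for this sketch (not taken from the paper).
\begin{align*}
W_p(Q, \hat{P}) &= \left( \inf_{\pi \in \Pi(Q, \hat{P})}
    \int \|\xi - \xi'\|^p \, \mathrm{d}\pi(\xi, \xi') \right)^{1/p}, \\
\mathcal{B}_\varepsilon(\hat{P}) &= \left\{ Q : W_p(Q, \hat{P}) \le \varepsilon \right\}, \\
\max_{a_{0:H-1}} \; & \min_{Q \in \mathcal{B}_\varepsilon(\hat{P})} \;
    \mathbb{E}_{\xi \sim Q}\!\left[ J(a_{0:H-1}, \xi) \right].
\end{align*}
```

Here $\Pi(Q, \hat{P})$ denotes the set of couplings with marginals $Q$ and $\hat{P}$. The inner minimization selects the least favorable distribution within distance $\varepsilon$ of the learned model, and the outer maximization plans against it.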

Paper Structure

This paper contains 10 sections, 2 theorems, 29 equations, 2 figures.

Key Result

Theorem III.1

Let the linearity assumption (source label: assm:linearity) hold. Then:

Figures (2)

  • Figure 1: Total episodic reward obtained by PETS (blue) and DR-PETS (red) under perturbation of the pendulum mass. The shaded region denotes half of one standard error.
  • Figure 2: Total episodic reward obtained by PETS (blue) and DR-PETS (red) under perturbation of the pole length. The shaded region denotes half of one standard error.

Theorems & Definitions (6)

  • Definition 1
  • Remark 1
  • Theorem III.1
  • proof
  • Theorem III.2
  • proof