DR-PETS: Learning-Based Control With Planning in Adversarial Environments
Hozefa Jesawada, Antonio Acernese, Giovanni Russo, Carmen Del Vecchio
TL;DR
This work addresses the robustness of model-based RL to epistemic and adversarial perturbations by introducing DR-PETS, a distributionally robust extension of PETS that uses a $p$-Wasserstein ambiguity set to anticipate worst-case dynamics during MPC planning. A reformulation based on Wasserstein duality yields a tractable, regularized MPC objective that combines the empirical ensemble performance with a gradient-based robustness term, preserving PETS’ data efficiency. Theoretical results provide a closed-form expression for the robust objective, which simplifies further for $p=2$ and enables efficient planning with particle-based state propagation and CEM-based action selection. Empirical evaluations on pendulum and cart-pole tasks show that DR-PETS improves worst-case performance under adversarial perturbations while maintaining near-parity with PETS in nominal settings. An open-source implementation is provided for reproducibility.
Abstract
Ensuring robustness against epistemic, possibly adversarial, perturbations is essential for reliable real-world decision-making. While the Probabilistic Ensembles with Trajectory Sampling (PETS) algorithm inherently handles uncertainty via ensemble-based probabilistic models, it lacks guarantees against structured adversarial or worst-case uncertainty distributions. To address this, we propose DR-PETS, a distributionally robust extension of PETS that certifies robustness against adversarial perturbations. We formalize uncertainty via a $p$-Wasserstein ambiguity set, enabling worst-case-aware planning through a min-max optimization framework. While PETS passively accounts for stochasticity, DR-PETS actively optimizes robustness via a tractable convex approximation integrated into the PETS planning loop. Experiments on pendulum stabilization and cart-pole balancing show that DR-PETS certifies robustness against adversarial parameter perturbations, achieving consistent performance in worst-case scenarios where PETS deteriorates.
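
To make the idea of a regularized planning objective concrete, the sketch below scores a set of trajectory particles by their empirical mean cost plus a gradient-norm penalty scaled by the ambiguity radius. This is only an illustration under stated assumptions: it uses a common first-order form of a 2-Wasserstein-regularized objective (mean cost plus radius times the root-mean-square gradient norm), which is not necessarily the exact closed-form expression derived for DR-PETS. The function name `robust_objective`, the argument names, and the toy quadratic cost are hypothetical.

```python
import numpy as np

def robust_objective(costs, grads, eps):
    """Mean particle cost plus eps times the root-mean-square gradient norm.

    Assumed first-order form for p = 2; not the paper's exact expression.
    costs: shape (n_particles,), cost of each sampled particle.
    grads: shape (n_particles, dim), cost gradient at each particle.
    eps:   ambiguity radius; eps = 0 recovers the plain empirical mean.
    """
    costs = np.asarray(costs, dtype=float)
    grads = np.asarray(grads, dtype=float)
    nominal = costs.mean()
    # Root-mean-square of the per-particle Euclidean gradient norms.
    penalty = np.sqrt((grads ** 2).sum(axis=1).mean())
    return nominal + eps * penalty

# Toy example: quadratic cost c(x) = ||x||^2, whose gradient is 2x.
particles = np.array([[1.0, 0.0], [0.0, 2.0]])
costs = (particles ** 2).sum(axis=1)
grads = 2.0 * particles

print(robust_objective(costs, grads, eps=0.0))  # nominal score only
print(robust_objective(costs, grads, eps=0.1))  # nominal score plus penalty
```

In a planner of this kind, a score like this would be evaluated for each candidate action sequence, so that sequences whose cost is highly sensitive to perturbations are penalized relative to their nominal score.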
