Safe and Near-Optimal Control with Online Dynamics Learning

Manish Prajapat; Johannes Köhler; Melanie N. Zeilinger; Andreas Krause

TL;DR

We introduce a notion of maximum safe dynamics learning, in which sufficient exploration is performed within the space of safe policies, ensuring continuous online learning of the dynamics.

Abstract

Achieving both optimality and safety under unknown system dynamics is a central challenge in the real-world deployment of agents. To address this, we introduce a notion of maximum safe dynamics learning, where sufficient exploration is performed within the space of safe policies. Our method executes $\textit{pessimistically}$ safe policies while $\textit{optimistically}$ exploring informative states and, despite not reaching them due to model uncertainty, ensures continuous online learning of the dynamics. The framework achieves first-of-its-kind results: learning the dynamics model sufficiently, up to an arbitrarily small tolerance (subject to noise), in finite time, while ensuring provably safe operation throughout with high probability and without requiring resets. Building on this, we propose an algorithm to maximize rewards while learning the dynamics $\textit{only to the extent needed}$ to achieve close-to-optimal performance. Unlike typical reinforcement learning (RL) methods, our approach operates online in a non-episodic setting and ensures safety throughout the learning process. We demonstrate the effectiveness of our approach in challenging domains such as autonomous car racing and drone navigation under aerodynamic effects, scenarios where safety is critical and accurate modeling is difficult.
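
To make the idea of acting pessimistically while still learning online concrete, the following is a minimal toy sketch and not the algorithm proposed in the paper. It assumes a hypothetical scalar system x_next = x + a*u + w with an unknown gain a, bounded noise |w| <= w_bar, and a state constraint |x| <= x_max. The interval model, the set-membership update, and all names and numbers are illustrative assumptions; the abstract does not specify the model class. Each input is certified against every gain still consistent with the data, and among certified inputs the largest-magnitude one is chosen because it shrinks the uncertainty interval the most.

```python
import random


def safe_inputs(x, a_lo, a_hi, x_max, w_bar, candidates):
    """Pessimistic check: keep inputs whose next state satisfies
    |x_next| <= x_max for every gain in [a_lo, a_hi] and every noise value."""
    safe = []
    for u in candidates:
        lo = x + min(a_lo * u, a_hi * u) - w_bar
        hi = x + max(a_lo * u, a_hi * u) + w_bar
        if -x_max <= lo and hi <= x_max:
            safe.append(u)
    return safe


def update_interval(a_lo, a_hi, x, u, x_next, w_bar):
    """Set-membership update: keep only gains consistent with the observation."""
    if u == 0:
        return a_lo, a_hi  # a zero input carries no information about the gain
    lo = (x_next - x - w_bar) / u
    hi = (x_next - x + w_bar) / u
    if u < 0:
        lo, hi = hi, lo  # dividing by a negative input flips the bounds
    return max(a_lo, lo), min(a_hi, hi)


def run(steps=30, seed=0):
    rng = random.Random(seed)
    a_true, w_bar, x_max = 1.0, 0.01, 1.0  # hypothetical toy values
    a_lo, a_hi = 0.5, 2.0                  # assumed prior bounds on the gain
    x, max_abs_x = 0.0, 0.0
    candidates = [i / 10 for i in range(-5, 6)]
    for _ in range(steps):
        safe = safe_inputs(x, a_lo, a_hi, x_max, w_bar, candidates)
        if not safe:
            break  # no certified input; this toy simply stops
        # Most informative certified input: the largest magnitude.
        u = max(safe, key=abs)
        x_next = x + a_true * u + rng.uniform(-w_bar, w_bar)
        a_lo, a_hi = update_interval(a_lo, a_hi, x, u, x_next, w_bar)
        x = x_next
        max_abs_x = max(max_abs_x, abs(x))
    return a_lo, a_hi, max_abs_x


if __name__ == "__main__":
    print(run())
```

In this toy, the interval on the gain can only shrink to a width on the order of the noise bound divided by the input magnitude, which loosely mirrors the "subject to noise" qualifier above; the state never leaves the constraint set because each input is checked against the worst-case gain before it is applied.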
