DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems

Se-Wook Yoo; Seung-Woo Seo

TL;DR

DIAL addresses safety in reinforcement learning by learning, from multi-task demonstrations, a distribution over the constraints shared across tasks, and then adjusting the risk level within that distribution to adapt safely to new tasks. It combines a Bayesian, risk-aware constraint model that applies CVaR-based distortion with a task-agnostic safe exploration policy; safe transfer learning then solves new tasks without predefined task-specific constraints. Concretely, the constraint posterior is Beta-distributed, CVaR captures tail risk, and a structured, entropy-enhanced exploration scheme operates within a Lagrangian framework that balances reward, safety, and exploration. Experiments on urban-driving and robot-control benchmarks show better safety outcomes and transfer efficiency than baselines, highlighting DIAL’s potential for scalable safety in real-world, safety-critical systems.
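The TL;DR places exploration inside a Lagrangian framework that trades off reward, safety, and exploration. As a rough illustration only, the sketch below shows a generic Lagrangian relaxation of a cost-constrained objective with an entropy bonus, together with the usual projected dual-ascent update for the multiplier. The names (`lagrangian_objective`, `dual_update`, `cost_limit`, `ent_coef`) are hypothetical, the cost is treated as a given scalar estimate, and a plain Shannon entropy term stands in for DIAL's structured, entropy-enhanced exploration, whose exact form is not given here.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a discrete action distribution.

    A plain entropy bonus is an assumption standing in for DIAL's
    structured exploration term."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def lagrangian_objective(reward, cost, cost_limit, lam, probs, ent_coef):
    """Hypothetical scalar objective the policy maximizes: reward minus the
    lambda-weighted constraint violation, plus an entropy bonus."""
    return reward - lam * (cost - cost_limit) + ent_coef * entropy(probs)


def dual_update(lam, cost, cost_limit, lr):
    """Projected gradient ascent on the multiplier: lambda grows while the
    estimated cost exceeds the limit and shrinks (never below 0) otherwise."""
    return max(0.0, lam + lr * (cost - cost_limit))


if __name__ == "__main__":
    # Made-up sequence of per-iteration cost estimates, for illustration only.
    lam = 0.0
    for cost in [0.8, 0.7, 0.6, 0.4, 0.2]:
        lam = dual_update(lam, cost, cost_limit=0.5, lr=1.0)
        print(f"cost={cost:.1f} -> lambda={lam:.2f}")
```

The multiplier rises while the estimated cost exceeds the limit and decays toward zero once the policy satisfies it, which lets a single objective trade reward against safety without a hand-tuned fixed penalty weight.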

Abstract

Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related tasks. Building on this insight, we propose a novel method to learn shared constraint distributions across multiple tasks. Our approach identifies the shared constraints through imitation learning and then adapts to new tasks by adjusting risk levels within these learned distributions. This adaptability addresses variations in risk sensitivity stemming from expert-specific biases, ensuring consistent adherence to general safety principles even with imperfect demonstrations. Our method can be applied to control and navigation domains, including multi-task and meta-task scenarios, accommodating constraints such as maintaining safe distances or adhering to speed limits. Experimental results validate the efficacy of our approach, demonstrating superior safety performance and success rates compared to baselines, all without requiring task-specific constraint definitions. These findings underscore the versatility and practicality of our method across a wide range of real-world tasks.
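The abstract describes adapting to new tasks by adjusting risk levels within a learned constraint distribution, and the TL;DR names a Beta-distributed constraint posterior with CVaR for tail risk. A minimal sketch of that idea, under stated assumptions: constraint values in [0, 1] are drawn from a Beta posterior with made-up parameters, and the standard empirical upper-tail CVaR at level `alpha` is computed from those samples. The function names and parameters are hypothetical, and DIAL's actual distortion and posterior parameterization may differ.

```python
import random


def sample_constraint_posterior(a, b, n, seed=0):
    """Draw n samples of a constraint value in [0, 1] from Beta(a, b).

    Hypothetical helper: reading the value as, e.g., the probability that
    a state-action pair is unsafe is an illustrative assumption."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


def cvar(samples, alpha):
    """Empirical upper-tail CVaR: mean of the worst (1 - alpha) fraction.

    alpha = 0 gives the plain mean (risk-neutral); alpha close to 1
    focuses on the most pessimistic tail (risk-averse)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(samples)
    # Keep at least one sample so the tail mean is always defined.
    k = max(1, int(round(len(ordered) * (1.0 - alpha))))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Beta(2, 5) is an arbitrary example posterior, not a value from the paper.
    samples = sample_constraint_posterior(a=2.0, b=5.0, n=10_000)
    for alpha in (0.0, 0.5, 0.9):
        print(f"alpha={alpha:.1f}  CVaR={cvar(samples, alpha):.3f}")
```

Raising `alpha` moves the estimate from the posterior mean toward the pessimistic tail, so the same learned distribution can yield a stricter or looser constraint depending on the risk level chosen for a task.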
