DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical Systems

Se-Wook Yoo, Seung-Woo Seo

TL;DR

DIAL tackles safety in reinforcement learning by learning a distribution over shared constraints from multi-task demonstrations and adjusting risk levels to safely adapt to new tasks. It combines a Bayesian, risk-aware constraint model with CVaR-based distortion and a task-agnostic safe exploration policy, followed by safe transfer learning to solve new tasks without predefined task-specific constraints. The approach uses a Beta-distributed constraint posterior, CVaR for tail risk, and a structured, entropy-enhanced exploration scheme within a Lagrangian framework to balance reward, safety, and exploration. Experiments in urban driving and robot-control benchmarks show improved safety outcomes and transfer efficiency over baselines, highlighting DIAL’s potential for scalable safety in real-world, safety-critical systems.
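To make the role of CVaR concrete, the sketch below compares the mean of a Beta-distributed cost with its upper-tail CVaR using Monte Carlo samples. It is an illustration only, not the authors' implementation. The shape parameters (2, 5), the level 0.9, the sample count, and the helper names `beta_samples` and `upper_tail_cvar` are assumptions chosen for the example. The convention that a higher level means a more conservative estimate is also an assumption and may differ from the paper's definition of the risk level.

```python
import random


def beta_samples(a, b, n, seed=0):
    """Draw n samples from Beta(a, b) with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


def upper_tail_cvar(samples, level):
    """Empirical CVaR: mean of the largest (1 - level) fraction of samples.

    Assumed convention: level = 0 recovers the plain mean, and levels
    closer to 1 focus on the worst (highest-cost) outcomes.
    """
    if not 0.0 <= level < 1.0:
        raise ValueError("level must be in [0, 1)")
    ordered = sorted(samples)
    # Index where the tail starts; clamp so the tail is never empty.
    start = min(int(level * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)


# Hypothetical constraint-cost distribution, values in [0, 1].
samples = beta_samples(2.0, 5.0, 20000)
mean_cost = sum(samples) / len(samples)
cvar_cost = upper_tail_cvar(samples, 0.9)

print(f"mean cost:       {mean_cost:.3f}")
print(f"CVaR (level 0.9): {cvar_cost:.3f}")
```

Because the tail average can never fall below the overall average, a CVaR-based estimate treats the same state as at least as costly as the mean would, which is one way a risk level can make a learned constraint more or less conservative.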

Abstract

Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related tasks. Building on this insight, we propose a novel method to learn shared constraint distributions across multiple tasks. Our approach identifies the shared constraints through imitation learning and then adapts to new tasks by adjusting risk levels within these learned distributions. This adaptability addresses variations in risk sensitivity stemming from expert-specific biases, ensuring consistent adherence to general safety principles even with imperfect demonstrations. Our method can be applied to control and navigation domains, including multi-task and meta-task scenarios, accommodating constraints such as maintaining safe distances or adhering to speed limits. Experimental results validate the efficacy of our approach, demonstrating superior safety performance and success rates compared to baselines, all without requiring task-specific constraint definitions. These findings underscore the versatility and practicality of our method across a wide range of real-world tasks.

Paper Structure

This paper contains 17 sections, 12 equations, 15 figures, 6 tables, and 2 algorithms.

Figures (15)

  • Figure 1: Architecture of the safe transfer learning (TL) stage in DIAL. At this stage the constraint model is used only for inference, and only the policy is updated. Text marked in red highlights the advantages of the proposed method, which leverages the constraint distribution learned in the preceding safe imitation learning (IL) stage together with the TASE policy.
  • Figure 2: Difference between CVaR and the mean of a Beta distribution. For illustration, $\alpha$ and the risk level $\lambda$ are held at fixed values.
  • Figure 3: Unsigned intersection environments in urban driving. The agent controls a red car and guides it toward its destination along a yellow arrow, while the surrounding blue cars follow arbitrary paths at a fixed speed, indicated by green arrows. From the data provided in scenario (a), the agent learns the constraints (b) shared across left and right turns. Then, using the learned constraints and no additional data, the agent must safely reach a new destination in the changed scenario (c), even under aggressive traffic flows.
  • Figure 4: Robot control environments that aim to perform target tasks while ensuring safety.
  • Figure 5: Comparison of RR and CV for urban driving tasks based on the number of expert trajectories.
  • ...and 10 more figures