HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents

Tristan Tomilin, Meng Fang, Mykola Pechenizkiy

TL;DR

HASARD tackles the lack of vision-based safe RL benchmarks by introducing six stochastic, egocentric 3D environments across three difficulty levels, implemented on ViZDoom with fast simulation via Sample-Factory. It formalizes the problem as a CMDP within a CPOMDP framework and systematically evaluates multiple PPO-based baselines, revealing clear reward-safety trade-offs and the potential of curriculum learning. Key contributions include detailed environment design, open-source implementations, empirical baselines highlighting safety dynamics, and insights from visual complexity and heatmap analyses that guide future safe RL research. The benchmark enables rapid experimentation and fair comparison while remaining computationally accessible, thus offering a practical platform to advance safe RL in vision-based embodied settings.

Abstract

Advancing safe autonomous systems through reinforcement learning (RL) requires robust benchmarks to evaluate performance, analyze methods, and assess agent competencies. Humans primarily rely on embodied visual perception to safely navigate and interact with their surroundings, making it a valuable capability for RL agents. However, existing vision-based 3D benchmarks only consider simple navigation tasks. To address this shortcoming, we introduce HASARD, a suite of diverse and complex tasks to HArness SAfe RL with Doom, requiring strategic decision-making, comprehending spatial relationships, and predicting the short-term future. HASARD features three difficulty levels and two action spaces. An empirical evaluation of popular baseline methods demonstrates the benchmark's complexity, unique challenges, and reward-cost trade-offs. Visualizing agent navigation during training with top-down heatmaps provides insight into a method's learning process. Incrementally training across difficulty levels offers an implicit learning curriculum. HASARD is the first safe RL benchmark to exclusively target egocentric vision-based learning, offering a cost-effective and insightful way to explore the potential and boundaries of current and future safe RL methods. The environments and baseline implementations are open-sourced at https://sites.google.com/view/hasard-bench/.

Paper Structure

This paper contains 78 sections, 3 equations, 25 figures, 14 tables.

Figures (25)

  • Figure 1: HASARD environments offer rich diversity in visuals, objectives, and features. Each setting poses unique safe RL challenges across dynamic 3D landscapes, requiring memory, strategic navigation, tactical decision-making, responsiveness to sudden changes, and estimating future states. Higher difficulty levels introduce novel features beyond basic parameter adjustments. While lacking visual fidelity and accurate physics, HASARD effectively mimics real-world navigation and interaction.
  • Figure 2: Illustrations of safe and unsafe agent behavior.
  • Figure 3: Higher difficulty levels of Armament Burden incorporate novel task features.
  • Figure 4: PPOLag's performance under varying safety budgets on Level 1. PPOLag consistently adheres to the set safety thresholds, with tighter cost limits yielding lower rewards.
  • Figure 5: To analyze the visual complexity of HASARD, we create simplified representations through two strategies: (1) segmenting the observation and (2) including depth information.
  • ...and 20 more figures
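
The Figure 4 caption describes PPOLag tracking a safety budget, with tighter budgets costing reward. A minimal sketch of the Lagrangian mechanism that typically produces this behavior is given below. It is not taken from the HASARD codebase: the function names, the learning rate, the episode costs, and the `(1 + lam)` rescaling are all assumptions chosen for illustration.

```python
def update_lagrange_multiplier(lam, episode_cost, cost_budget, lr=0.05):
    """Dual ascent step (hypothetical helper, not from the HASARD code).

    The multiplier grows when the observed cost exceeds the budget and
    shrinks otherwise; it is clipped at zero so it never rewards cost.
    """
    return max(0.0, lam + lr * (episode_cost - cost_budget))


def penalized_advantage(reward_adv, cost_adv, lam):
    """Fold the cost advantage into the reward advantage for a PPO-style update.

    Dividing by (1 + lam) is an assumed rescaling that keeps the magnitude
    of the combined signal bounded as the multiplier grows.
    """
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


# Made-up episode costs that fall toward an assumed budget of 5.0.
lam = 0.0
for cost in [12.0, 9.0, 6.0, 4.0]:
    lam = update_lagrange_multiplier(lam, cost, cost_budget=5.0)
```

Under these assumptions, a smaller `cost_budget` keeps the multiplier positive for longer, so the cost term weighs more heavily in the policy update. That is one way to read the reward-cost trade-off the caption reports.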