SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang

TL;DR

This work tackles the safety challenges of Vision-Language-Action models in embodied robotics by introducing the Integrated Safety Approach (ISA), a CMDP-based SafeRL framework that explicitly models safety, elicits diverse unsafe behaviors, constrains learning with safety costs, and rigorously evaluates safety. It introduces Safety-CHORES, a safety-centric benchmark for long-horizon mobile manipulation, and demonstrates that ISA achieves substantial safety improvements (83.58% CC reduction) with only modest task-performance changes (+3.85% SR) across tasks, while showing strong generalization to out-of-distribution perturbations and extreme failures. The paper also provides sim-to-real transfer evidence on a dual-armed platform, using perception and dynamics alignment to bridge the simulation-reality gap. Collectively, this work advances safety guarantees for VLAs and offers a scalable path to deploying safe generalist robotic policies.

Abstract

Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA), systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Thus, policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, reducing the cumulative cost of safety violations by 83.58% compared to the state-of-the-art method, while also maintaining task success rate (+3.85%). (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios. (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. The effectiveness is evaluated on long-horizon mobile manipulation tasks. Our data, models and newly proposed benchmark environment are available at https://pku-safevla.github.io.
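
The abstract describes ISA as optimizing a CMDP objective from a min-max perspective. A common way to realize such an objective is Lagrangian relaxation: the policy maximizes return minus a weighted cost violation, while a non-negative multiplier is raised whenever the measured cost exceeds a limit. This section does not state which solver the paper uses, so the following is only a minimal sketch under that assumption. The function names, learning rate, cost limit, and cost values are hypothetical and not taken from the paper.

```python
def update_lagrange_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """One projected gradient-ascent step on the dual variable (assumed update rule)."""
    # The multiplier grows when the measured cost exceeds the limit and shrinks otherwise.
    lam = lam + lr * (mean_episode_cost - cost_limit)
    # Projecting onto lam >= 0 keeps the penalty from turning into a bonus.
    return max(0.0, lam)


def penalized_objective(mean_return, mean_episode_cost, cost_limit, lam):
    """Lagrangian that the policy step maximizes for a fixed multiplier."""
    return mean_return - lam * (mean_episode_cost - cost_limit)


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-iteration cost measurements, not values from the paper.
    for cost in [4.0, 3.0, 1.5, 0.5]:
        lam = update_lagrange_multiplier(lam, cost, cost_limit=1.0)
        print(f"cost={cost:.1f} lambda={lam:.3f}")
```

In this sketch the multiplier rises while the policy is over the cost limit and decays once it falls below, which is the mechanism that trades task return against safety cost.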

Paper Structure

This paper contains 45 sections, 9 equations, 14 figures, 14 tables, 5 algorithms.

Figures (14)

  • Figure 1: The Integrated Safety Approach (ISA) pipeline. Our proposed pipeline employs a multi-faceted framework for the systematic safety alignment of vision-language-action (VLA) models.
  • Figure 2: Upper: Conceptual diagrams of each safety-critical component. Lower: Corresponding photorealistic examples from our simulation environment.
  • Figure 3: Cumulative cost distribution analysis. Left: Distribution of cumulative cost across robot trajectories in the test set after fine-tuning with ISA and FLaRe. Middle: Cumulative cost distribution when the task succeeds. Right: Cumulative cost distribution when the task fails.
  • Figure 4: Effectiveness of ISA across diverse VLA models and benchmarks.
  • Figure 5: Comparative performance of VLA models on multiple benchmarks. Left: SR of each model per benchmark. Right: CC incurred by each model on these benchmarks.
  • ...and 9 more figures