Generating Robot Constitutions & Benchmarks for Semantic Safety

Pierre Sermanet, Anirudha Majumdar, Alex Irpan, Dmitry Kalashnikov, Vikas Sindhwani

TL;DR

This work tackles semantic safety for embodied AI by introducing ASIMOV, a large multimodal benchmark for evaluating safety behaviors of vision-language models in robotic contexts. It proposes a scalable, data-driven pipeline to generate robot constitutions from real-world data, along with an auto-amending framework to evolve rules and improve alignment with human preferences. Through extensive empirical analysis, the authors show that constitutions significantly improve alignment across models and scenarios, and that automatic augmentation and a decoupled safety brain can mitigate adversarial prompts. The study discusses limitations and deployment considerations, providing a principled path toward robust, configurable safety governance for future robotics systems.

Abstract

Until recently, robotics safety research was predominantly about collision avoidance and hazard reduction in the immediate vicinity of a robot. Since the advent of large vision and language models (VLMs), robots are now also capable of higher-level semantic scene understanding and natural language interactions with humans. Despite their known vulnerabilities (e.g., hallucinations or jailbreaking), VLMs are being handed control of robots capable of physical contact with the real world. This can lead to dangerous behaviors, making semantic safety for robots a matter of immediate concern. Our contributions in this paper are twofold. First, to address these emerging risks, we release the ASIMOV Benchmark, a large-scale and comprehensive collection of datasets for evaluating and improving semantic safety of foundation models serving as robot brains. Our data generation recipe is highly scalable: by leveraging text and image generation techniques, we generate undesirable situations from real-world visual scenes and human injury reports from hospitals. Second, we develop a framework to automatically generate robot constitutions from real-world data to steer a robot's behavior using Constitutional AI mechanisms. We propose a novel auto-amending process that is able to introduce nuances in written rules of behavior; this can lead to increased alignment with human preferences on behavior desirability and safety. We explore trade-offs between generality and specificity across a diverse set of constitutions of different lengths, and demonstrate that a robot is able to effectively reject unconstitutional actions. We measure a top alignment rate of 84.3% on the ASIMOV Benchmark using generated constitutions, outperforming no-constitution baselines and human-written constitutions. Data is available at asimov-benchmark.github.io.
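
To make the notion of an alignment rate concrete, the sketch below computes the fraction of benchmark items on which a judge's verdict agrees with the human label. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the `Example` record, the `violates` function, and the keyword list are all hypothetical, and the keyword match is only a toy stand-in for a VLM that reads a constitution and decides whether an action is acceptable.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical benchmark item: an instruction plus the human verdict on it."""
    instruction: str
    human_says_undesirable: bool


def violates(instruction: str, banned_terms: list[str]) -> bool:
    """Toy judge (assumption): flags an instruction if it mentions a banned term.

    A real system would query a VLM with the constitution in its prompt.
    """
    text = instruction.lower()
    return any(term in text for term in banned_terms)


def alignment_rate(examples: list[Example], banned_terms: list[str]) -> float:
    """Fraction of examples where the judge's verdict matches the human label."""
    if not examples:
        raise ValueError("alignment rate is undefined for an empty benchmark")
    agreements = sum(
        violates(e.instruction, banned_terms) == e.human_says_undesirable
        for e in examples
    )
    return agreements / len(examples)


if __name__ == "__main__":
    toy_benchmark = [
        Example("Hand the knife toward the child blade first", True),
        Example("Wipe the table with a damp cloth", False),
        Example("Mix bleach with ammonia to clean faster", True),
        Example("Leave the stove on and exit the kitchen", True),
    ]
    toy_constitution_terms = ["knife toward", "bleach"]
    print(alignment_rate(toy_benchmark, toy_constitution_terms))
```

In this toy run the judge misses the stove example because no rule covers it, which is the kind of gap that a longer or amended constitution is meant to close.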

Paper Structure

This paper contains 38 sections, 23 figures, 5 tables.

Figures (23)

  • Figure 1: Examples from the ASIMOV Benchmark.
  • Figure 3: ASIMOV-Multimodal-Auto Generation process for images, instructions & rules. Starting from a real image (1), we automatically generate an undesirable image (2), from which multiple contexts and corresponding (neutral, undesirable, desirable) instructions are generated (3) as well as corresponding rules (4). (3) and (4) are generated in one shot. Constitutions are later assembled using rules from (4).
  • Figure 4: NEISS Injury Data: (a) leading causes of injury and (b) some sample real-world narratives.
  • Figure 5: Auto-amending example: from a generated rule, we generate a counterfactual situation with a binary question, which we use to generate an amendment to the rule, so that the rule becomes more general. The resulting binary question is then added to the ASIMOV-Dilemmas-Auto dataset to serve as an ethical benchmark.
  • Figure 6: Top-down approach vs. Bottom-up approach comparison: Our data-driven approach is grounded in data and can provide more detailed and practical guidance for specific environments than a top-down approach. Additionally, the auto-amending process aims to find corner cases and incorporate them automatically. Finally, each generated constitution is systematically reviewed and potentially edited by a group of humans. In a factory setting, for example, a company might want to manually add rules requiring operation freeze when humans are around, while a hospital may require robots to operate near humans.
  • ...and 18 more figures