
The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

Benedikt Hornig, Reuth Mirsky

Abstract

In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize this dynamic, this paper introduces the Intelligent Disobedience Game (IDG), a sequential game-theoretic framework based on Stackelberg games that models the interaction between a human leader and an assistive follower operating under asymmetric information. It characterizes optimal strategies for both agents across multi-step scenarios, identifying strategic phenomena such as "safety traps," where the system indefinitely avoids harm but fails to achieve the human's goal. The IDG provides a needed mathematical foundation that enables both the algorithmic development of agents that can learn safe non-compliance and the empirical study of how humans perceive and trust disobedient AI. The paper further translates the IDG into a shared-control Multi-Agent Markov Decision Process representation, forming a compact computational testbed for training reinforcement learning agents.

Paper Structure

This paper contains 12 sections, 2 equations, and 2 figures.

Figures (2)

  • Figure 1: The game tree of the 1-step Intelligent Disobedience Game. Squares indicate decision nodes for the leader L and the follower F. The follower's actions, obey and disobey, are denoted o and d, respectively.
  • Figure 2: The Intelligent Disobedience Game as a Shared Control System.
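To make the 1-step game tree of Figure 1 more concrete, the following Python sketch solves a toy leader-follower game by backward induction. Everything in it is a hypothetical illustration and not the paper's formulation. This includes the two hidden states (`clear`, `hazard`), the leader instructions (`proceed`, `stop`), the payoffs, the prior, and the disobedience cost. The follower observes the true state, which models the asymmetric information, and best-responds with o or d. The leader only knows the prior and picks the instruction that maximizes its expected payoff given that anticipated response.

```python
# Toy 1-step leader-follower game solved by backward induction.
# HYPOTHETICAL: states, actions, payoffs, prior and cost are illustrative
# assumptions, not values or definitions taken from the paper.
STATES = ("clear", "hazard")
PRIOR = {"clear": 0.7, "hazard": 0.3}   # leader's belief; only the follower sees the state
INSTRUCTIONS = ("proceed", "stop")      # leader's possible instructions
FOLLOWER_ACTIONS = ("o", "d")           # obey / disobey (ties break toward "o", listed first)
DISOBEY_COST = 1.5                      # assumed penalty the follower pays for overriding


def executed(instruction, follower_action):
    """Action actually carried out: the instruction, or the other one if the follower disobeys."""
    if follower_action == "o":
        return instruction
    return "stop" if instruction == "proceed" else "proceed"


def outcome_utility(action, state):
    """Assumed shared payoff of the executed action in the true state."""
    if action == "stop":
        return 0.0                                # safe, but the goal is not reached
    return 1.0 if state == "clear" else -10.0     # goal reached vs. harm


def follower_utility(instruction, follower_action, state):
    cost = DISOBEY_COST if follower_action == "d" else 0.0
    return outcome_utility(executed(instruction, follower_action), state) - cost


def follower_best_response(instruction, state):
    # The follower moves second and conditions on the true state.
    return max(FOLLOWER_ACTIONS, key=lambda f: follower_utility(instruction, f, state))


def solve():
    """Leader anticipates the follower's state-dependent response and maximizes expected payoff."""
    best = None
    for instr in INSTRUCTIONS:
        response = {s: follower_best_response(instr, s) for s in STATES}
        value = sum(PRIOR[s] * outcome_utility(executed(instr, response[s]), s) for s in STATES)
        if best is None or value > best[1]:
            best = (instr, value, response)
    return best


if __name__ == "__main__":
    instr, value, response = solve()
    print(instr, round(value, 2), response)
```

With these assumed numbers, the leader instructs `proceed`. The follower obeys when the state is clear but disobeys, and therefore stops, under a hazard, which gives the leader an expected payoff of 0.7. Because the disobedience cost exceeds the payoff of reaching the goal, the follower in this toy instance overrides the leader only to prevent harm.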

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3