Don't blame me: How Intelligent Support Affects Moral Responsibility in Human Oversight

Cedric Faas, Richard Uth, Sarah Sterz, Markus Langer, Anna Maria Feit

TL;DR

The paper asks whether intelligent decision support that restricts the available actions in safety-critical oversight roles affects perceived moral responsibility. It uses a between-subjects simulated drone-oversight task in which an AI decision support varies the number of selectable actions (six, four, two, or one). The study measures responsibility judgments for the overseer, the AI system, and the developer, along with the overseer's perceived causality and knowledge, plus decision accuracy and decision time. Participants restricted to a single action felt less morally responsible when a crash occurred, whereas their judgments about the responsibility of the AI and its developer did not change between conditions. The single-action restriction also improved decision accuracy and speed, while the intermediate conditions differed less from one another. The work highlights a design trade-off: preserving meaningful human choice can maintain the epistemic and causal conditions necessary for responsibility, while over-restriction risks eroding accountability, with implications for safety-focused interfaces and for regulatory expectations such as those in the EU AI Act.

Abstract

AI-based systems can increasingly perform work tasks autonomously. In safety-critical tasks, human oversight of these systems is required to mitigate risks and to ensure responsibility in case something goes wrong. Since people often struggle to stay focused and perform good oversight, intelligent support systems are used to assist them by giving decision recommendations, alerting users, or restricting them from dangerous actions. However, in cases where recommendations are wrong, decision support might undermine the very reason human oversight was employed: genuine moral responsibility. The goal of our study was to investigate how a decision support system that restricted available interventions would affect overseers' perceived moral responsibility, in particular in cases where the support errs. In a simulated oversight experiment, participants (N = 274) monitored an autonomous drone that faced ten critical situations, choosing from six possible actions to resolve each situation. An AI system constrained participants' choices to six, four, two, or only one option (between-subjects design). Results showed that participants who were restricted to a single action felt less morally responsible if a crash occurred. At the same time, participants' judgments about the responsibility of other stakeholders (the AI; the developer of the AI) did not change between conditions. Our findings provide important insights for user interface design and oversight architectures: they should prevent users from attributing moral agency to AI, help them understand how moral responsibility is distributed, and, when oversight aims to prevent ethically undesirable outcomes, be designed to support the epistemic and causal conditions required for moral responsibility.

Paper Structure (11 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Screenshots of the drone monitoring interface. Left: Screenshot of the demo task, where participants saw a video recorded by the drone and information about the current status. After thirty seconds, the drone entered a critical situation, indicated by an auditory signal and highlighted icons for critical values. Middle: Legend of all icons used in the interface and introduced during the training phase; it was shown to the participants during the entire video. Right: After ten seconds, participants saw six possible actions, and depending on the experimental condition, some of them were grayed out by an AI decision support indicating they were not available (here: Two Selectable Actions condition).
  • Figure 2: Differences between the experimental conditions in participants' judgments of their own responsibility, causality, and knowledge, rated on a 7-point Likert scale.
  • Figure 3: Differences between the experimental conditions in responsibility judgments, rated on a 7-point Likert scale.
  • Figure 4: Differences between the experimental conditions in decision accuracy (in percent) and decision time (in milliseconds).