STAMP/STPA Informed Characterization of Factors Leading to Loss of Control in AI Systems

Steve Barrett, Anna Bruvere, Sean P. Fillingham, Catherine Rhodes, Stefano Vergani

TL;DR

This work addresses how to characterize loss of control in AI-enabled socio-technical systems by adopting a STAMP/STPA-based framework. It develops a causal-factor characterization approach and a structured table to map AI-related features to loss scenarios, enabling hazard identification and mitigation in risk management workflows. The authors demonstrate applicability with a simple control-archetype and a hypothetical national intelligence chat-monitoring case, while acknowledging scope limitations and outlining future expansion to more complex AI dynamics. The framework aims to provide a rigorous, extensible lens for AI safety governance and operational risk assessment, bridging AI safety literature with practical hazard analysis for practitioners.

Abstract

A major concern amongst AI safety practitioners is the possibility of loss of control, whereby humans lose the ability to exert control over increasingly advanced AI systems. The range of concerns is wide, spanning current-day risks to future existential risks, and loss of control pathways ranging from rapid AI self-exfiltration scenarios to more gradual disempowerment scenarios. In this work we set out, first, to provide a more structured framework for discussing and characterizing loss of control and, second, to use this framework to help those responsible for the safe operation of AI-containing socio-technical systems identify causal factors leading to loss of control. We explore how these two needs can be better met by using STAMP, a methodology developed within the safety-critical systems community, and its associated hazard analysis technique, STPA. We select the STAMP methodology primarily because it is based on a world-view that socio-technical systems can be functionally modeled as control structures, and that safety issues arise when there is a loss of control in these structures.

Paper Structure

This paper contains 36 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Overview of the high-level process, source: Handbook, thanks to John Thomas for permission to use this picture
  • Figure 2: (a) Direct control of AI, (b) Control of a system that includes AI
  • Figure 3: Control system archetypes with additional AI instantiations.
  • Figure 4: Interacting control systems, hierarchical (vertical), and adjacent (horizontal)
  • Figure 5: The different control systems related to one AI shown by AI system life-cycle phase
  • ...and 5 more figures