The Loss of Control Playbook: Degrees, Dynamics, and Preparedness

Charlotte Stix, Annika Hallensleben, Alejandro Ortega, Matteo Pistillo

TL;DR

The paper proposes a graded Loss of Control (LoC) taxonomy, based on the metrics of severity and persistence, that distinguishes between Deviation, Bounded LoC, and Strict LoC. It also introduces a complementary framework that emphasizes three extrinsic factors: Deployment context, Affordances, and Permissions (the DAP framework).

Abstract

This research report addresses the absence of an actionable definition for Loss of Control (LoC) in AI systems by developing a novel taxonomy and preparedness framework. Despite increasing policy and research attention, existing LoC definitions vary significantly in scope and timeline, hindering effective LoC assessment and mitigation. To address this issue, we draw from an extensive literature review and propose a graded LoC taxonomy, based on the metrics of severity and persistence, that distinguishes between Deviation, Bounded LoC, and Strict LoC. We model pathways toward a societal state of vulnerability in which sufficiently advanced AI systems have acquired or could acquire the means to cause Bounded or Strict LoC once a catalyst, either misalignment or pure malfunction, materializes. We argue that this state becomes increasingly likely over time, absent strategic intervention, and propose a strategy to avoid reaching a state of vulnerability. Rather than focusing solely on intervening on AI capabilities and propensities potentially relevant for LoC or on preventing potential catalysts, we introduce a complementary framework that emphasizes three extrinsic factors: Deployment context, Affordances, and Permissions (the DAP framework). Compared to work on intrinsic factors and catalysts, this framework has the unfair advantage of being actionable today. Finally, we put forward a plan to maintain preparedness and prevent the occurrence of LoC outcomes should a state of societal vulnerability be reached, focusing on governance measures (threat modeling, deployment policies, emergency response) and technical controls (pre-deployment testing, control measures, monitoring) that could maintain a condition of perennial suspension.

Paper Structure

This paper contains 32 sections, 7 figures.

Figures (7)

  • Figure 1: This graph plots 12 concrete LoC scenarios identified in the literature. We utilized economic impact as a proxy for severity and persistence, which are both mapped on arbitrary axes of 0-100. These data points inform our proposed three-part taxonomy, splitting apart Deviation, Bounded LoC, and Strict LoC. By sharpening the conceptual boundaries of LoC, this taxonomy helps decision-makers to understand the different degrees of LoC and hence better prioritise between risk-reduction strategies. See \ref{findings_loc_lit} for more detail on the methodology behind and limitations of this visualization.
  • Figure 1: The distribution in this graph covers all 12 concrete LoC scenarios derived from the literature, plotted by severity and persistence using economic impact as a proxy measure (both axes in arbitrary units 0-100). The colors of scenarios indicate the relevant threat category: human extinction, human manipulation, economic disruption, cybersecurity incident, grand-scale conflict/war, engineered pandemic, or disruption of critical national infrastructure. Where multiple economic impact estimates for a scenario were available, error bars represent 50% confidence intervals calculated using the t-distribution. For several scenarios, these error bars are too small to be visible on this log graph. For two scenarios, there are no error bars because only one estimate was available.
  • Figure 2: Our taxonomy of LoC.
  • Figure 3: A non-exhaustive illustration of how society could arrive at a state of vulnerability to LoC.
  • Figure 4: This figure illustrates the catalyst for LoC to materialize.
  • ...and 2 more figures