Extinction Risks from AI: Invisible to Science?

Vojtech Kovarik; Christian van Merwijk; Ida Mattsson

TL;DR

This work asks which formal models are suitable for investigating Extinction-level Goodhart's Law, and identifies a set of conditions that a model must satisfy to be informative for evaluating specific arguments for it.

Abstract

In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. We remain agnostic as to whether Extinction-level Goodhart's Law holds. As our key contribution, we identify a set of conditions that a model must satisfy to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of these conditions seems to add significantly to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult. This raises the possibility that, whether or not the risk of extinction from artificial intelligence is real, the underlying dynamics might be invisible to current scientific methods.

Paper Structure (15 sections, 1 figure)

Introduction
Illustrative Example
Overview
An Argument for Extinction Risk from Arbitrarily Powerful Optimisation
Necessary Conditions for Informative Models
Conditions Derived from the Argument in Section 2
Argument-Independent Conditions
Discussion of Related Literature
Key Prerequisite Concepts
Connection to Other Related Work
Conclusion
Additional Content
Restating the Argument from Section 2
Examples Illustrating the Argument from Section 2
Relationships between the Necessary Conditions

Figures (1)

Figure 1: Two models for evaluating an argument that "a rocket will fail to land on the Moon because the Moon moves". The model (a) is clearly uninformative for this purpose, since it fails to capture this key dynamic. The model (b) is also uninformative, but less obviously so (it is missing the diameter of the Moon). Both models are inaccurate, but this is irrelevant for the purpose of evaluating the particular argument that was given.

Theorems & Definitions (1)

Definition 1 (informal)
