Extinction Risks from AI: Invisible to Science?

Vojtech Kovarik, Christian van Merwijk, Ida Mattsson

TL;DR

This work asks which formal models are suitable for investigating Extinction-level Goodhart's Law, and identifies a set of conditions that a model must satisfy to be informative for evaluating specific arguments for it.

Abstract

In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law holds or not. As our key contribution, we identify a set of conditions that are necessary for a model that aims to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of the conditions seems to significantly contribute to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult. This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying dynamics might be invisible to current scientific methods.

Paper Structure

This paper contains 15 sections and 1 figure.

Figures (1)

  • Figure 1: Two models for evaluating the argument that "a rocket will fail to land on the Moon because the Moon moves". Model (a) is clearly uninformative for this purpose, since it fails to capture the key dynamic, the Moon's motion. Model (b) is also uninformative, but less obviously so (it is missing the diameter of the Moon). Both models are inaccurate, but this is irrelevant for the purpose of evaluating the particular argument that was given.

Theorems & Definitions (1)

  • Definition 1: informal