IDs for AI Systems

Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung

TL;DR

The paper addresses the lack of information available to parties deciding whether and how to engage with AI systems. It proposes instance-level AI IDs, each comprising a unique identifier and a set of attributes tied to a specific interaction. It outlines a design space (attributes, access, verifiability), gives concrete use cases (shutting down malfunctioning agents, verifying certifications, investigating scam calls), and assesses demand from governments, service providers, and users, alongside deployment strategies and limitations. The authors discuss potential implementations, lifecycle considerations, and significant risks, including privacy, misuse, and broader societal effects, and advocate limited experimentation in high-stakes settings. If realized, AI IDs could improve incident response, accountability, and trust in a world where AI systems increasingly interact with critical infrastructure and humans.

Abstract

AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.

Paper Structure (31 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: IDs contain a unique identifier along with attributes (e.g., a system card, certifications, or a link to previous incidents). We also display some potential actions that parties might take based on information in an ID.
  • Figure 2: We illustrate how existing technologies can help ensure the verifiability of IDs against the threat models described in the section on verifiability. We emphasize that our examples of existing technologies are meant to be illustrative, not prescriptive.
  • Figure 3: Various actors have potential methods (dotted lines) to incentivize the use of IDs, whether directly or through other actors.
  • Figure 4: This figure depicts two users that use the same system (e.g., both use ChatGPT with the GPT-4 backend). System-specific information attached to an ID should be useful to both users. By loaded, we mean that the system is ready to accept inputs for the first time.
  • Figure 5: An instance is an abstraction that corresponds to a creation event, where a system is loaded for a user, and an interaction history. Instance-specific IDs could help to inform interaction decisions. For example, information about instance $A$'s earlier interactions (such as malfunctions) may be useful when instance $B$ interacts with instance $A$.
  • ...and 2 more figures
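
The structure described in the captions for Figures 1, 4, and 5 can be sketched as a small data model. This is only an illustrative sketch, not the paper's implementation: the class names (`SystemAttributes`, `Interaction`, `InstanceID`), the field names, the use of a random UUID as the identifier, and the `willing_to_interact` policy are all assumptions made for the example. It shows system-level attributes shared across instances, a unique identifier per instance, and an interaction history that another party might consult before engaging.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import uuid


@dataclass(frozen=True)
class SystemAttributes:
    """Hypothetical system-level information shared by all instances (cf. Figure 4)."""
    system_card_url: str
    certifications: Tuple[str, ...] = ()


@dataclass
class Interaction:
    """One hypothetical entry in an instance's interaction history (cf. Figure 5)."""
    counterparty: str
    malfunction: bool = False


@dataclass
class InstanceID:
    """A unique identifier plus attributes for one instance (cf. Figure 1)."""
    system: SystemAttributes
    # Assumption: a random UUID stands in for whatever identifier scheme is used.
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)
    history: List[Interaction] = field(default_factory=list)

    def record(self, counterparty: str, malfunction: bool = False) -> None:
        """Append an interaction to this instance's history."""
        self.history.append(Interaction(counterparty, malfunction))

    def has_malfunctioned(self) -> bool:
        """Return True if any recorded interaction was flagged as a malfunction."""
        return any(event.malfunction for event in self.history)


def willing_to_interact(other: InstanceID, required_cert: str) -> bool:
    """Example policy: require a certification and a clean interaction history."""
    return (required_cert in other.system.certifications
            and not other.has_malfunctioned())
```

In this sketch, two instances created from the same `SystemAttributes` object share certifications and a system card but receive distinct identifiers, so a malfunction recorded against one instance does not affect how the other is treated.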