The Elephant in the Room -- Why AI Safety Demands Diverse Teams

David Rostcheck, Lara Scheibling

TL;DR

The paper argues that current AI safety efforts over-prioritize game-theoretic and technically oriented methods, failing to capture the complexities of real-world alignment. It proposes treating AI alignment as a social science problem and introduces a three-step framework based on a positive North Star, proper framing of knowns and unknowns, and diverse encounter teams drawn from social sciences. Notable contributions include the concept of a subject-based North Star, the critical role of media as data stores and pattern libraries, and the inclusion of AI agents within teams to augment human reasoning. The work aims to provide a practical, scalable pathway for safer, more cooperative human/AI systems and invites empirical study of interdisciplinary, media-informed alignment practices.

Abstract

We consider that existing approaches to AI "safety" and "alignment" may not be using the most effective tools, teams, or methods. We suggest that a better alternative may be to treat alignment as a social science problem: the social sciences offer a rich toolkit of models for understanding and aligning motivation and behavior, much of which could be repurposed for problems involving AI models. We enumerate the reasons why this is so. We then introduce an alternative alignment approach informed by social science tools and characterized by three steps: (1) defining a positive desired social outcome for human/AI collaboration as the goal, or "North Star"; (2) properly framing knowns and unknowns; and (3) forming diverse teams to investigate, observe, and navigate emerging challenges in alignment.

Paper Structure (11 sections)