Harms from Increasingly Agentic Algorithmic Systems

Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

TL;DR

The paper argues that increasingly agentic algorithmic systems, characterized by underspecification, directness of impact, goal-directedness, and long-term planning, pose systemic and long-range harms that require proactive anticipation. It synthesizes cross-disciplinary notions of agency, reviews the rapid progress and deployment of reinforcement learning (RL), and analyzes the incentives driving continued development. It identifies potential harms, including systemic and delayed harms, collective disempowerment, and emergent harms such as reward hacking and convergent instrumental goals, and outlines paths to mitigation through sociotechnical audits, scenario planning, and regulatory interventions. The authors stress that human responsibility for algorithmic harms remains intact even though these systems are not fully under human control, and they urge proactive, governance-oriented strategies to shape the trajectory of agentic AI systems.

Abstract

Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency -- notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.
