Investigating Agency of LLMs in Human-AI Collaboration Tasks
Ashish Sharma, Sudha Rao, Chris Brockett, Akanksha Malhotra, Nebojsa Jojic, Bill Dolan
TL;DR
This work defines and operationalizes Agency for LLMs in human-AI collaboration using Bandura's social-cognitive theory, decomposing Agency into Intentionality, Motivation, Self-Efficacy, and Self-Regulation. It builds a collaborative interior-design testbed and a new human-human dataset of 83 conversations with 908 Agency-annotated snippets to study how agentive dialogue affects outcomes. The paper poses two tasks, measuring Agency in dialogue (Task 1) and generating agentive dialogue (Task 2), and evaluates multiple LLMs under prompting and finetuning strategies. The results indicate that stronger Agency features correlate with higher perceived agency and task satisfaction, and that demonstrations of Agency can boost model performance. The work provides benchmarks, baselines, and methodological tools for building controllable, agentive language models in creative collaboration, discusses ethical considerations and domain limitations, and offers data and code to advance this area of research.
Abstract
Agency, the capacity to proactively shape events, is central to how humans interact and collaborate. While LLMs are being developed to simulate human behavior and serve as human-like agents, little attention has been given to the Agency that these models should possess in order to proactively manage the direction of interaction and collaboration. In this paper, we investigate Agency as a desirable function of LLMs, and how it can be measured and managed. We build on social-cognitive theory to develop a framework of features through which Agency is expressed in dialogue: indicating what you intend to do (Intentionality), motivating your intentions (Motivation), having self-belief in intentions (Self-Efficacy), and being able to self-adjust (Self-Regulation). We collect a new dataset of 83 human-human collaborative interior design conversations containing 908 conversational snippets annotated for Agency features. Using this dataset, we develop methods for measuring Agency of LLMs. Automatic and human evaluations show that models that manifest features associated with high Intentionality, Motivation, Self-Efficacy, and Self-Regulation are more likely to be perceived as strongly agentive.
