Frictional Agent Alignment Framework: Slow Down and Don't Break Things

Abhijnan Nath; Carine Graff; Andrei Bachinin; Nikhil Krishnaswamy

TL;DR

FAAF addresses misalignment in dynamic, collaborative dialogue with a state-conditioned two-policy framework that jointly models frictive states and friction interventions. The authors derive an analytical solution to the two-policy objective, so that a single policy can be trained with a simple supervised loss. They validate FAAF on DeliData and Weights Task Dataset variants, where it shows robust OOD generalization and higher friction quality than RLHF-based baselines. Data augmentation with GPT-4o, together with human validation, underpins the empirical gains, and ablations show that conditioning on the frictive state is necessary for strong performance. Overall, FAAF advances the idea of LLMs as adaptive thought partners that strategically slow down dialogue to improve reasoning and accountability in human-AI collaboration.

Abstract

AI support of collaborative interactions entails mediating potential misalignment between interlocutor beliefs. Common preference alignment methods like DPO excel in static settings, but struggle in dynamic collaborative tasks where the explicit signals of interlocutor beliefs are sparse and skewed. We propose the Frictional Agent Alignment Framework (FAAF) to generate precise, context-aware "friction" that prompts for deliberation and re-examination of existing evidence. FAAF's two-player objective decouples from data skew: a frictive-state policy identifies belief misalignments, while an intervention policy crafts collaborator-preferred responses. We derive an analytical solution to this objective, which enables a single policy to be trained via a simple supervised loss. Experiments on three benchmarks show FAAF outperforms competitors in producing concise, interpretable friction and in OOD generalization. By aligning LLMs to act as adaptive "thought partners" -- not passive responders -- FAAF advances scalable, dynamic human-AI collaboration. Our code and data can be found at https://github.com/csu-signal/FAAF_ACL.
