PrefIx: Understand and Adapt to User Preference in Human-Agent Interaction
Jialin Li, Zhenhao Chen, Hanjun Luo, Hanan Salam
TL;DR
PrefIx addresses the challenge of evaluating human-agent interaction quality alongside task accuracy by introducing a configurable environment and the Interaction-as-a-Tool paradigm (IaaT). It formalizes user experience through a taxonomy of 14 preference attributes across four dimensions and evaluates UX with a composite, multi-LLM judge across seven UX dimensions plus an alignment metric, achieving high reliability and human correlation. The study shows that preference-aware adaptation improves user experience (average ≈7.6%) and alignment (≈18.5%) without sacrificing task performance, demonstrated across multiple LLMs and BFCL-based multi-turn tasks. These contributions establish a scalable, reproducible framework for human-centered evaluation of interactive agents, with practical impact for developing more user-aligned AI assistants.
Abstract
LLM-based agents can complete tasks correctly yet still frustrate users through poor interaction patterns, such as excessive confirmations, opaque reasoning, or misaligned pacing. Current benchmarks evaluate task accuracy but overlook how agents interact: whether they infer preferences from implicit cues, adapt dynamically, or maintain fine-grained interaction quality. We introduce PrefIx, a configurable environment that evaluates both what agents accomplish and how they interact. Central to PrefIx is the Interaction-as-a-Tool (IaaT) paradigm, which treats interaction behaviors as structured tool calls, unifying them with existing evaluation frameworks. We define 31 preference settings across 14 attributes and formalize user experience (UX) as a core metric alongside task accuracy. A composite LLM-as-a-Judge mechanism across seven UX dimensions achieves strong aggregate reliability (ICC > 0.79), high internal consistency (alpha = 0.943), and moderate-to-strong correlation with human ratings (rho = 0.52-0.78). Preference-aware agents show a 7.6% average UX improvement and an 18.5% gain in preference alignment. Our work is openly accessible.
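
As a rough illustration of the Interaction-as-a-Tool idea described above, the sketch below records an interaction behavior (asking for confirmation) through the same structured call interface as a task action, so both land in one trace that an evaluator could inspect. All names here (`ToolCall`, `ToolRegistry`, `ask_confirmation`, `book_flight`, `count_interactions`) are hypothetical and chosen for illustration; this is not the PrefIx implementation or its API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolCall:
    """One structured call; `kind` separates task actions from interaction behaviors."""
    name: str
    arguments: Dict[str, Any]
    kind: str  # "task" or "interaction" (hypothetical labels)


@dataclass
class ToolRegistry:
    """Hypothetical registry that dispatches calls and logs them in a single trace."""
    tools: Dict[str, Tuple[str, Callable[..., Any]]] = field(default_factory=dict)
    trace: List[ToolCall] = field(default_factory=list)

    def register(self, name: str, kind: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = (kind, fn)

    def call(self, name: str, **arguments: Any) -> Any:
        kind, fn = self.tools[name]
        # Task and interaction calls are logged identically.
        self.trace.append(ToolCall(name, arguments, kind))
        return fn(**arguments)


def count_interactions(trace: List[ToolCall], name: str) -> int:
    """Count how often a given interaction behavior appears in the trace."""
    return sum(1 for c in trace if c.kind == "interaction" and c.name == name)


registry = ToolRegistry()
# A task tool and an interaction tool, both with stubbed behavior.
registry.register("book_flight", "task",
                  lambda origin, dest: f"booked {origin}->{dest}")
registry.register("ask_confirmation", "interaction",
                  lambda question: "yes")  # simulated user reply

registry.call("ask_confirmation", question="Book SFO to JFK?")
result = registry.call("book_flight", origin="SFO", dest="JFK")
confirmations = count_interactions(registry.trace, "ask_confirmation")
```

Because interaction behaviors appear in the trace as ordinary calls, a check such as "did the agent ask for more confirmations than this user prefers?" reduces to counting entries, which is one plausible way such behaviors could be scored alongside task accuracy.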
