SOTOPIA-$Ω$: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents
Wenyuan Zhang, Tianyun Liu, Mengxiao Song, Xiaodong Li, Tingwen Liu
TL;DR
SOTOPIA-$Ω$ presents a dynamic strategy injection framework that draws on negotiation theory to generate a high-quality social dialogue corpus for training language agents. It introduces Social Instruction Following ($S$-IF) and two per-turn metrics, $S_{div}$ and $S_{rel}$, to evaluate whether dialogue in multi-agent settings is both diverse and aligned with social goals. Empirical results show that 7B-scale models fine-tuned on the SOTOPIA-$Ω$ corpus surpass GPT-4 in social goal achievement and improve $S$-IF performance. Dynamic strategy construction also mitigates deadlocks and makes conversations more efficient. Variant, corpus, scaling, and safety analyses support the robustness and generality of the approach, and the authors release the data and models as open source for research use. Future work targets better numerical reasoning, reinforcement learning for more adaptable social agents, and broader evaluation frameworks that better capture social competence.
Abstract
Although humans already possess an abundance of social strategies, little research has addressed transferring and integrating them into social agents. Our proposed SOTOPIA-$Ω$ framework aims to bridge this gap, with a particular focus on enhancing the social capabilities of language agents. The framework dynamically injects multi-step reasoning strategies inspired by negotiation theory, along with two simple direct strategies, into expert agents, thereby automating the construction of a high-quality social dialogue training corpus. Additionally, we introduce the concept of Social Instruction Following (S-IF) and propose two new S-IF evaluation metrics that complement the evaluation of social capability. We demonstrate that several 7B models trained on this high-quality corpus not only significantly surpass the expert agent (GPT-4) in achieving social goals but also improve S-IF performance. Analysis and variant experiments validate the advantages of dynamic construction, which is especially effective at breaking prolonged deadlocks in agent conversations.
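The abstract does not say how a strategy is chosen at each turn or how a deadlock is detected. The Python sketch below illustrates the general idea of dynamic strategy injection under explicitly assumed details. The strategy names (`strategy_A` and so on), the exact-repetition deadlock heuristic in `is_deadlocked`, and the rotate-on-deadlock policy in `select_strategy` are all hypothetical placeholders, not the authors' method. The agent is any callable that maps a prompt to an utterance, standing in for a real expert model such as GPT-4.

```python
from collections.abc import Callable

# Hypothetical strategy pool; the paper's actual multi-step and direct
# strategies are not specified in the abstract.
STRATEGIES = ["strategy_A", "strategy_B", "strategy_C"]


def is_deadlocked(history: list[str], window: int = 3) -> bool:
    """Assumed heuristic: the last `window` utterances are identical."""
    if len(history) < window:
        return False
    return len(set(history[-window:])) == 1


def select_strategy(history: list[str], current_idx: int, window: int = 3) -> int:
    """Keep the current strategy unless a deadlock is detected, then rotate."""
    if is_deadlocked(history, window):
        return (current_idx + 1) % len(STRATEGIES)
    return current_idx


def build_prompt(base_prompt: str, strategy: str) -> str:
    """Inject the chosen strategy as an instruction ahead of the task prompt."""
    return f"[Strategy: {strategy}]\n{base_prompt}"


def run_dialogue(
    agent: Callable[[str], str], base_prompt: str, turns: int, window: int = 3
) -> tuple[list[str], list[str]]:
    """Generate `turns` utterances, re-selecting the strategy before each one."""
    history: list[str] = []
    used: list[str] = []
    idx = 0
    for _ in range(turns):
        idx = select_strategy(history, idx, window)
        strategy = STRATEGIES[idx]
        used.append(strategy)
        history.append(agent(build_prompt(base_prompt, strategy)))
    return history, used


if __name__ == "__main__":
    # Toy agent: stubborn under strategy_A, conciliatory under any other.
    def toy_agent(prompt: str) -> str:
        return "No deal." if "strategy_A" in prompt else "Let's split it."

    history, used = run_dialogue(toy_agent, "Negotiate the price.", turns=5)
    print(used)  # ['strategy_A', 'strategy_A', 'strategy_A', 'strategy_B', 'strategy_B']
    print(history)
```

Because the agent is a plain callable, the same loop can drive a stub for testing or a language model client in practice. The exact-match deadlock check is deliberately simple. Real dialogue would likely need a looser test, such as semantic similarity between recent turns.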
