SOTOPIA-$\Omega$: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents

Wenyuan Zhang, Tianyun Liu, Mengxiao Song, Xiaodong Li, Tingwen Liu

TL;DR

SOTOPIA-$\Omega$ presents a dynamic strategy injection framework that leverages negotiation theory to generate a high-quality social-dialogue corpus for training language agents. It introduces Social Instruction Following ($S$-IF) and two per-turn metrics, $S_{div}$ and $S_{rel}$, to evaluate goal-aligned, diverse dialogue in multi-agent settings. Empirical results show that 7B-scale models fine-tuned on the SOTOPIA-$\Omega$ corpus surpass GPT-4 on social-goal achievement and exhibit improved $S$-IF performance, with dynamic strategy construction mitigating deadlocks and improving conversation efficiency. Variant, corpus, scaling, and safety analyses support the robustness and generality of the approach, and the authors release open-source data and models for research use. Future work targets improved numerical reasoning, reinforcement learning for adaptable social agents, and expanded evaluation frameworks that better capture social competence.
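The summary does not give formal definitions of $S_{div}$ and $S_{rel}$; the Figure 3 caption only mentions a cosine-similarity matrix over all actions one agent generates in a task and a per-action goal relevance. The sketch below is a hypothetical reading of those two ideas, assuming each action and the goal are already embedded as vectors: diversity as one minus the mean pairwise cosine similarity of the actions, and relevance as the mean cosine similarity between each action and the goal. The function names and exact formulas are illustrative assumptions, not the authors' definitions.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def s_div(actions: list[Vector]) -> float:
    """Hypothetical diversity score: 1 minus the mean of the off-diagonal
    entries of the cosine-similarity matrix of one agent's action embeddings."""
    pairs = list(combinations(actions, 2))
    if not pairs:
        return 0.0  # assumption: a single action carries no measurable diversity
    return 1.0 - sum(cosine(u, v) for u, v in pairs) / len(pairs)


def s_rel(actions: list[Vector], goal: Vector) -> float:
    """Hypothetical relevance score: mean cosine similarity between each
    action embedding and the agent's goal embedding."""
    if not actions:
        return 0.0
    return sum(cosine(a, goal) for a in actions) / len(actions)
```

A per-turn variant, as the TL;DR describes, could apply the same functions to the actions produced up to each turn; which embedding model to use is left open here.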

Abstract

Despite the abundance of prior social strategies possessed by humans, there remains a paucity of research dedicated to transferring and integrating them into social agents. Our proposed SOTOPIA-$\Omega$ framework aims to bridge this gap, with a particular focus on enhancing the social capabilities of language agents. The framework dynamically injects multi-step reasoning strategies inspired by negotiation theory, together with two simple direct strategies, into expert agents, thereby automating the construction of a high-quality social dialogue training corpus. Additionally, we introduce the concept of Social Instruction Following (S-IF) and propose two new S-IF evaluation metrics that complement social capability. We demonstrate that several 7B models trained on this high-quality corpus not only significantly surpass the expert agent (GPT-4) in achieving social goals but also improve S-IF performance. Analysis and variant experiments validate the advantages of dynamic construction, which is especially effective at breaking an agent's prolonged deadlocks.
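The abstract states only that strategies are injected dynamically and that a step rating drives the process (Figure 2B); it does not spell out the switching rule. As a hypothetical illustration of the general idea, the controller below rates goal progress at each turn and rotates to the next available strategy once progress has stalled for a set number of turns. The class name, the `patience` parameter, and the round-robin rotation are all assumptions made for this sketch, not the authors' procedure.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types: a strategy maps dialogue history to a response,
# and a rater maps dialogue history to a goal-progress score.
Strategy = Callable[[list[str]], str]
Rater = Callable[[list[str]], float]


@dataclass
class StrategyInjector:
    strategies: list[Strategy]  # e.g. two direct strategies plus a multi-step one (assumption)
    rate_step: Rater            # per-step goal rating (hypothetical stand-in for step rating)
    patience: int = 2           # stalled turns tolerated before switching (hypothetical)
    current: int = 0
    best: float = float("-inf")
    stalled: int = 0

    def respond(self, history: list[str]) -> str:
        """Rate progress, switch strategy if it has plateaued, then respond."""
        score = self.rate_step(history)
        if score > self.best:
            self.best, self.stalled = score, 0
        else:
            self.stalled += 1
        if self.stalled >= self.patience:
            # Progress has plateaued: rotate to the next strategy to break the deadlock.
            self.current = (self.current + 1) % len(self.strategies)
            self.stalled = 0
        return self.strategies[self.current](history)
```

With a rater that never improves and `patience=2`, the injector answers with the first strategy for two turns, then switches, which mimics an attempt to escape a deadlock rather than repeating the same move.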

Paper Structure

This paper contains 59 sections, 4 equations, 6 figures, 28 tables.

Figures (6)

  • Figure 1: Average Goal scores per turn in GPT-4 self-play for 70 hard and 380 ordinary tasks in SOTOPIA (zhou2024sotopia). Goal measures how well each agent achieves its social goal during the interaction. In both settings, the expert agents fail to improve their goal scores significantly after the first few turns. More details are provided in the section introducing SOTOPIA.
  • Figure 2: The architecture and details of SOTOPIA-$\Omega$. (A) represents the overall architecture for data generation. (B) provides the step rating details of (A), demonstrating the process of injecting three strategies. (C) illustrates the negotiation strategy injection workflow, where the input at each step is the current dialogue history, and the output is the final response. The bottom-right corner shows the input-output flow of the negotiation strategy.
  • Figure 3: Pilot experiment for social instruction following evaluation with Llama3-8B and GPT-4. (a-d) show the relationship between action diversity, topic relevance, and goal scores across 450 SOTOPIA tasks, with goal curves fitted by a third-order polynomial. (e-h) present four cases from Llama3-8B. In each heatmap, the left side shows the cosine similarity matrix of all actions one agent generates in a single task, and the right side shows the goal relevance of each action, with red denoting poor performance.
  • Figure 4: Corpus distribution and step ratings. In (b), dot size indicates the number of step goals computed.
  • Figure 5: Qwen2.5-0.5B/1.5B/7B agents trained on the DSI corpus. The figure shows Goal, $S_{div}$, and $S_{rel}$.
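Figure 3 fits goal curves with a third-order polynomial, for example goal score as a function of action diversity across tasks. The dependency-free sketch below shows one standard way to do such a fit: ordinary least squares on the design matrix with columns $1, x, x^2, x^3$, solved through the normal equations with Gaussian elimination. The function names are hypothetical, and the summary does not say which tool the authors used; `numpy.polyfit(x, y, 3)` performs the same kind of fit but returns coefficients with the highest power first.

```python
def polyfit3(xs: list[float], ys: list[float]) -> list[float]:
    """Least-squares cubic fit returning [c0, c1, c2, c3] such that
    y ≈ c0 + c1*x + c2*x**2 + c3*x**3 (solved via the normal equations)."""
    n = 4  # number of coefficients for a degree-3 polynomial
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need at least 4 paired points with distinct x values")
    # Build A^T A (n x n) and A^T y (n) for the design matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    aty = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A^T A | A^T y].
    m = [row[:] + [b] for row, b in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution from the last row upward.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - s) / m[r][r]
    return coeffs


def eval_poly(coeffs: list[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))
```

The normal equations are adequate for a low-degree fit on a few hundred points; for larger ranges of $x$ a QR-based solver is numerically safer.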