AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents

Yuanzhi Liang, Linchao Zhu, Yi Yang

TL;DR

This paper introduces the Multi-Agent Interaction Evaluation Framework (AntEval), which combines a novel interaction framework with evaluation methods designed for the quantitative and objective assessment of agents' interaction competencies.

Abstract

Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored, primarily due to the absence of robust, quantitative evaluation methods. This gap has slowed the development of agents proficient in more nuanced interactions beyond simple exchanges such as small talk. To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods. The interaction framework aims to foster a complex interaction environment that bolsters information exchange and intention expression within social interactions. Furthermore, we introduce evaluation methods, including two metrics, Information Exchanging Precision (IEP) and Interaction Expressiveness Gap (IEG), designed for the quantitative and objective assessment of agents' interaction competencies. Our findings highlight the utility of these evaluation methods and show significant potential for improving LLMs' ability to construct agents that interact in a more natural manner with human-like intricacy.

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Real human interactions are marked by their efficient exchange of information and the clarity of their intentions, showcasing both complexity and depth. In contrast, LLM-driven agents' interactions [chen2023agentverse; park2023generative], as depicted in results (b), typically lack substantial content and remain superficial. The AntEval framework encourages agents to partake in interactions that are both intricate and significant. Importantly, AntEval further introduces evaluation methods specifically crafted to quantitatively evaluate interactions based on informativeness and expressiveness. AntEval aims to guide the enhancement of LLMs' abilities toward genuine human interaction.
  • Figure 2: Framework illustration for AntEval, showcasing the use of TRPG rules to create an interactive environment for agents. Agents engage in role-playing, aiming to participate in high-quality interactions for information exchange and intention expression, with the goal of completing game adventures. The framework involves detailed and diverse character settings based on the DND rulebook. Agents are involved in two types of scenarios: interacting based on intentions and exchanging knowledge, highlighting their capabilities in informative and expressive interactions.
  • Figure 3: AntEval evaluates informativeness and expressiveness through two specific scenarios: information exchange and intention expression. We fine-tune virtual DMs with agent-generated and real interactions to assess expressiveness, and gauge informativeness by comparing agents' responses to the predefined knowledge.
  • Figure 4: Word cloud representing common descriptors for interactions by GPT-4 that underperformed in IEP evaluation.