SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers
Keyang Xuan, Pengda Wang, Chongrui Ye, Haofei Yu, Tal August, Jiaxuan You
TL;DR
SocialVeil addresses a gap in how language models are evaluated: their behavior under imperfect communication. It introduces a barrier-aware social learning environment built on a literature-grounded taxonomy of three barriers: semantic vagueness, sociocultural mismatch, and emotional interference. It presents barrier injection, an episode-based simulation, and a barrier-aware evaluation protocol with two metrics, Unresolved Confusion and Mutual Understanding, validated against human judgments. Across 720 scenarios and four frontier LLMs, barriers consistently degrade social interaction. Human studies confirm barrier fidelity and metric alignment, and adaptation strategies yield only modest improvements. The work advances realistic evaluation of social intelligence in LLMs and motivates more robust, barrier-aware training and grounding methods for socially adept AI agents.
Abstract
Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, which limits our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present \textsc{SocialVeil}, a social learning environment that simulates social interaction under communication barriers induced by cognitive differences. Grounded in a systematic literature review of communication challenges in human interaction, \textsc{SocialVeil} introduces three representative types of such disruption: \emph{semantic vagueness}, \emph{sociocultural mismatch}, and \emph{emotional interference}. We also introduce two barrier-aware evaluation metrics, \emph{unresolved confusion} and \emph{mutual understanding}, to evaluate interaction quality under impaired communication. Experiments across 720 scenarios and four frontier LLMs show that barriers consistently impair performance: mutual understanding drops by over 45\% on average, and confusion rises by nearly 50\%. Human evaluations validate the fidelity of these simulated barriers (ICC$\approx$0.78, Pearson $r\approx$0.80). We further show that adaptation strategies (Repair Instruction and Interactive Learning) have only a modest effect, leaving performance far below barrier-free levels. This work takes a step toward bringing social interaction environments closer to real-world communication, opening opportunities for exploring the social intelligence of LLM agents.
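The headline figures above are relative changes plus a correlation against human ratings. As a minimal sketch, assuming each percentage is the signed relative change of a barrier condition against its barrier-free baseline, averaged over models (the exact aggregation is not stated in this abstract), and that metric-human agreement uses a standard Pearson correlation, the arithmetic looks like this. The function names and all scores are hypothetical toy values, not data from SocialVeil, and the ICC computation is not sketched.

```python
from statistics import mean


def relative_change(barrier_free: float, with_barrier: float) -> float:
    """Signed relative change (%) of a score under a barrier vs. the barrier-free baseline."""
    if barrier_free == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (with_barrier - barrier_free) / barrier_free


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical (barrier_free, with_barrier) mutual-understanding scores for two toy models.
# Illustrative only; these are NOT results from the paper.
toy_scores = {"model_a": (8.0, 4.0), "model_b": (5.0, 4.0)}
per_model = {m: relative_change(b, w) for m, (b, w) in toy_scores.items()}
# Assumed convention: average the per-model relative changes (each model weighted equally).
avg_drop = mean(per_model.values())

# Hypothetical automatic-metric scores vs. human ratings for the same four episodes.
metric = [1.0, 2.0, 3.0, 4.0]
human = [1.5, 2.0, 3.5, 4.0]
r = pearson_r(metric, human)

print(per_model, avg_drop, round(r, 3))
```

Averaging per-model changes weights each model equally; computing the change of the averaged scores instead weights models with larger baselines more heavily, so the two conventions can give different headline numbers.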
