Word Synchronization Challenge: A Benchmark for Word Association Responses for Large Language Models
Tanguy Cazalets, Joni Dambre
TL;DR
The paper proposes the Word Synchronization Challenge as a dynamic benchmark to evaluate LLMs’ ability to capture human word associations and social cognition in HCI. It uses a dyadic word game to generate a dataset across multiple LLM pairings, then analyzes the interaction histories with embedding-based distances and PCA visualizations to study synchronization and strategy. The findings show that more sophisticated models achieve higher success rates and favor a balancing strategy, and that successful interactions exhibit multi-manifold semantic convergence. The benchmark offers a flexible framework for assessing human-like alignment and theory of mind in AI-assisted communication, informing the design of empathetic, collaborative human-AI systems and guiding future research on cognitive mechanisms and biases in language models.
Abstract
This paper introduces the Word Synchronization Challenge, a novel benchmark to evaluate large language models (LLMs) in Human-Computer Interaction (HCI). The benchmark uses a dynamic, game-like framework to test LLMs’ ability to mimic human cognitive processes through word associations. By simulating complex human interactions, it assesses how LLMs interpret and align with human thought patterns during conversational exchanges, which are essential for effective social partnerships in HCI. Initial findings highlight the influence of model sophistication on performance, offering insights into the models’ capabilities to engage in meaningful social interactions and adapt their behavior in human-like ways. This research advances the understanding of LLMs’ potential to replicate or diverge from human cognitive functions, paving the way for more nuanced and empathetic human-machine collaborations.
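As an illustration of the embedding-based analysis of interaction histories mentioned in the TL;DR, the following is a minimal sketch, not the authors' implementation. It assumes that a game history is a list of (word from player A, word from player B) pairs, one per round, and that a game counts as synchronized when both players produce the same word in the final round; neither detail is specified in this section. The names `round_distances`, `synchronized`, and `TOY_EMBEDDINGS` are hypothetical, and the three-dimensional vectors are hand-made placeholders standing in for a real embedding model.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def round_distances(history, embed):
    """Distance between the two players' words in each round.

    `history` is a list of (word_a, word_b) pairs; `embed` maps a word
    to its vector. Both formats are assumptions made for this sketch.
    """
    return [cosine_distance(embed[a], embed[b]) for a, b in history]


def synchronized(history):
    """Assumed success criterion: both players said the same word last."""
    return bool(history) and history[-1][0] == history[-1][1]


# Hand-made toy vectors (hypothetical, not from any real embedding model).
TOY_EMBEDDINGS = {
    "ocean": [1.0, 0.0, 0.0],
    "fire": [0.0, 1.0, 0.0],
    "water": [0.8, 0.6, 0.0],
    "steam": [0.6, 0.8, 0.0],
    "vapor": [0.7, 0.7, 0.0],
}

# Hypothetical three-round game in which the players converge.
history = [("ocean", "fire"), ("water", "steam"), ("vapor", "vapor")]
distances = round_distances(history, TOY_EMBEDDINGS)

for (word_a, word_b), d in zip(history, distances):
    print(f"{word_a:>6} / {word_b:<6} distance = {d:.3f}")
print("synchronized:", synchronized(history))
```

In this toy game the per-round distance shrinks as the two players approach a shared word, which is the kind of convergence signal a distance-based analysis would look for; a real study would replace the toy vectors with embeddings from an actual model.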
