SAPIEN: Affective Virtual Agents Powered by Large Language Models
Masum Hasan, Cengiz Ozel, Sammy Potter, Ehsan Hoque
TL;DR
SAPIEN addresses the need for naturalistic, affective, multilingual virtual agents capable of open-domain dialogue and post-interaction coaching. It presents an end-to-end pipeline that integrates speech recognition, instruction-following LLMs, emotion prediction, TTS, and facial animation via a motion-capture database, all delivered in near real time with guardrails. The platform supports extensive customization (avatar traits, language, conversation premise) and provides actionable feedback after conversations, with potential applications in communication training, language learning, healthcare, leadership, and education. Ethical safeguards, non-retention of memory, and session length limits are built in to mitigate risks, making SAPIEN suitable for broad, responsible deployment and for interactive demonstrations in conference settings.
Abstract
In this demo paper, we introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models that can hold open-domain conversations with users in 13 different languages and display emotions through facial expressions and voice. The platform allows users to customize their virtual agent's personality, background, and conversation premise, thus providing a rich, immersive interaction experience. Furthermore, after the virtual meeting, the user can choose to have the conversation analyzed and receive actionable feedback on their communication skills. This paper presents an overview of the platform and discusses the various application domains of this technology, ranging from entertainment to mental health, communication training, language learning, education, healthcare, and beyond. Additionally, we consider the ethical implications of such realistic virtual agent representations and the potential challenges in ensuring responsible use.
