ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents
Daksitha Withanage Don, Philipp Müller, Fabrizio Nunnari, Elisabeth André, Patrick Gebhard
TL;DR
ReNeLiB presents an open-source, real-time toolkit for neural listening behavior generation in socially interactive agents. It combines real-time multimodal feature extraction, a neural behavior generator, and visualization for FLAME- and ARKit-based avatars. The approach extends Learn2Listen in three ways: it uses EMOCA for 3D facial reconstruction, trains a VQ-VAE–Transformer predictor on psychotherapy interaction data, and maps FLAME expressions to ARKit through a Global-to-Local transformation. The toolkit achieves real-time performance through modular parallelism and provides end-to-end integration with contemporary intelligent virtual agent (IVA) platforms such as VuppetMaster and MetaHuman. The work enables deployment of data-driven listening behaviors in telemedicine, mental health, and customer-support scenarios, and its open-source code and data are a resource for researchers and practitioners.
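In a VQ-VAE, the step that makes the subsequent Transformer predictor possible is vector quantization: each continuous latent vector from the encoder is replaced by its nearest entry in a learned codebook, turning a motion sequence into a sequence of discrete code indices. The sketch below shows only this generic nearest-neighbor lookup in NumPy. The function name `quantize` and the array shapes are illustrative assumptions, not ReNeLiB's or Learn2Listen's actual code, and training details (commitment loss, codebook updates) are omitted.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each latent vector to its nearest codebook entry (squared L2 distance).

    Hypothetical helper for illustration only.
    latents:  (T, D) array of encoder outputs, one row per time step.
    codebook: (K, D) array of learned code vectors.
    Returns (indices of shape (T,), quantized vectors of shape (T, D)).
    """
    # Broadcast to (T, K, D), then sum over D to get squared distances of shape (T, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest code for every time step.
    indices = dists.argmin(axis=1)
    # Replace each latent by its selected code vector.
    return indices, codebook[indices]
```

In this kind of design, a Transformer then models the sequence of `indices` autoregressively, conditioned on features of the speaker, and the VQ-VAE decoder maps predicted indices back to facial motion parameters.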
Abstract
Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs), which are predominantly animated using hand-crafted rules. Recently proposed machine-learning-based approaches to conversational behavior generation are a promising way to address this challenge, but they have not yet been employed in SIAs. The primary reason is the lack of a software toolkit that integrates such approaches with SIA frameworks and meets the demanding real-time requirements of human-agent interaction scenarios. In this work, we present the first such toolkit, consisting of three main components: (1) real-time feature extraction capturing multimodal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and of its individual components. In addition, we introduce pre-trained behavior generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing the listening behavior of SIAs in real time, is publicly available. Resources, including code and multimodal behavioral features extracted from therapeutic interactions, are hosted at https://daksitha.github.io/ReNeLib
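The three components described above form a streaming pipeline in which each stage can work on a new frame while the downstream stage is still processing the previous one. The following is a minimal, generic sketch of that idea using Python threads and bounded queues. The stage callables (`extract`, `generate`, `render`) and the helper names `stage` and `run_pipeline` are hypothetical placeholders, not ReNeLiB's API, and the toolkit's actual parallelization may differ (for example, separate processes or network messaging between modules).

```python
import queue
import threading
from typing import Any, Callable, Iterable

SENTINEL = object()  # marks the end of the stream


def stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a worker thread that applies fn to every item from inbox and forwards the result."""
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is SENTINEL:
                outbox.put(SENTINEL)  # propagate shutdown to the next stage
                break
            outbox.put(fn(item))

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def run_pipeline(frames: Iterable[Any], extract, generate, render) -> list:
    """Chain feature extraction -> behavior generation -> visualization (all hypothetical callables)."""
    # Bounded queues apply back-pressure so a slow stage cannot cause unbounded buffering.
    q_in, q_feat, q_beh, q_out = (queue.Queue(maxsize=8) for _ in range(4))
    workers = [
        stage(extract, q_in, q_feat),
        stage(generate, q_feat, q_beh),
        stage(render, q_beh, q_out),
    ]

    def produce() -> None:
        # Feed input frames from a separate thread so the main thread can drain the output.
        for f in frames:
            q_in.put(f)
        q_in.put(SENTINEL)

    threading.Thread(target=produce, daemon=True).start()

    results = []
    while (item := q_out.get()) is not SENTINEL:
        results.append(item)
    for w in workers:
        w.join()
    return results
```

For example, `run_pipeline(range(5), lambda x: x * 2, lambda f: f + 1, lambda b: f"frame:{b}")` returns `['frame:1', 'frame:3', 'frame:5', 'frame:7', 'frame:9']`, with frames kept in order because each stage has a single worker reading a FIFO queue. Because of Python's global interpreter lock, CPU-bound stages would typically run in separate processes in practice; threads suffice here to illustrate the stage-and-queue structure.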
