ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents

Daksitha Withanage Don; Philipp Müller; Fabrizio Nunnari; Elisabeth André; Patrick Gebhard

TL;DR

ReNeLiB presents an open-source, real-time toolkit for neural listening behavior generation in socially interactive agents, combining real-time multimodal feature extraction, a neural behavior generator, and visualization for FLAME- and ARKit-based avatars. The approach extends Learn2Listen by using EMOCA for 3D facial reconstruction, employing a VQ-VAE–Transformer predictor trained on psychotherapy interaction data, and mapping FLAME expressions to ARKit through a Global-to-Local transformation. It demonstrates real-time performance with modular parallelism and provides end-to-end integration with contemporary IVA platforms such as VuppetMaster and MetaHuman. The work enables deployment of data-driven listening behaviors in telemedicine, mental health, and customer-support scenarios, and offers valuable open-source resources for researchers and practitioners.

Abstract

Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. While recently proposed machine-learning-based approaches to conversational behavior generation are a promising way to address this challenge, they have not yet been employed in SIAs. The primary reason is the lack of a software toolkit that integrates such approaches with SIA frameworks and meets the challenging real-time requirements of human-agent interaction scenarios. In our work, we present, for the first time, such a toolkit consisting of three main components: (1) real-time feature extraction capturing multi-modal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and its components. In addition, we introduce pre-trained behavior generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing SIAs' listening behavior in real time, is publicly available. Resources, including code and behavioral multi-modal features extracted from therapeutic interactions, are hosted at https://daksitha.github.io/ReNeLib
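As a rough illustration of how the three components in the abstract could be chained, the sketch below wires a feature extractor, a behavior generator, and a visualizer together through a tiny in-process publish-subscribe hub (the Figure 1 caption states that the framework uses a publisher-subscriber pattern). Everything here is an assumption made for illustration: the `Broker` class, the topic names `features` and `behavior`, the message fields, and the constant placeholder expression are hypothetical and are not taken from the toolkit's code. The real system runs its modules in parallel and uses a neural predictor, neither of which this sketch attempts.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class Broker:
    """Minimal synchronous publish-subscribe hub (hypothetical, not the toolkit's API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        # Register a callback that is invoked for every message on `topic`.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to each subscriber of `topic`, in registration order.
        for callback in list(self._subscribers[topic]):
            callback(message)


def run_pipeline(frames: List[Any]) -> List[Dict[str, Any]]:
    """Push frames through extractor -> generator -> visualizer and return what was 'rendered'."""
    broker = Broker()
    rendered: List[Dict[str, Any]] = []

    # Component 2 (placeholder): stands in for the neural behavior generator.
    # It emits a constant dummy expression vector instead of a model prediction.
    def generate(features: Dict[str, Any]) -> None:
        broker.publish("behavior", {"frame": features["frame"], "expression": [0.0, 0.0, 0.0]})

    broker.subscribe("features", generate)
    # Component 3 (placeholder): the "visualizer" only records the messages it receives.
    broker.subscribe("behavior", rendered.append)

    # Component 1 (placeholder): the "feature extractor" forwards each raw frame as-is.
    for index, frame in enumerate(frames):
        broker.publish("features", {"frame": index, "raw": frame})

    return rendered


if __name__ == "__main__":
    for message in run_pipeline(["frame-a", "frame-b"]):
        print(message)
```

The hub is synchronous to keep the example deterministic; decoupling the stages through topics is what would let each module be swapped or moved to its own process without touching the others.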

Paper Structure (23 sections, 5 equations, 5 figures, 2 tables)

Introduction
Related Work
Interactive Virtual Agents
Data-driven Approaches for Behavioral Animation Synthesis
Behaviour Representation
Framework
Framework Modules
Implementation
Facial and Audio Representations
Real-life Therapy Interaction Dataset
Conditional Motion Synthesis of Conversational Dynamics
Model Training
Discussion
Real-time Framework Application
Constructing the Global-to-Local Transformation Matrix
...and 8 more sections

Figures (5)

Figure 1: Overview of the real-time interactive framework modules, utilizing a publisher-subscriber pattern to enable real-time listener behavior. The framework supports both online and offline modes, allowing for feature extraction from sensors or streaming from locally saved files. The solid arrows represent the modules used in our evaluation.
Figure 2: VQ-VAE training process
Figure 3: Predictor training process
Figure 4: FLAME to ARKit expression mapper
Figure 5: Network latency plot for each framework module for different frame rates
