ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents

Daksitha Withanage Don, Philipp Müller, Fabrizio Nunnari, Elisabeth André, Patrick Gebhard

TL;DR

ReNeLiB presents an open-source, real-time toolkit for neural listening behavior generation in socially interactive agents. It combines real-time multimodal feature extraction, a neural behavior generator, and visualization for FLAME- and ARKit-based avatars. The approach extends Learn2Listen in three ways: it uses EMOCA for 3D facial reconstruction, employs a VQ-VAE–Transformer predictor trained on psychotherapy interaction data, and maps FLAME expressions to ARKit through a Global-to-Local transformation. By running its modules in parallel, the framework achieves real-time performance, and it integrates end to end with contemporary intelligent virtual agent (IVA) platforms such as VuppetMaster and MetaHuman. The work enables deployment of data-driven listening behaviors in telemedicine, mental health, and customer-support scenarios, and releases its code, pre-trained models, and therapy-derived behavioral features as open-source resources for researchers and practitioners.
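In a VQ-VAE, each continuous latent vector produced by the encoder is replaced by its nearest entry in a learned codebook, so a motion sequence becomes a sequence of discrete token indices that an autoregressive predictor, such as the Transformer mentioned above, can model. The plain-Python sketch below illustrates only this nearest-neighbour lookup. It is not ReNeLiB's code; the function names, the 2-D latents and the three-entry codebook are illustrative assumptions (real codebooks are learned and much larger).

```python
from typing import List, Sequence, Tuple

# Illustrative sketch of VQ-VAE quantization, not ReNeLiB's implementation.
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codebook entry closest to z."""
    best_index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return best_index, codebook[best_index]


def quantize_sequence(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each per-frame latent vector to a discrete token index."""
    return [quantize(z, codebook)[0] for z in latents]


if __name__ == "__main__":
    # Hypothetical toy codebook and per-frame latents.
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    latents = [[0.1, -0.2], [0.9, 0.2], [0.2, 0.8]]
    print(quantize_sequence(latents, codebook))  # [0, 1, 2]
```

The token indices, rather than the raw latents, are what a sequence model predicts; the VQ-VAE decoder then turns predicted tokens back into continuous facial motion.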

Abstract

Flexible and natural nonverbal reactions to human behavior remain a challenge for socially interactive agents (SIAs) that are predominantly animated using hand-crafted rules. Recently proposed machine-learning-based approaches to conversational behavior generation are a promising way to address this challenge, but they have not yet been employed in SIAs. The primary reason is the lack of a software toolkit that integrates such approaches with SIA frameworks while meeting the demanding real-time requirements of human-agent interaction. In this work, we present the first such toolkit, consisting of three main components: (1) real-time feature extraction capturing multimodal social cues from the user; (2) behavior generation based on a recent state-of-the-art neural network approach; and (3) visualization of the generated behavior supporting both FLAME-based and Apple ARKit-based interactive agents. We comprehensively evaluate the real-time performance of the whole framework and of its individual components. In addition, we introduce pre-trained behavior generation models derived from psychotherapy sessions for domain-specific listening behaviors. Our software toolkit, pivotal for deploying and assessing SIAs' listening behavior in real time, is publicly available. Resources, including the code and multimodal behavioral features extracted from therapeutic interactions, are hosted at https://daksitha.github.io/ReNeLib
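Whether such a pipeline runs in real time can be reasoned about with frame-budget arithmetic: at f frames per second, each frame leaves 1000/f ms of processing time. If the modules run concurrently, sustained throughput is limited by the slowest module, whereas the latency a single frame accumulates is roughly the sum of the module latencies (ignoring transport overhead). The sketch below encodes this reasoning; the module names and timings are hypothetical placeholders, not measurements from the paper.

```python
from typing import Dict


def frame_budget_ms(fps: float) -> float:
    """Processing time available per frame at a given frame rate, in ms."""
    return 1000.0 / fps


def pipeline_report(stage_ms: Dict[str, float], fps: float) -> Dict[str, object]:
    """Summarise whether a set of concurrently running modules keeps up with fps.

    Assumes pipeline parallelism: the sustainable rate is bounded by the
    slowest stage, while one frame's latency is the sum of all stages.
    """
    budget = frame_budget_ms(fps)
    bottleneck = max(stage_ms, key=stage_ms.get)
    return {
        "budget_ms": budget,
        "bottleneck": bottleneck,
        "keeps_up": stage_ms[bottleneck] <= budget,
        "end_to_end_ms": sum(stage_ms.values()),
    }


if __name__ == "__main__":
    # Hypothetical per-module timings in milliseconds, not measured values.
    stages = {"feature_extraction": 30.0, "behavior_generation": 12.0, "visualization": 8.0}
    for fps in (25, 60):
        print(fps, pipeline_report(stages, fps))
```

With these placeholder timings the pipeline keeps up at 25 fps (40 ms budget) even though each frame accumulates about 50 ms of latency, and it falls behind at 60 fps (about 16.7 ms budget). This is why per-module latency and end-to-end latency are worth reporting separately.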

Paper Structure (23 sections, 5 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of the real-time interactive framework modules, utilizing a publisher-subscriber pattern to enable real-time listener behavior. The framework supports both online and offline modes, allowing for feature extraction from sensors or streaming from locally saved files. The solid arrows represent the modules used in our evaluation.
  • Figure 2: VQ-VAE training process
  • Figure 3: Predictor training process
  • Figure 4: FLAME to ARKit expression mapper
  • Figure 5: Network latency plot for each framework module for different frame rates
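Figure 1 describes modules that communicate through a publisher-subscriber pattern, with features coming either from live sensors (online mode) or from locally saved files (offline mode). A minimal in-process sketch of that idea follows, assuming synchronous Python callbacks. The Broker class, topic names, CSV layout and feature names (pitch, energy) are hypothetical and do not reflect the toolkit's actual transport or message format.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical message format: a flat mapping from feature name to value.
Message = Dict[str, float]
Callback = Callable[[Message], None]


class Broker:
    """Minimal in-process publisher-subscriber hub keyed by topic name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Message) -> None:
        # Every subscriber of the topic receives the message; the publisher
        # does not need to know who (if anyone) is listening.
        for callback in self._subscribers[topic]:
            callback(message)


def live_source(read_sensor: Callable[[], Message], n_frames: int) -> Iterator[Message]:
    """Online mode: pull frames from a (hypothetical) sensor-reading callable."""
    for _ in range(n_frames):
        yield read_sensor()


def file_source(lines: Iterable[str]) -> Iterator[Message]:
    """Offline mode: replay features previously saved as CSV rows."""
    for row in csv.DictReader(lines):
        yield {key: float(value) for key, value in row.items()}


def run(source: Iterable[Message], broker: Broker, topic: str = "features") -> None:
    """Publish every frame from either source on the same topic."""
    for frame in source:
        broker.publish(topic, frame)


if __name__ == "__main__":
    broker = Broker()
    behaviors: List[Message] = []
    # A hypothetical "generator" module: consumes features, publishes behavior.
    broker.subscribe("features", lambda m: broker.publish("behavior", {"nod": m["energy"]}))
    broker.subscribe("behavior", behaviors.append)
    saved = io.StringIO("pitch,energy\n120.0,0.5\n118.0,0.6\n")
    run(file_source(saved), broker)
    print(behaviors)  # [{'nod': 0.5}, {'nod': 0.6}]
```

Because downstream subscribers only see messages on a topic, switching between the online and offline sources leaves the rest of the pipeline unchanged. A real-time system would typically decouple modules with queues, threads or separate processes rather than the synchronous calls used here.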