AVIN-Chat: An Audio-Visual Interactive Chatbot System with Emotional State Tuning

Chanhyuk Park; Jungbin Cho; Junwan Kim; Seongmin Lee; Jungsu Kim; Sanghoon Lee

TL;DR

Subjective user tests show that the proposed AVIN-Chat system gives users a higher sense of immersion than previous chatbot systems.

Abstract

This work presents an audio-visual interactive chatbot (AVIN-Chat) system that allows users to have face-to-face conversations with 3D avatars in real time. Whereas previous chatbot services provide text-only or speech-only communication, AVIN-Chat offers audio-visual communication, giving users a higher quality of experience. In addition, AVIN-Chat speaks and emotes according to the user's emotional state. This enables users to establish a strong bond with the chatbot and increases their immersion. Subjective user tests show that the proposed system gives users a higher sense of immersion than previous chatbot systems. The demonstration video is available at https://www.youtube.com/watch?v=Z74uIV9k7_k.

Paper Structure (11 sections, 4 figures)

Introduction
Audio-Visual Interactive Chatbot
Overall Pipeline
Facial Avatar and Blendshapes Generation (Offline)
Text-Speech Processing (Online)
Speech-Driven Emotional Facial Animation (Online)
In-Context Learning with Prompts
Implementation Details
Experimental Results
Conclusion
Acknowledgements

Figures (4)

Figure 1: Example of AVIN-Chat in use. AVIN-Chat receives the user's speech and generates an audio-visual response in real time.
Figure 2: Overall pipeline of the proposed audio-visual interactive chatbot (AVIN-Chat) system. AVIN-Chat is constructed with three sub-modules: 1) facial avatar and blendshapes generation, 2) text-speech processing, and 3) speech-driven emotional facial animation modules.
Figure 3: GUI visualization and example of actual use of AVIN-Chat: (a) initial GUI, where users can ① capture an image or ② load an FBX file, and (b) conversation GUI, where users can ③ define emotions and talk to the ④ facial avatar by using the ⑤ record button.
Figure 4: User preference scores for different chatbot systems on intimacy, immersiveness, empathy, and overall satisfaction.
