Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera
Hiroki Tanioka, Tetsushi Ueta, Masahiko Sano
TL;DR
This work addresses emotion-aware dialogue by enabling an LLM-based agent to respond to users whose facial expressions are captured by a camera. It introduces FacingBot, which uses a local FER pipeline to extract a seven-dimensional emotion vector encoded in JSON and appends it to prompts sent to an LLM (gpt-3.5-turbo). Experiments show the LLM's responses vary with emotional cues (e.g., Happy vs. Angry or Sad), demonstrating feasibility while highlighting FER variability and ambiguity as challenges. The approach enables potential offline/online receptionist applications and broader multimodal interactions, with future work focusing on improved emotion summarization and integration with speech for richer dialogue.
Abstract
The performance of ChatGPT© and other LLMs has improved tremendously, and in online environments they are increasingly likely to be used in a wide variety of situations, such as chatbots on web pages, call center operations using voice interaction, and dialogue functions using agents. In offline environments, multimodal dialogue functions are also being realized, such as guidance by Artificial Intelligence agents (AI agents) on tablet terminals and dialogue systems in which LLMs are mounted on robots. In such multimodal dialogue, mutual emotion recognition between the AI and the user will become important. So far, there have been methods for AI agents to express emotions and to recognize them from the text or voice of the user's utterances, but methods for AI agents to recognize emotions from the user's facial expressions have not been studied. In this study, we examined whether LLM-based AI agents can interact with users according to their emotional states by capturing the user in dialogue with a camera, recognizing emotions from facial expressions, and adding this emotion information to prompts. The results confirmed that AI agents can converse in accordance with the user's emotional state when that state has a relatively high score, as with Happy and Angry.
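
The prompt-augmentation step described above (a seven-dimensional emotion vector, encoded as JSON and appended to the text sent to the LLM) might look roughly like the following minimal sketch. It is not the authors' implementation. The label set is an assumption (the seven categories commonly used in FER datasets; the text here names only Happy, Angry, and Sad), and the function names, prompt wording, and rounding are hypothetical choices made for illustration. The FER step and the LLM call are deliberately left out.

```python
import json

# ASSUMPTION: the seven labels commonly used by FER datasets. The source text
# only names Happy, Angry and Sad; the remaining four are illustrative.
EMOTION_LABELS = ("angry", "disgust", "fear", "happy", "sad", "surprise", "neutral")

# Hypothetical marker text separating the utterance from the emotion data.
EMOTION_MARKER = "User facial emotion scores (JSON): "


def build_prompt(user_text: str, scores: dict) -> str:
    """Append the emotion vector, serialized as JSON, to the user's utterance."""
    missing = [label for label in EMOTION_LABELS if label not in scores]
    if missing:
        raise ValueError(f"missing emotion scores: {missing}")
    # Keep a fixed label order and round the scores to keep the prompt short.
    vector = {label: round(float(scores[label]), 2) for label in EMOTION_LABELS}
    return f"{user_text}\n\n{EMOTION_MARKER}{json.dumps(vector)}"


def dominant_emotion(scores: dict) -> str:
    """Return the label with the highest score (ties go to the earlier label)."""
    return max(EMOTION_LABELS, key=lambda label: scores[label])


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the paper.
    example_scores = {
        "angry": 0.02, "disgust": 0.01, "fear": 0.01, "happy": 0.85,
        "sad": 0.03, "surprise": 0.05, "neutral": 0.03,
    }
    print(build_prompt("Hello, where is the reception desk?", example_scores))
    print("dominant:", dominant_emotion(example_scores))
```

The returned string would then be sent as the user message to the chat model. Passing the full vector rather than only the top label lets the model see ambiguous cases, which matters given the FER variability noted in the TL;DR.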
