AnimeGaze: Real-Time Mutual Gaze Synthesis for Anime-Style Avatars in Physical Environments via Behind-Display Camera
Kazuya Izumi, Shuhey Koyama, Yoichi Ochiai
TL;DR
AnimeGaze addresses the lack of gaze engagement in on-screen avatars by placing a camera behind a transparent display so that the avatar's eyes are physically aligned with the camera, enabling real-time mutual gaze with the physical environment. It formalizes gaze transmission as a cross-space 3D–2D problem and estimates gaze targets with a PnP-based method over the rotation $R$, translation $t$, and object position $X_{obj}$ that minimizes the error terms $E_{PnP}$, $E_{reproj}$, and $E_{gaze}$. It also introduces a perception-oriented calibration using $E_{perception}$ to reduce the Mona Lisa effect. The system supports avatars with arbitrary eye configurations and includes a symbolic-regression-based calibration (Section 4.2.1) that aligns perceived and actual gaze; a user study shows improved eye contact and attentiveness. The work demonstrates a practical path toward immersive, human-like AI interactions in everyday environments and lays groundwork for extending mutual gaze to non-human avatars and broader gaze behaviors.
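To make the PnP-based estimation more concrete, the sketch below shows the standard pinhole reprojection objective that a PnP solver minimizes over $R$ and $t$, plus the direction from the camera toward a target $X_{obj}$. It is a minimal illustration under our own assumptions, not the paper's formulation. It assumes an ideal pinhole camera with intrinsics K, no lens distortion, and a camera center that coincides with the avatar's eyes. It also uses plain squared pixel residuals, because the exact definitions of $E_{PnP}$, $E_{reproj}$, and $E_{gaze}$ are not given here. The helper names `to_camera`, `project`, `reprojection_error`, and `gaze_direction` are hypothetical; a real system would typically obtain $R$ and $t$ from a solver such as OpenCV's `cv2.solvePnP`.

```python
import math

# Assumptions (not from the paper): ideal pinhole camera, no lens distortion,
# camera centre coincides with the avatar's eyes behind the display.


def to_camera(R, t, X):
    """Transform world point X into the camera frame: Xc = R @ X + t."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]


def project(K, R, t, X):
    """Project world point X to pixel coordinates (u, v) with intrinsics K."""
    x, y, z = to_camera(R, t, X)
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])


def reprojection_error(K, R, t, points_3d, points_2d):
    """Sum of squared pixel residuals: the standard PnP objective minimised over R, t."""
    total = 0.0
    for X, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(K, R, t, X)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


def gaze_direction(R, t, X_obj):
    """Unit vector from the camera (the avatar's eyes) toward X_obj, in the camera frame."""
    xc = to_camera(R, t, X_obj)
    norm = math.sqrt(sum(c * c for c in xc))
    return [c / norm for c in xc]


K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 0.0]
pts3d = [[0.0, 0.0, 2.0], [0.5, 0.0, 2.0]]
print(project(K, R, t, pts3d[1]))  # (520.0, 240.0)
print(reprojection_error(K, R, t, pts3d, [(320.0, 240.0), (521.0, 240.0)]))  # 1.0
print(gaze_direction(R, t, [3.0, 0.0, 4.0]))  # [0.6, 0.0, 0.8]
```

Because the camera sits at the avatar's eyes, the camera-frame direction toward a target can serve directly as a gaze direction in this sketch. How that direction is then mapped onto an avatar's particular eye configuration is outside what this example covers.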
Abstract
Avatars on displays lack the ability to engage with the physical environment through gaze. To address this limitation, we propose a gaze synthesis method that enables animated avatars to establish gaze communication with the physical environment using a camera-behind-the-display system. The system uses a display that rapidly alternates between visible and transparent states. During the transparent state, a camera positioned behind the display captures the physical environment. This configuration physically aligns the position of the avatar's eyes with the camera, enabling two-way gaze communication with people and objects in the physical environment. Building on this system, we developed a framework for mutual gaze communication between avatars and people. The framework detects the user's gaze and dynamically synthesizes the avatar's gaze towards people or objects in the environment. This capability was integrated into an AI agent system to generate real-time, context-aware gaze behaviors during conversations, enabling more seamless and natural interactions. To evaluate the system, we conducted a user study to assess its effectiveness in supporting physical gaze awareness and generating human-like gaze behaviors. The results show that the behind-display approach significantly enhances the user's perception of being observed and attended to by the avatar. By bridging the gap between virtual avatars and the physical environment through enhanced gaze interactions, our system offers a promising avenue for more immersive and human-like AI-mediated communication in everyday environments.
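The time-multiplexed capture described in the abstract, where the camera records only while the display is transparent, can be sketched as a simple scheduler. The period, transparent-phase length, and exposure values are hypothetical. Two further assumptions are ours rather than the paper's: each cycle begins with the transparent phase, and the exposure is centered within that phase. The class name `TimeMultiplexer` is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeMultiplexer:
    """Hypothetical scheduler for a display alternating transparent/visible phases.

    Assumption: every cycle starts with the transparent phase (the camera sees
    through the panel), followed by the visible phase (the avatar is shown).
    """
    period_ms: float       # length of one transparent + visible cycle
    transparent_ms: float  # length of the transparent phase in each cycle

    def is_transparent(self, t_ms: float) -> bool:
        # The position of t_ms within its cycle decides which phase is active.
        return (t_ms % self.period_ms) < self.transparent_ms

    def exposure_starts(self, n_cycles: int, exposure_ms: float) -> list[float]:
        """Start times that center a camera exposure inside each transparent phase."""
        if exposure_ms > self.transparent_ms:
            # A longer exposure would also record the avatar image of the visible phase.
            raise ValueError("exposure must fit inside the transparent phase")
        offset = (self.transparent_ms - exposure_ms) / 2.0
        return [k * self.period_ms + offset for k in range(n_cycles)]


mux = TimeMultiplexer(period_ms=16.0, transparent_ms=4.0)
print(mux.exposure_starts(3, exposure_ms=2.0))  # [1.0, 17.0, 33.0]
print(mux.is_transparent(1.5), mux.is_transparent(8.0))  # True False
```

Keeping each exposure strictly inside the transparent window is what prevents the captured frames from containing the avatar itself. In practice, the actual timing would depend on the specific display and camera hardware.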
