AnimeGaze: Real-Time Mutual Gaze Synthesis for Anime-Style Avatars in Physical Environments via Behind-Display Camera

Kazuya Izumi, Shuhey Koyama, Yoichi Ochiai

TL;DR

AnimeGaze addresses the lack of gaze engagement in on‑screen avatars by placing a camera behind a transparent display, which physically aligns the avatar's eyes with the camera and enables real‑time mutual gaze with the physical environment. It formalizes gaze transmission as a cross‑space 3D–2D problem, employs a PnP‑based gaze target estimation with $R$, $t$, and $X_{obj}$ to minimize $E_{PnP}$, $E_{reproj}$, and $E_{gaze}$, and introduces a perception‑oriented calibration using $E_{perception}$ to reduce the Mona Lisa effect. The system supports avatars with arbitrary eye configurations and includes a symbolic regression‑based calibration (Section 4.2.1) to align perceived and actual gaze. A user study showed improved eye contact and attentiveness. The work demonstrates a practical path toward immersive, human‑like AI interactions in everyday environments and lays groundwork for extending mutual gaze to non‑human avatars and broader gaze behaviors.

Abstract

Avatars on displays lack the ability to engage with the physical environment through gaze. To address this limitation, we propose a gaze synthesis method that enables animated avatars to establish gaze communication with the physical environment using a camera-behind-the-display system. The system uses a display that rapidly alternates between visible and transparent states. During the transparent state, a camera positioned behind the display captures the physical environment. This configuration physically aligns the position of the avatar's eyes with the camera, enabling two-way gaze communication with people and objects in the physical environment. Building on this system, we developed a framework for mutual gaze communication between avatars and people. The framework detects the user's gaze and dynamically synthesizes the avatar's gaze towards people or objects in the environment. This capability was integrated into an AI agent system to generate real-time, context-aware gaze behaviors during conversations, enabling more seamless and natural interactions. To evaluate the system, we conducted a user study to assess its effectiveness in supporting physical gaze awareness and generating human-like gaze behaviors. The results show that the behind-display approach significantly enhances the user's perception of being observed and attended to by the avatar. By bridging the gap between virtual avatars and the physical environment through enhanced gaze interactions, our system offers a promising avenue for more immersive and human-like AI-mediated communication in everyday environments.

Paper Structure

This paper contains 18 sections, 4 equations, 7 figures.

Figures (7)

  • Figure 1: The Gaze Interaction Space in Our Problem Statement. (a) When a virtual avatar gazes at a physical object over the display, (b) the avatar perceives the physical space as a two-dimensional plane from the camera, and (c) the user perceives the virtual space as a two-dimensional plane from the virtual camera.
  • Figure 2: (a) The Mona Lisa Effect in the era of the Mona Lisa, (b) The diversification of the Mona Lisa Effect in the modern era (quoted from Kiseiju, Volume 10).
  • Figure 3: Position of this paper. (a) The user can make eye contact with a virtual avatar in VR space. (b) The user can make eye contact with the virtual avatar through the screen. (c) This paper presents a system in which avatars with various eye characteristics can communicate with each other in physical space.
  • Figure 4: Hardware Configuration
  • Figure 5: Gaze Interaction Space between User and Avatar. The avatar's line of sight and the camera's optical axis are aligned by the eye contact display, and the target physical object exists at one of the points of the avatar's gazing vector.
  • ...and 2 more figures
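
The geometry described in the Figure 5 caption, where the target object lies somewhere along the avatar's gazing vector, can be illustrated with a small sketch. It assumes an ideal pinhole camera whose optical axis coincides with the avatar's line of sight. The intrinsics (`fx`, `fy`, `cx`, `cy`) and all function names are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def backproject_ray(u: float, v: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Unit direction (camera coordinates) of the ray through pixel (u, v).

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def point_on_ray(direction: Vec3, depth: float) -> Vec3:
    """Point on the ray whose z-coordinate (distance along the optical axis) is `depth`."""
    scale = depth / direction[2]
    return (direction[0] * scale, direction[1] * scale, depth)


def project(point: Vec3,
            fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D camera-space point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 640x480 image.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# Every depth along the same ray maps to the same pixel, which is why a
# single image fixes only the gaze direction, not the target's distance.
ray = backproject_ray(420.0, 240.0, FX, FY, CX, CY)
near = point_on_ray(ray, 0.5)
far = point_on_ray(ray, 2.0)
```

Because `near` and `far` project to the same pixel, the camera image alone constrains only the direction of the gaze target. Resolving the depth along the ray needs additional information, which is presumably where the paper's gaze target estimation comes in.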