Xpress: A System For Dynamic, Context-Aware Robot Facial Expressions using Language Models

Victor Nikhil Antony, Maia Stiber, Chien-Ming Huang

TL;DR

Xpress presents a three-phase, LM-driven pipeline that generates context-aware robotic facial expressions as a temporal trajectory $\tau_{face}$ conditioned on interaction content and socio-emotional context $\{z_T,z_C\}$. By coupling a flexible 28-DoF face with LM-based code generation, the approach enables dynamic, expressive, and contextually aligned facial behavior across storytelling and real-time conversation. Empirical evaluations show high executable-code rates (≈100%), substantial context alignment (≈71–75%), and strong expressiveness (≈3.6–3.9 on a 5-point scale), with notable variation in perceived intensity and some latency considerations in real-time deployment. The work demonstrates the practical potential of LM-assisted expression generation in social robotics while outlining key extensions to multi-modal cues, co-creation, and real-time performance.

Abstract

Facial expressions are vital in human communication and significantly influence outcomes in human-robot interaction (HRI), such as likeability, trust, and companionship. However, current methods for generating robotic facial expressions are often labor-intensive, lack adaptability across contexts and platforms, and have limited expressive ranges, leading to repetitive behaviors that reduce interaction quality, particularly in long-term scenarios. We introduce Xpress, a system that leverages language models (LMs) to dynamically generate context-aware facial expressions for robots through a three-phase process: encoding temporal flow, conditioning expressions on context, and generating facial expression code. We demonstrate Xpress as a proof of concept through two user studies (n=15x2) and a case study with children and parents (n=13), in storytelling and conversational scenarios, to assess the system's context-awareness, expressiveness, and dynamism. Results demonstrate Xpress's ability to dynamically produce expressive and contextually appropriate facial expressions, highlighting its versatility and potential in HRI applications.
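
To make the three-phase process concrete, the following is a minimal Python sketch of how such a pipeline could be chained. It is not the authors' implementation: the names `Segment`, `encode_temporal_flow`, `condition_on_context`, `generate_expression_code`, `xpress_sketch`, and `stub_lm` are hypothetical, the sentence-level segmentation is an assumption made for illustration, and the language model is replaced by a stub so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any text-in, text-out language model call.
LM = Callable[[str], str]


@dataclass
class Segment:
    text: str          # one chunk of the interaction content, in temporal order
    context: str = ""  # socio-emotional context, filled in by phase 2
    code: str = ""     # facial expression code, filled in by phase 3


def encode_temporal_flow(content: str) -> List[Segment]:
    """Phase 1 (assumed form): split content into ordered sentence-level segments."""
    normalized = content.replace("!", ".").replace("?", ".")
    parts = [part.strip() for part in normalized.split(".")]
    return [Segment(text=part) for part in parts if part]


def condition_on_context(segments: List[Segment], lm: LM) -> List[Segment]:
    """Phase 2 (assumed form): ask the LM for the context of each segment."""
    for seg in segments:
        seg.context = lm(f"Describe the socio-emotional context of: {seg.text}")
    return segments


def generate_expression_code(segments: List[Segment], lm: LM) -> List[Segment]:
    """Phase 3 (assumed form): ask the LM for face-control code per segment."""
    for seg in segments:
        seg.code = lm(
            f"Write facial expression code conveying '{seg.context}' "
            f"while the robot says: {seg.text}"
        )
    return segments


def xpress_sketch(content: str, lm: LM) -> List[Segment]:
    """Chain the three phases into an ordered trajectory of expression code."""
    segments = encode_temporal_flow(content)
    segments = condition_on_context(segments, lm)
    return generate_expression_code(segments, lm)


def stub_lm(prompt: str) -> str:
    """Stand-in for a real LM call; returns a placeholder string."""
    return f"<lm output for a {len(prompt)}-character prompt>"


trajectory = xpress_sketch("The fox ran. It was scared!", stub_lm)
for seg in trajectory:
    print(seg.text, "->", seg.code)
```

Passing the LM in as a plain callable keeps the sketch independent of any particular model or provider; a real deployment would also need to validate and execute the generated code on the robot face, which is omitted here.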

Paper Structure

This paper contains 39 sections, 7 figures.

Figures (7)

  • Figure 1: Xpress pipeline for storytelling content generation.
  • Figure 2: Example delivery of story generated using Xpress showing robot faces and the corresponding story text.
  • Figure 3: Participants' perception of storytelling and conversational systems' faces. Cross is mean; box shows quartiles.
  • Figure 4: Children watched and evaluated our robot narrating stories generated using Xpress.
  • Figure 5: Pipeline for pre-generation of expression bank.
  • ...and 2 more figures