Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions?

Zi Haur Pang, Yahui Fu, Divesh Lala, Mikey Elmers, Koji Inoue, Tatsuya Kawahara

TL;DR

This study investigates how the appearance of autonomous conversational robots influences users' spoken behavior in real-world, conference-based interactions by comparing the highly human-like android ERICA with the less anthropomorphic TELECO. Using transcripts from 42 participants and a broad set of NLP-derived linguistic, dialogue, emotion, and mimicry features, the authors find moderate effects: users produced more complex syntax and fewer disfluencies with ERICA than with TELECO. In a predictive modeling component, Naïve Bayes best distinguishes the two robots from speech features, and SHAP and permutation analyses identify syntactic complexity and disfluency metrics as the key predictors. The work frames these findings within cognitive load theory and Communication Accommodation Theory, suggesting that robot design should aim to elicit fluent, structured user speech to improve communicative alignment, with implications for real-world HRI benchmarks and for future work incorporating non-verbal cues and larger samples.

Abstract

We investigate the impact of robot appearance on users' spoken behavior during real-world interactions by comparing a human-like android, ERICA, with a less anthropomorphic humanoid, TELECO. Analyzing data from 42 participants at SIGDIAL 2024, we extracted linguistic features such as disfluencies and syntactic complexity from conversation transcripts. The results showed moderate effect sizes, suggesting that participants produced fewer disfluencies and employed more complex syntax when interacting with ERICA. Further analysis, in which we trained classification models such as Naïve Bayes (which achieved an F1-score of 71.60%) and conducted feature importance analysis, highlighted the significant role of disfluencies and syntactic complexity in interactions with robots of varying human-like appearances. Discussing these findings within the frameworks of cognitive load and Communication Accommodation Theory, we conclude that designing robots to elicit more structured and fluent user speech can enhance their communicative alignment with humans.
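
To make the phrase "moderate effect sizes" concrete, the sketch below computes Cohen's d for two independent groups. This is an illustration only: the abstract does not say which effect-size measure the authors used or whether the comparison was between groups, and the disfluency rates are invented numbers, not data from the study. The names `cohens_d`, `teleco_rates`, and `erica_rates` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Invented per-participant disfluency rates (NOT taken from the paper).
teleco_rates = [4.0, 6.0, 5.0, 7.0, 3.0]
erica_rates = [3.0, 5.0, 4.0, 6.0, 2.0]

d = cohens_d(teleco_rates, erica_rates)
print(f"Cohen's d = {d:.3f}")
```

Under Cohen's conventional thresholds, values around 0.5 are read as medium and values around 0.8 as large, so a d in this range would usually be described as a moderate effect.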

Paper Structure

This paper contains 19 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Photo of a SIGDIAL participant in an interview dialogue with ERICA
  • Figure 2: Photo of a SIGDIAL participant in an interview dialogue with TELECO
  • Figure 3: Overall architecture of the interview system implemented in our study. This comprehensive system architecture includes modules for real-time automatic speech recognition (ASR), prosodic information extraction, language understanding, and user fluency adaptation, among others. Central to the system is the dialogue manager, which coordinates turn-taking, response generation, and conversation repair. Also included are the text-to-speech, gesture generation, and lip motion generation components, enhancing the robots' interactive capabilities.
  • Figure 4: Distribution of SHAP values for each behavioral feature. The SHAP values, which quantify the impact on the model output, are plotted along the x-axis against each feature on the y-axis. Each point represents an individual instance. Points in blue indicate feature values below the average, affecting the model negatively (left of the vertical zero line), while points in yellow denote values above the average, contributing positively to the prediction (right of the zero line).
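
As a rough illustration of what the per-instance values in Figure 4 represent, the sketch below computes exact Shapley values for one instance of a toy linear scorer, treating an "absent" feature as one replaced by its baseline (average) value. It is not the SHAP library and not the authors' model: the feature names, weights, baseline, and instance are all hypothetical, and exhaustive subset enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [
                    instance[j] if j in subset else baseline[j] for j in range(n)
                ]
                with_i = list(without_i)
                with_i[i] = instance[i]
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values


# Hypothetical features and weights (assumptions, not from the paper):
# [syntactic complexity, disfluency rate, utterance count]
WEIGHTS = [0.8, -0.5, 0.1]


def score(x):
    """Toy linear score; a higher value means 'more likely ERICA'."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


baseline = [2.0, 3.0, 1.0]  # invented feature averages
instance = [3.0, 1.0, 1.0]  # invented participant

phi = shapley_values(score, instance, baseline)
print(phi)
```

For a linear scorer, each value reduces to the feature's weight times its deviation from the baseline, and the values sum to the difference between the instance's score and the baseline score. That additivity is what lets a plot like Figure 4 be read as a per-feature breakdown of each prediction.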