Appropriateness of LLM-equipped Robotic Well-being Coach Language in the Workplace: A Qualitative Evaluation
Micol Spitale, Minja Axelsson, Hatice Gunes
TL;DR
The paper addresses the challenge of making LLM-generated language in robotic well-being coaches appropriate for workplace use. It adopts a qualitative, multi-method approach: an LLM-enabled coach was deployed to 17 employees over four weeks, followed by individual interviews and a 1.5-hour focus group with 11 of them to solicit evaluations across seven scenario-based prompts. The key findings show that the coach's language should explore coachees' feelings through deep rather than superficial questions, demonstrate empathy, and avoid assumptions that have not been clarified with follow-up questions, which helps prevent bias and stereotyping. These findings inform practical design guidelines for robotic coaches that support workplace mental well-being with safer, more context-aware language.
Abstract
Robotic coaches have recently been investigated as a means of promoting mental well-being in contexts such as workplaces and homes. With the widespread use of Large Language Models (LLMs), HRI researchers are called to consider the appropriateness of LLM-generated language when it is used by robotic mental well-being coaches in the real world. This paper therefore presents the first work to investigate the language appropriateness of a robotic mental well-being coach in the workplace. To this end, we conducted an empirical study in which 17 employees interacted over 4 weeks with a robotic mental well-being coach equipped with LLM-based capabilities. After the study, we interviewed each of them individually and conducted a 1.5-hour focus group with 11 of them. The focus group consisted of: i) an ice-breaking activity, ii) an evaluation of the appropriateness of the robotic coach's language in various scenarios, and iii) listing shoulds and shouldn'ts for designing appropriate robotic coach language for mental well-being. From our qualitative evaluation, we found that a language-appropriate robotic coach should (1) ask deep questions that explore the coachees' feelings, rather than superficial questions, (2) express and show emotional and empathic understanding of the context, and (3) not make assumptions without first clarifying them through follow-up questions, to avoid bias and stereotyping. These results can inform the design of language-appropriate robotic coaches to promote mental well-being in real-world contexts.
