Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind
Antonio Andriella, Giovanni Falcone, Silvia Rossi
TL;DR
This paper addresses how to enhance human–robot collaboration by combining reinforcement learning with Theory of Mind (ToM). It introduces a two-layer architecture: a Q-learning layer learns the robot's high-level socially assistive actions, while a separate heuristic ToM (mentalising) layer infers the user's intended strategy, implements the robot's assistance accordingly, and explains its choices. The approach is evaluated in a real-world study (N=56) comparing an adaptive robot with ToM capabilities to one without. The results suggest that ToM improves task performance, increases acceptance of the robot's assistance, and leads participants to perceive the robot as better able to adapt to, predict, and recognise their intents. The findings indicate that decoupling learning from execution and adding ToM-based explanations can yield more efficient and transparent human–robot interactions, with potential to generalise to other assistive domains.
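To make the learning layer more concrete, the following is a minimal sketch of a tabular Q-learning agent that selects a high-level assistive action. The state encoding (a game stage plus a count of recent user mistakes), the action names, the reward value, and the hyperparameters are hypothetical placeholders chosen for illustration; they are not the authors' actual design.

```python
import random
from collections import defaultdict

# Hypothetical high-level assistive actions (not the paper's actual action set).
ACTIONS = ["no_assistance", "encourage", "suggest_subset", "suggest_solution"]


class QLearningAssistant:
    """Tabular Q-learning over hypothetical (state, action) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate for epsilon-greedy selection
        self.q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        # Ties are broken by the order of ACTIONS.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done=False):
        # Q-learning target: r + gamma * max_a' Q(s', a'), or just r at a terminal state.
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage with a hypothetical state: (game stage, number of recent mistakes).
agent = QLearningAssistant()
state = ("early", 2)
action = agent.select_action(state)
agent.update(state, action, reward=1.0, next_state=("early", 1))
```

In this sketch the learning layer only decides *what kind* of help to give; how that help is realised is left to a separate layer, mirroring the decoupling of learning from execution described above.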
Abstract
Adapting to users' preferences and inferring and interpreting humans' beliefs and intents, an ability known as Theory of Mind (ToM), are two crucial aspects of effective human-robot collaboration. Despite their importance, very few studies have investigated the impact of adaptive robots with ToM abilities. In this work, we present an exploratory comparative study investigating how social robots equipped with ToM abilities affect users' performance and perception. We designed a two-layer architecture. On the first layer, a Q-learning agent learns the robot's high-level behaviour. On the second layer, a heuristic-based ToM infers the user's intended strategy and is responsible for implementing the robot's assistance and for explaining the reasoning behind its choice. We conducted a user study in a real-world setting involving 56 participants, who interacted either with an adaptive robot capable of ToM or with a robot lacking such abilities. Our findings suggest that participants in the ToM condition performed better, accepted the robot's assistance more often, and perceived its ability to adapt to, predict, and recognise their intents to a higher degree. These preliminary insights could inform future research and pave the way for designing more complex computational architectures for adaptive behaviour with ToM capabilities.
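The division of labour in the second layer can be illustrated with a minimal sketch. It assumes a hypothetical toy task in which the user places numbered tokens one at a time, and only two candidate strategies (ascending and descending order). The strategy set, the match-counting heuristic, the action names, and the explanation wording are all illustrative assumptions, not the heuristics used in the study; the high-level action is simply passed in as a string, standing in for the output of a learning layer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical candidate strategies: each maps the tokens still available
# to the token the user would place next if following that strategy.
STRATEGIES = {
    "ascending": lambda remaining: min(remaining),
    "descending": lambda remaining: max(remaining),
}


@dataclass
class Assistance:
    strategy: str
    suggested_move: Optional[int]
    explanation: str


def infer_strategy(history, all_tokens):
    """Return the candidate strategy whose predictions match the most past moves."""
    scores = {name: 0 for name in STRATEGIES}
    remaining = set(all_tokens)
    for move in history:
        for name, predict in STRATEGIES.items():
            if predict(remaining) == move:
                scores[name] += 1
        remaining.discard(move)
    # Ties resolve to the first strategy in dictionary order.
    return max(scores, key=scores.get)


def implement_assistance(high_level_action, history, all_tokens):
    """Turn a high-level action into a concrete hint plus an explanation."""
    strategy = infer_strategy(history, all_tokens)
    remaining = set(all_tokens) - set(history)
    if high_level_action == "no_assistance" or not remaining:
        return Assistance(strategy, None, "No hint given.")
    move = STRATEGIES[strategy](remaining)
    explanation = (
        f"You seem to be placing tokens in {strategy} order, "
        f"so {move} comes next."
    )
    return Assistance(strategy, move, explanation)


# Usage: the user has placed 9 and then 7 out of tokens 1, 3, 5, 7, 9.
hint = implement_assistance("suggest_move", [9, 7], [1, 3, 5, 7, 9])
```

Here the inferred strategy drives both the concrete suggestion and the explanation, which is one simple way to let a ToM layer justify the assistance it delivers.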
