Enhancing Robot Assistive Behaviour with Reinforcement Learning and Theory of Mind

Antonio Andriella, Giovanni Falcone, Silvia Rossi

TL;DR

This paper addresses how to enhance human–robot collaboration by combining reinforcement learning with Theory of Mind (ToM). It introduces a two-layer architecture: a learning layer based on Q-learning learns high-level socially assistive actions, while a separate heuristic ToM mentalising layer infers the user's intended strategy, uses it to implement the robot's assistance, and explains the choice. The approach is evaluated in an exploratory real-world study (N=56) comparing an adaptive robot with ToM capabilities to one without. The results suggest that participants in the ToM condition performed better, accepted assistance more often, and rated the robot higher on adapting to, predicting, and recognising their intents. The findings suggest that decoupling learning from execution and adding ToM-based explanations can yield more efficient and transparent human–robot interactions, with potential to generalise to other assistive domains.

Abstract

The adaptation to users' preferences and the ability to infer and interpret humans' beliefs and intents, which is known as the Theory of Mind (ToM), are two crucial aspects for achieving effective human-robot collaboration. Despite its importance, very few studies have investigated the impact of adaptive robots with ToM abilities. In this work, we present an exploratory comparative study to investigate how social robots equipped with ToM abilities impact users' performance and perception. We design a two-layer architecture. The Q-learning agent on the first layer learns the robot's higher-level behaviour. On the second layer, a heuristic-based ToM infers the user's intended strategy and is responsible for implementing the robot's assistance, as well as providing the motivation behind its choice. We conducted a user study in a real-world setting, involving 56 participants who interacted with either an adaptive robot capable of ToM, or with a robot lacking such abilities. Our findings suggest that participants in the ToM condition performed better, accepted the robot's assistance more often, and perceived its ability to adapt, predict and recognise their intents to a higher degree. Our preliminary insights could inform future research and pave the way for designing more complex computation architectures for adaptive behaviour with ToM capabilities.

Paper Structure

This paper contains 24 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: A user playing the memory game with the assistance of the Furhat robot.
  • Figure 2: The two-layer architecture. First, in the learning layer, the robot selects the assistive action that matches the user's performance ($a=sug\_card$) in a given state ($\{s\_begin, sug\_col, s\_wrong\}$) by accessing the Q-table, which was learnt offline by combining interactions generated in simulation with previous data. Next, in the mentalising layer, the heuristic-based ToM uses the user's previous moves to estimate the card that might lead to a match (e.g., shark) and then explains why that card was selected.
  • Figure 3: The figure shows the results in simulation of a perfect player ($M=25.1$, $SD=3.7$), an imperfect player ($M=48.15$, $SD=7.55$), and an imperfect player assisted by the Q-learning agent ($M=41.73$, $SD=8.83$), respectively.
  • Figure 4: The figure shows the Q-matrix learnt by the RL agent. The y-axis shows the four actions: $N\_A$ stands for $no\_action$, $S\_R$ for $sug\_row$, $S\_C$ for $sug\_col$, and $S\_CC$ for $sug\_card$. The x-axis shows the states; states that the agent never visited are omitted.
  • Figure 5: The figure shows (a) the number of turns, (b) completion time, (c) time for a match, and (d) the number of times assistance was received, for participants in the noToM group (left, blue) and the ToM group (right, orange), respectively ($*$ denotes $.01 < p < .05$).
  • ...and 2 more figures
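
The two-layer flow described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The action names (`no_action`, `sug_row`, `sug_col`, `sug_card`) and the example state come from the captions above. Everything else is an assumption made for illustration: the function names, the treatment of a state as an opaque tuple, the greedy Q-table lookup, the "same image seen before" heuristic, and the wording of the explanation.

```python
from typing import Dict, List, Optional, Tuple

# Action names as listed in the Figure 4 caption.
ACTIONS = ["no_action", "sug_row", "sug_col", "sug_card"]

# Assumption: a state is an opaque tuple of labels, as in the Figure 2 example.
State = Tuple[str, str, str]
Position = Tuple[int, int]  # (row, column) on the board; hypothetical encoding


def select_assistance(q_table: Dict[State, List[float]], state: State) -> str:
    """Learning layer: greedy lookup in a Q-table learnt offline."""
    q_values = q_table.get(state)
    if q_values is None:  # state never visited: fall back to no assistance
        return "no_action"
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]


def infer_matching_card(image: str, flipped_pos: Position,
                        seen: Dict[Position, str]) -> Optional[Position]:
    """Mentalising layer (toy heuristic): find an earlier-revealed card
    showing the same image as the card the user has just flipped."""
    for pos, seen_image in seen.items():
        if seen_image == image and pos != flipped_pos:
            return pos
    return None


def assist(q_table: Dict[State, List[float]], state: State, image: str,
           flipped_pos: Position, seen: Dict[Position, str]) -> Tuple[str, str]:
    """Combine both layers: choose the level of help, then ground and explain it."""
    action = select_assistance(q_table, state)
    if action == "no_action":
        return action, ""
    target = infer_matching_card(image, flipped_pos, seen)
    if target is None:  # nothing in the user's history supports a hint
        return "no_action", ""
    row, col = target
    if action == "sug_row":
        hint = f"Try row {row}"
    elif action == "sug_col":
        hint = f"Try column {col}"
    else:  # sug_card
        hint = f"Try the card at row {row}, column {col}"
    return action, f"{hint}: the {image} was revealed there earlier."


if __name__ == "__main__":
    q = {("s_begin", "sug_col", "s_wrong"): [0.1, 0.2, 0.3, 0.9]}
    history = {(0, 1): "shark", (2, 3): "shark", (1, 1): "whale"}
    print(assist(q, ("s_begin", "sug_col", "s_wrong"), "shark", (0, 1), history))
```

The sketch keeps the two decisions separate. The Q-table decides only how much help to give, while the heuristic decides which card, row, or column that help refers to and why. That separation reflects the decoupling of learning from execution that the summary highlights.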