Second-order Theory of Mind for Human Teachers and Robot Learners
Patrick Callaghan, Reid Simmons, Henny Admoni
TL;DR
This paper addresses confusing or misleading learner feedback in human-robot teaching, which increases the human teacher's cognitive burden. It proposes a Second-order Theory of Mind (ToM-2), realized within an Interactive Partially Observable Markov Decision Process (I-POMDP), that models perceived rationality and the teacher's beliefs about the learner and the learning objective. It introduces a discrete, learnable set of observation functions parameterized by $\beta$ to capture varying degrees of rationality, and Confidence Expressions (CEs) through which the learner conveys its certainty about features. The evaluation plan includes simulation experiments and a human user study in a turn-based card-rule domain to assess reductions in the number of rounds needed to learn, misbelief rates, and cognitive workload. The aim is to improve teaching efficacy through feedback that accounts for the teacher's beliefs about the learner and the objective, with potential practical impact on human-robot teaching interactions.
Abstract
Confusing or otherwise unhelpful learner feedback creates or perpetuates erroneous beliefs that the teacher and learner hold about each other, thereby increasing the cognitive burden placed on the human teacher. For example, the robot's feedback might cause the human to misunderstand what the learner knows about the learning objective or how the learner learns. At the same time, the learner might misunderstand not only the learning objective but also how the teacher perceives the learner's task knowledge and learning processes. To ease the teaching burden, the learner should provide feedback that accounts for these misunderstandings and elicits efficient teaching from the human. This work endows an AI learner with a Second-order Theory of Mind that models perceived rationality as a source of the erroneous beliefs a teacher and learner may have of one another. It also explores how a learner can ease the teaching burden and improve teacher efficacy by selecting feedback that accounts for its model of the teacher's beliefs about the learner and its learning objective.
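To make "perceived rationality" concrete, the sketch below shows one way a discrete set of $\beta$-parameterized observation functions could be written and a belief over them updated. It assumes a softmax (Boltzmann) form, a common choice in the literature that this text does not confirm. The grid `BETAS` and the names `observation_probs` and `update_beta_belief` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical discrete grid of rationality levels (illustrative values only).
BETAS = [0.1, 1.0, 10.0]


def observation_probs(utilities, beta):
    """Softmax distribution over actions, assuming Boltzmann rationality.

    Larger beta concentrates probability on the highest-utility action;
    beta near 0 approaches a uniform (random) choice.
    """
    scaled = [beta * u for u in utilities]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def update_beta_belief(prior, utilities, observed_action):
    """Bayesian update of a belief over BETAS after observing one action."""
    unnormalized = [
        p * observation_probs(utilities, beta)[observed_action]
        for p, beta in zip(prior, BETAS)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Example: two actions, the first clearly better; the observer sees it chosen.
prior = [1.0 / len(BETAS)] * len(BETAS)
posterior = update_beta_belief(prior, utilities=[1.0, 0.0], observed_action=0)
```

In this example, observing the higher-utility action shifts belief toward larger values of $\beta$, meaning the observed agent is perceived as more rational. Under the same assumptions, a second-order model would apply such an update to the teacher's estimate of the learner's rationality.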
