A Learning Algorithm That Attains the Human Optimum in a Repeated Human-Machine Interaction Game

Jason T. Isa; Lillian J. Ratliff; Samuel A. Burden

TL;DR

The paper addresses the challenge of aligning a learning-based agent with a human's cost in repeated human-machine interactions when that cost is unknown to the machine. It introduces a game-theoretic, observation-driven algorithm that converges to the minimum of the human's cost by updating its estimate of the optimum under an affine policy, without solving an inverse problem. Human-subject experiments across several action-dimension configurations ($1\times1$, $1\times2$, $2\times1$, and $2\times2$ games), supported by simulations, demonstrate convergence to the minimum of a prescribed quadratic cost and agreement between theory and data. The authors suggest the approach could inform safe, adaptive assistive devices and other human-robot interaction settings; future work aims to generalize beyond quadratic costs and to real-world implementations.

Abstract

When humans interact with learning-based control systems, a common goal is to minimize a cost function known only to the human. For instance, an exoskeleton may adapt its assistance in an effort to minimize the human's metabolic cost of transport. Conventional approaches to synthesizing the learning algorithm solve an inverse problem to infer the human's cost. However, these problems can be ill-posed, hard to solve, or sensitive to problem data. Here we present a game-theoretic learning algorithm that finds the cost minimum solely by observing human actions, avoiding the need to solve an inverse problem. We evaluate the performance of our algorithm in an extensive set of human-subject experiments, demonstrating consistent convergence to the minimum of a prescribed human cost function in scalar and multidimensional instantiations of the game. We conclude by outlining future directions for theoretical and empirical extensions of our results.
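The summary says the machine updates its estimate of the optimum under an affine policy but does not reproduce the update rule. The Python sketch below shows one scheme consistent with that description. It should not be read as the authors' algorithm. It assumes scalar actions, a quadratic cost $c(h,m)$, and a simulated human who best-responds exactly to the machine's policy $m = L(h - \hat h) + \hat m$. It also assumes a machine that, hypothetically, alternates between two policy slopes and re-centers the policy on the joint action it just observed. The key observation is this: if the policy line passes through $(h^*, m^*)$, the human's best response is $h^*$ itself, so observed actions reveal the optimum without inverting the cost. All function names are illustrative.

```python
# Illustrative sketch only, not the authors' algorithm. Assumptions: scalar
# actions h and m, a quadratic human cost, a simulated human who best-responds
# exactly, and a machine that alternates two (hypothetical) policy slopes.


def make_quadratic_cost(Q, h_star, m_star):
    """Return c(h, m) = 0.5 * e^T Q e with e = (h - h_star, m - m_star)."""
    def cost(h, m):
        e0, e1 = h - h_star, m - m_star
        return 0.5 * (Q[0][0] * e0 * e0 + 2.0 * Q[0][1] * e0 * e1 + Q[1][1] * e1 * e1)
    return cost


def human_best_response(Q, h_star, m_star, h_est, m_est, L):
    """Simulated human: pick h minimizing c(h, m) given the machine's affine
    policy m = L * (h - h_est) + m_est. Only the human 'knows' Q, h_star, m_star."""
    e0, e1 = h_est - h_star, m_est - m_star      # error at the policy's anchor point
    d0, d1 = 1.0, L                              # direction traced by the policy line
    qd0 = Q[0][0] * d0 + Q[0][1] * d1            # components of Q @ d
    qd1 = Q[1][0] * d0 + Q[1][1] * d1
    t = -(qd0 * e0 + qd1 * e1) / (qd0 * d0 + qd1 * d1)  # exact 1-D minimizer (d^T Q d > 0)
    h = h_est + t
    return h, L * (h - h_est) + m_est            # observed joint action (h, m)


def run_game(Q, h_star, m_star, h0=0.0, m0=0.0, slopes=(1.0, -1.0), iters=50):
    """Machine loop: play the affine policy centered on the current estimate,
    observe (h, m), and move the estimate there. The update uses only the
    observed actions; Q, h_star, m_star are passed through to the simulated human."""
    h_est, m_est = h0, m0
    for k in range(iters):
        L = slopes[k % len(slopes)]              # distinct slopes give independent directions
        h_est, m_est = human_best_response(Q, h_star, m_star, h_est, m_est, L)
    return h_est, m_est


if __name__ == "__main__":
    Q = [[2.0, 0.5], [0.5, 1.0]]                 # assumed symmetric positive definite
    cost = make_quadratic_cost(Q, 1.0, -0.5)
    h_hat, m_hat = run_game(Q, 1.0, -0.5)
    print(h_hat, m_hat, cost(h_hat, m_hat))
```

Each round is an exact line minimization of $c$ along the direction $(1, L)$. Alternating two distinct slopes is therefore cyclic coordinate descent in a skewed basis, which converges for any strictly convex quadratic. With the assumed `Q` above, the estimates settle at $(1.0, -0.5)$ to within floating-point precision.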

Paper Structure (15 sections, 3 equations, 5 figures, 4 algorithms)

Introduction
Machine's Learning Algorithm
Methods
Participant Population
Human Input
Experiment Initialization
Protocol
Data Collection
Simulations
Results
Convergence to $(h^*,m^*)$ in the $1\times1$ game
Convergence to $(h^*,m^*)$ in the $1\times2$, $2\times1$, and $2\times2$ games
Discussion
Conclusion
Additional Algorithms

Figures (5)

Figure 1: Human participant $H$ provides manual input $h$ to keep the black circle on a computer screen as small as possible while a learning algorithm $M$ determines input $m$. The radius of the circle represents the instantaneous value of a prescribed cost $c(h,m)$.
Figure 2: $1 \times 1$ Experiment ($n = 80$; $n = 10$ per initialization point): (a) $M$’s median estimate of $h^*$ and $m^*$ over iterations for each initialization point. (b) Distributions of L-1 error of $M$’s estimate of $h^*$; box-and-whiskers plot showing 5th, 25th, 50th, 75th, and 95th percentiles. (c) Distributions of L-1 error of $M$’s estimate of $m^*$; box-and-whiskers plot showing the same as (b). (d) Cost distributions; bar plots with quartiles. The total L-1 error distributions of $M$’s estimate of the optimum $(h^*,m^*)$ can be obtained by adding the L-1 errors from (b) and (c).
Figure 3: $1 \times 1$ Simulation vs Experiment ($n = 80$; $n = 10$ per initialization point): Grey lines correspond to simulation data. (a) $M$’s median estimate of $h^*$ and $m^*$ over iterations for each initialization point. (b) Distributions of $M$’s estimate of $h^*$; box-and-whiskers plot showing 5th, 25th, 50th, 75th, and 95th percentiles. (c) Distributions of $M$’s estimate of $m^*$; box-and-whiskers plot showing the same as (b). (d) Cost distributions; bar plots with quartiles.
Figure 4: $1 \times 2$, $2 \times 1$, $2 \times 2$ Experiments ($n = 20$ each): (a) Distributions of L-1 error of $M$’s estimate of $h^*$; box-and-whiskers plot showing 5th, 25th, 50th, 75th, and 95th percentiles. (b) Distributions of L-1 error of $M$’s estimate of $m^*$; box-and-whiskers plot showing same as (a). (c) Cost distributions; bar plots with quartiles.
Figure : $1 \times 1$ Experiments
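The Figure 2 caption notes that the total L-1 error of $M$’s estimate of $(h^*,m^*)$ is the sum of the errors in panels (b) and (c). This holds because the L1 norm of a stacked vector equals the sum of the L1 norms of its parts. The sketch below computes per-trial L1 errors and the 5th/25th/50th/75th/95th percentiles named in the captions using NumPy. It assumes NumPy's default linear interpolation, since the summary does not state the method the authors used. The function names and input values are illustrative placeholders, not data from the paper.

```python
import numpy as np


def l1_error(estimates, optimum):
    """Per-trial L1 error ||estimate - optimum||_1.

    estimates: shape (n_trials,) for a scalar action or (n_trials, dim) otherwise.
    optimum:   shape (dim,).
    """
    est = np.asarray(estimates, dtype=float)
    if est.ndim == 1:                            # scalar action: one column
        est = est[:, None]
    opt = np.asarray(optimum, dtype=float).reshape(1, -1)
    return np.abs(est - opt).sum(axis=1)


def box_whisker_summary(errors):
    """5th, 25th, 50th, 75th and 95th percentiles (NumPy's default linear interpolation)."""
    levels = [5, 25, 50, 75, 95]
    return dict(zip(levels, np.percentile(errors, levels)))


if __name__ == "__main__":
    # Synthetic placeholder values, NOT data from the paper.
    h_star, m_star = 1.0, -0.5
    h_final = [1.1, 0.9, 1.0, 1.3]
    m_final = [-0.5, -0.4, -0.7, -0.5]
    err_h = l1_error(h_final, [h_star])
    err_m = l1_error(m_final, [m_star])
    total = err_h + err_m                        # L1 error of the stacked estimate (h, m)
    print(box_whisker_summary(total))
```

For the multidimensional games ($1\times2$, $2\times1$, $2\times2$), pass one row per trial with one column per action component. The same function then sums the absolute errors across components.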
