Task load dependent decision referrals for joint binary classification in human-automation teams

Kesav Kaza, Jerome Le Ny, Aditya Mahajan

TL;DR

This work addresses optimal task referrals in human-automation teams performing binary classification by modeling the human operator's performance as a function of task load and deriving a referral policy that leverages observed automation data. The authors introduce a referral index $R(p,w)$ and prove that, for a given workload $w$, referring the top-$w$ tasks by this index minimizes the total expected cost; they then search over feasible loads to obtain the overall policy. The framework is validated through simulations with Gaussian observation models and an experimental study using a radar-like task, showing that the proposed optimal allocation (OA) policy outperforms a blind allocation (BA) baseline and performs on par with a static allocation (SA) when load variability is a concern. The results highlight the practical viability of load-aware task referrals, with the ability to estimate human performance functions from calibration experiments, enabling real-world deployment in joint human-automation decision systems.

Abstract

We consider the problem of optimal decision referrals in human-automation teams performing binary classification tasks. The automation, which includes a pre-trained classifier, observes data for a batch of independent tasks, analyzes them, and may refer a subset of tasks to a human operator for fresh and final analysis. Our key modeling assumption is that human performance degrades with task load. We model the problem of choosing which tasks to refer as a stochastic optimization problem and show that, for a given task load, it is optimal to myopically refer tasks that yield the largest reduction in expected cost, conditional on the observed data. This provides a ranking scheme and a policy to determine the optimal set of tasks for referral. We evaluate this policy against a baseline through an experimental study with human participants. Using a radar screen simulator, participants made binary target classification decisions under a time constraint. They were guided by a decision rule provided to them, but were still prone to errors under time pressure. An initial experiment estimated human performance model parameters, while a second experiment compared two referral policies. Results show statistically significant gains for the proposed optimal referral policy over a blind policy that determines referrals using the automation and human-performance models but not the observed data.

Paper Structure

This paper contains 20 sections, 2 theorems, 25 equations, 6 figures, 1 algorithm.

Key Result

Theorem 1

To minimize $J(\mathcal{N},\{D_k\}_{k\in\mathcal{K}\setminus \mathcal{N}}, p^{a}_{1:K})$ in eq:team_cost for a given task load $w = \vert\mathcal{N}\vert$ and posteriors $p^a_{1:K}$, it is optimal for the automation to refer the $w$ tasks with largest referral indices among $\{R(p^a_{k},w)\}_{k=1}^K$.
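
The ranking rule in Theorem 1, combined with a search over candidate loads, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the unit costs `C_FN` and `C_FP`, the linear degradation in `human_rates`, and all function names are hypothetical. The automation cost is assumed to be the cost of picking the cheaper label given the posterior, and the referral index is taken as the automation cost minus the human cost, following the description in the Figure 1 caption.

```python
import numpy as np

# All constants and functional forms here are illustrative assumptions,
# not values or models taken from the paper.
C_FN, C_FP = 1.0, 1.0  # hypothetical costs of a miss and of a false alarm


def automation_cost(p):
    """Expected cost if the automation picks the cheaper label, given P(positive) = p."""
    p = np.asarray(p, dtype=float)
    return np.minimum(C_FN * p, C_FP * (1.0 - p))


def human_rates(w):
    """Hypothetical load-dependent (TPR, FPR): performance degrades as load w grows."""
    tpr = max(0.5, 0.98 - 0.03 * w)
    fpr = min(0.5, 0.02 + 0.03 * w)
    return tpr, fpr


def human_cost(p, w):
    """Expected cost of a human decision on a task with posterior p at load w."""
    p = np.asarray(p, dtype=float)
    tpr, fpr = human_rates(w)
    return C_FN * p * (1.0 - tpr) + C_FP * (1.0 - p) * fpr


def referral_index(p, w):
    """Reduction in expected cost from referring a task at load w."""
    return automation_cost(p) - human_cost(p, w)


def refer_top_w(p, w):
    """Refer the w tasks with the largest index; return (task indices, total cost)."""
    p = np.asarray(p, dtype=float)
    base = float(automation_cost(p).sum())
    if w == 0:
        return np.array([], dtype=int), base
    r = referral_index(p, w)
    referred = np.sort(np.argsort(-r, kind="stable")[:w])
    # Referred tasks swap automation cost for human cost, i.e. subtract their index.
    return referred, base - float(r[referred].sum())


def best_referral(p):
    """Search all loads 0..K and keep the load with the lowest total expected cost."""
    candidates = [refer_top_w(p, w) for w in range(len(p) + 1)]
    return min(candidates, key=lambda rc: rc[1])


posteriors = [0.5, 0.05, 0.45, 0.95, 0.6]
referred, cost = best_referral(posteriors)
print(referred, round(cost, 2))
```

With these assumed numbers, the tasks whose posteriors are closest to 0.5 are referred first, since those are the tasks where the automation is least certain and the human adds the most. Referring more tasks eventually raises the total cost because every referred task is handled at the higher load.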

Figures (6)

  • Figure 1: Illustration of the classification costs of the automation, $\Gamma^{a}_*(p)$, and the human, $\Gamma^{h}(p,w)$, for different task loads as a function of the posterior belief about a task. The referral index is the difference between these two curves.
  • Figure 2: Comparison of OA, BA, and SA for $25$ randomly sampled problem instances. For each instance, the figures show summary statistics when the experiment is repeated $2000$ times.
  • Figure 3: Experimental setup: A simulation of a radar screen showing multiple mobile objects. Clicking an object displays information such as speed, altitude, etc., on the central pane. The operator tries to follow the decision tree to decide the correct label (hostile or non-hostile) and presses one of the two classification buttons.
  • Figure 4: Experiment 1: [left] TPR and FPR measurements for the $18$ valid participants (who completed at least $55\%$ of all allocated tasks) for four task loads $6, 9, 12, 15$, with interquartile ranges (IQR), outliers represented as circles, and linear interpolation of TPR and FPR for other task loads. [right] Comparison of the empirical estimates at the four measured task loads against simulation-based estimates made prior to the experiment.
  • Figure 5: [left] The decision tree used by the automation to make classification decisions. The referral decisions are made based on posterior probabilities of the classification tasks, computed using the frequency of visiting each of the leaves in the decision tree. [right] Classification decision cost as a function of the posterior probability $p^a_k$ for the experimental scenario, based on the empirically estimated TPR and FPR. The blue dots represent the average posterior probabilities of tasks corresponding to each of the six leaves in the decision tree of the automation.
  • ...and 1 more figure

Theorems & Definitions (3)

  • Example 1: Gaussian Observation Models
  • Theorem 1
  • Theorem 2