
Fatigue-Aware Learning to Defer via Constrained Optimisation

Zheng Zhang, Cuong C. Nguyen, David Rosewarne, Kevin Wells, Gustavo Carneiro

Abstract

Learning to defer (L2D) enables human-AI cooperation by deciding when an AI system should act autonomously or defer to a human expert. Existing L2D methods, however, assume static human performance, contradicting well-established findings on fatigue-induced degradation. We propose Fatigue-Aware Learning to Defer via Constrained Optimisation (FALCON), which explicitly models workload-varying human performance using psychologically grounded fatigue curves. FALCON formulates L2D as a Constrained Markov Decision Process (CMDP) whose state includes both task features and cumulative human workload, and optimises accuracy under human-AI cooperation budgets via PPO-Lagrangian training. We further introduce FA-L2D, a benchmark that systematically varies fatigue dynamics from near-static to rapidly degrading regimes. Experiments across multiple datasets show that FALCON consistently outperforms state-of-the-art L2D methods across coverage levels, generalises zero-shot to unseen experts with different fatigue patterns, and demonstrates the advantage of adaptive human-AI collaboration over AI-only or human-only decision-making when coverage lies strictly between 0 and 1.
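The abstract describes enforcing a human-AI cooperation budget via PPO-Lagrangian training. A minimal sketch of the generic Lagrangian dual update behind that family of methods is below; the function names, learning rate, and toy cost numbers are illustrative assumptions, not FALCON's actual implementation.

```python
# Sketch of the dual (Lagrangian-multiplier) update used in
# PPO-Lagrangian-style constrained RL. The policy itself would be
# optimised by PPO on the penalised reward; only the multiplier
# bookkeeping is shown here.

def lagrangian_step(lmbda: float, episode_cost: float, budget: float,
                    lr: float = 0.05) -> float:
    """Projected gradient ascent on the dual variable.

    The multiplier grows while the observed cooperation cost (e.g. the
    deferral rate) exceeds the budget, making deferral more expensive
    for the policy, and shrinks back toward 0 once the constraint holds.
    """
    return max(0.0, lmbda + lr * (episode_cost - budget))


def penalised_reward(reward: float, cost: float, lmbda: float) -> float:
    """Scalar signal the policy optimiser (e.g. PPO) actually maximises."""
    return reward - lmbda * cost


if __name__ == "__main__":
    lmbda, budget = 0.0, 0.3  # hypothetical: defer on at most 30% of tasks
    for episode_cost in [0.6, 0.5, 0.4, 0.3, 0.25]:  # observed deferral rates
        lmbda = lagrangian_step(lmbda, episode_cost, budget)
        print(f"cost={episode_cost:.2f} -> lambda={lmbda:.3f}")
```

As the per-episode cost falls to the budget, the multiplier stops growing, which is the mechanism that lets accuracy be optimised subject to the cooperation constraint.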


Paper Structure

This paper contains 20 sections, 8 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Example of an L2D scenario illustrating workload-variant human performance in human–AI task allocation within a single episode. FALCON adapts deferral decisions based on both task difficulty and accumulated human fatigue. At $t=1$, an easy task is handled by the AI while the human expert remains fresh. At $t=2$, a challenging case is deferred to the human expert who has sufficient cognitive capacity. By $t=3$, another hard task is still assigned to the human despite mild fatigue accumulation. At the final time step $t=T$, severe human fatigue leads to AI handling the task to prevent performance degradation.
  • Figure 2: (a): Examples of $\mathsf{w}(\rho)$. The parameter values $(w_{0}, w_\text{peak}, w_\text{base}, k, \bar{\rho}, \hat{\rho})$ in Examples 1, 2, and 3 are $(0.9, 1, 0.7, 0.1, 0.375, 0.05)$, $(0.8, 0.95, 0.5, 0.09, 0.5, 0.025)$, and $(0.8, 0.9, 0.6, 0.2, 0.6, 0.1)$, respectively. (b): The architecture of FALCON with workload-variant human performance. A backbone model extracts visual features from the input $\mathbf{x}_t$, while the cumulative human workload $\rho_t$ is passed through an embedding layer. The visual and workload features are concatenated and processed by Resettable S5 layers (lu2023structured) to capture temporal dependencies and output the policy $\pi(\mathbf{a}_t|\mathbf{s}_t)$ alongside value estimates.
  • Figure 3: Human performance vs. cumulative workload curves on various datasets. The blue and red lines denote the upper and lower bounds of human performance under workload accumulation.
  • Figure 4: Training time of FALCON and competing methods on CIFAR-100 (1e7 iterations).
  • Figure 5: Inference time of FALCON and competing methods on CIFAR-100 (50 episodes).
  • ...and 4 more figures
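Figure 2(a) parameterises the fatigue curve $\mathsf{w}(\rho)$ by $(w_{0}, w_\text{peak}, w_\text{base}, k, \bar{\rho}, \hat{\rho})$ but this page does not reproduce the functional form. The sketch below is one plausible shape consistent with those parameter names, assuming a brief warm-up from $w_0$ to $w_\text{peak}$ over the first $\hat{\rho}$ units of workload followed by exponential fatigue decay (rate $k$) toward the floor $w_\text{base}$; it omits the midpoint parameter $\bar{\rho}$ and is not the paper's exact curve.

```python
import math

def fatigue_curve(rho: float, w0: float = 0.9, w_peak: float = 1.0,
                  w_base: float = 0.7, k: float = 0.1,
                  rho_hat: float = 0.05) -> float:
    """Hypothetical workload-dependent human accuracy w(rho).

    Assumes rho_hat > 0. Defaults mirror Example 1 of Figure 2(a);
    the shape itself is an illustrative assumption.
    """
    if rho < rho_hat:
        # Warm-up: linear rise from the cold-start accuracy w0 to the peak.
        return w0 + (w_peak - w0) * rho / rho_hat
    # Fatigue: exponential decay from the peak toward the floor w_base.
    return w_base + (w_peak - w_base) * math.exp(-k * (rho - rho_hat))
```

Under this sketch, accuracy starts at $w_0 = 0.9$, reaches $w_\text{peak} = 1.0$ at $\rho = \hat{\rho}$, and decays monotonically toward $w_\text{base} = 0.7$, matching the qualitative near-static-to-degrading regimes the FA-L2D benchmark varies.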

Theorems & Definitions (1)

  • Remark 5.1