Boltzmann State-Dependent Rationality

Osher Lerner

TL;DR

This work addresses the gap in modeling human suboptimality for human-robot collaboration by replacing a single suboptimality parameter $β$ in Boltzmann Rationality with a state-dependent function $β(s)$, enabling a more expressive yet tractable representation of human behavior. It develops forward and inverse models under this state-aware framework, culminating in a maximum-a-posteriori objective for jointly inferring reward weights and state-dependent suboptimality across multiple agents. The authors outline experimental designs in GridWorld and Overcooked with Mechanical Turk data, demonstrating parameter recovery and the ability to generalize learned rewards and suboptimality to goal inference tasks. The proposed approach has practical implications for risk-aware planning, adaptive assistance, and improved human-centered collaboration in complex tasks, where conditional uncertainty and context-sensitive proficiency are essential.

Abstract

This paper expands on existing learned models of human behavior via a measured step in structured irrationality. Specifically, by replacing the suboptimality constant $β$ in a Boltzmann rationality model with a function over states $β(s)$, we gain natural expressivity in a computationally tractable manner. This paper discusses relevant mathematical theory, sets up several experimental designs, presents limited preliminary results, and proposes future investigations.
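As an illustration of the substitution described above, the following is a minimal sketch and not the paper's implementation. It assumes the convention $\pi(a \mid s) \propto \exp(\beta(s)\, Q(s, a))$, in which a larger $\beta(s)$ means more nearly optimal behavior in state $s$; the paper may use a different parameterization. The function name `boltzmann_policy` and the toy Q-values are hypothetical.

```python
import numpy as np

def boltzmann_policy(q_values, beta):
    """Return pi[s, a] proportional to exp(beta[s] * q_values[s, a]).

    q_values: array of shape (num_states, num_actions).
    beta: a scalar (classic Boltzmann rationality) or an array of shape
          (num_states,) giving one coefficient per state (assumed convention).
    """
    q_values = np.asarray(q_values, dtype=float)
    # Broadcast a scalar beta to one value per state.
    beta = np.broadcast_to(np.asarray(beta, dtype=float), q_values.shape[:1])
    logits = beta[:, None] * q_values
    # Subtract the per-state maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy example (hypothetical values): two states with identical Q-values.
q = np.array([[1.0, 0.0],
              [1.0, 0.0]])
pi_const = boltzmann_policy(q, 1.0)           # single constant beta
pi_state = boltzmann_policy(q, [1.0, 100.0])  # state-dependent beta(s)
```

With a constant coefficient both states yield the same action distribution; with the state-dependent version, the second state is modeled as almost deterministic while the first remains noisy, even though the underlying Q-values are identical.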

Paper Structure (19 sections, 19 equations, 1 figure)

Figures (1)

  • Figure 1: Posterior Belief of $\theta$ parameters from trajectories generated by $\theta_R = [0, 1]$ and $\theta_\beta = [1, 100]$. For this arbitrary discrete space, we are able to accurately recover our parameters.