Boltzmann State-Dependent Rationality

Osher Lerner

TL;DR

This work addresses a gap in modeling human suboptimality for human-robot collaboration by replacing the single suboptimality parameter $β$ in Boltzmann rationality with a state-dependent function $β(s)$, which yields a more expressive yet still tractable representation of human behavior. It develops forward and inverse models under this state-aware framework, culminating in a maximum-a-posteriori objective for jointly inferring reward weights and state-dependent suboptimality across multiple agents. The authors outline experimental designs in GridWorld and Overcooked with Mechanical Turk data, aimed at parameter recovery and at generalizing learned rewards and suboptimality to goal inference tasks. The approach has practical implications for risk-aware planning, adaptive assistance, and human-centered collaboration in complex tasks where conditional uncertainty and context-sensitive proficiency matter.

Abstract

This paper extends existing learned models of human behavior by taking a measured step toward structured irrationality. Specifically, by replacing the suboptimality constant $β$ in a Boltzmann rationality model with a function over states $β(s)$, we gain natural expressivity in a computationally tractable manner. The paper discusses the relevant mathematical theory, sets up several experimental designs, presents limited preliminary results, and proposes future investigations.
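To make the substitution concrete, here is a minimal sketch of a Boltzmann policy whose coefficient depends on the state. It assumes the common convention in which the action probability is proportional to $\exp(β(s)\,Q(s,a))$, so a larger $β(s)$ means behavior closer to optimal; the paper's exact parameterization may differ. The function names, the two-state Q-table, and the β values are hypothetical and chosen only for illustration.

```python
import math

def boltzmann_policy(q_values, beta):
    """Action probabilities proportional to exp(beta * Q) (assumed convention)."""
    scaled = [beta * q for q in q_values]
    m = max(scaled)  # subtract the max before exponentiating for stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def state_dependent_policy(q_table, beta_fn, state):
    """Same softmax, but the coefficient is looked up per state: beta(s)."""
    return boltzmann_policy(q_table[state], beta_fn(state))

# Hypothetical example: identical Q-values in both states, different beta(s).
q_table = {"easy": [1.0, 0.0], "hard": [1.0, 0.0]}
beta_by_state = {"easy": 100.0, "hard": 1.0}

def beta_fn(state):
    return beta_by_state[state]

p_easy = state_dependent_policy(q_table, beta_fn, "easy")
p_hard = state_dependent_policy(q_table, beta_fn, "hard")
```

With these made-up numbers the agent picks the better action almost deterministically in the "easy" state and only about 73% of the time in the "hard" state, even though the Q-values are the same. A single constant β cannot express that difference.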

Paper Structure (19 sections, 19 equations, 1 figure)


Introduction
Background
Models of Human Proficiency
Inverse RL
Boltzmann Rationality
Theory
Why States?
Forward Model
Inverse Model
Experiments
Environments
Parameter recovery
Shared Goals
Generalization of Learned Rewards
Action Prediction
...and 4 more sections

Figures (1)

Figure 1: Posterior Belief of $\theta$ parameters from trajectories generated by $\theta_R = [0, 1]$ and $\theta_\beta = [1, 100]$. For this arbitrary discrete space, we are able to accurately recover our parameters.
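The kind of recovery described in the caption can be illustrated with a small grid posterior. The sketch below is not the paper's experiment. It assumes a one-step setting in which the Q-value of each action equals its reward weight, a per-state coefficient $β(s)$ that multiplies Q, a uniform prior over a hand-picked candidate grid, and hand-written demonstrations. Only the values $\theta_R = [0, 1]$ and $\theta_\beta = [1, 100]$ come from the caption; every other name and number is hypothetical.

```python
import math
from itertools import product

def log_boltzmann(q_values, beta, action):
    """Log-probability of `action` under a softmax over beta * Q."""
    scaled = [beta * q for q in q_values]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return scaled[action] - log_z

def grid_posterior(demos, reward_grid, beta_grid):
    """Posterior over (theta_R, theta_beta) pairs under a uniform prior."""
    log_post = {}
    for theta_r, theta_b in product(reward_grid, beta_grid):
        # One-step assumption: Q(s, a) = theta_r[a]; beta(s) = theta_b[s].
        log_post[(theta_r, theta_b)] = sum(
            log_boltzmann(theta_r, theta_b[s], a) for s, a in demos
        )
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {k: math.exp(v - m) / z for k, v in log_post.items()}

# Hypothetical candidate grid and hand-written (state, action) demonstrations.
reward_grid = [(0.0, 1.0), (1.0, 0.0)]
beta_grid = [(1.0, 1.0), (1.0, 100.0), (100.0, 1.0), (100.0, 100.0)]
demos = [(0, 1), (0, 1), (0, 1), (0, 0),   # noisy choices in state 0
         (1, 1), (1, 1), (1, 1), (1, 1)]   # consistent choices in state 1

posterior = grid_posterior(demos, reward_grid, beta_grid)
map_estimate = max(posterior, key=posterior.get)
```

On this toy data the highest-posterior candidate is the pair with $\theta_R = [0, 1]$ and $\theta_\beta = [1, 100]$: the single off-policy action in state 0 rules out a large β there, while the perfectly consistent choices in state 1 favor a large β in that state.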
