
How to Evaluate Behavioral Models

Greg d'Eon, Sophie Greenwood, Kevin Leyton-Brown, James R. Wright

TL;DR

The authors address how to evaluate predictive behavioral models with loss functions. They introduce an axiomatic framework that separates alignment (comparing models on the same data) from interpretability (how scores relate to the data), and prove that diagonal bounded Bregman divergences (DBBDs) satisfy these axioms, with squared L2 error as a notable member of the family. Systematically checking common losses (error rate, MAE, NLL, cross-entropy, Brier score, KL divergence, and scoring rules), they show that many widely used losses violate essential axioms, while DBBDs provide consistent, interpretable evaluation. The result is a principled recommendation to use DBBDs, especially squared L2 error, for evaluating behavioral models, with implications for broader settings involving discrete distributions, multiple samples, and interpretable model constraints. The work offers a rigorous basis for choosing losses in behavioral economics, psychology, and related fields.

Abstract

Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions should be used for this task, formalizing axioms that we argue loss functions should satisfy. We construct a family of loss functions, which we dub "diagonal bounded Bregman divergences", that satisfy all of these axioms. These rule out many loss functions used in practice, but notably include squared L2 error; we thus recommend its use for evaluating behavioral models.
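The losses listed above are easy to conflate, so here is a minimal Python sketch that computes each of them for one predicted distribution $q$ over actions and a sample of observed action indices. The definitions are common textbook versions assumed for illustration (for example, error rate scores only the prediction's modal action); they are not necessarily the exact normalizations used in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter


def empirical(actions, n_actions):
    """Empirical frequency of each action index 0..n_actions-1 in the sample."""
    counts = Counter(actions)
    return [counts[a] / len(actions) for a in range(n_actions)]


def error_rate(q, actions):
    """Fraction of observations that differ from the prediction's modal action."""
    mode = max(range(len(q)), key=lambda a: q[a])
    return sum(1 for y in actions if y != mode) / len(actions)


def nll(q, actions):
    """Average negative log-likelihood; raises ValueError if q gives an observed action probability 0."""
    return -sum(math.log(q[y]) for y in actions) / len(actions)


def cross_entropy(q, actions):
    """Cross-entropy H(p_hat, q) between the empirical distribution p_hat and q."""
    p_hat = empirical(actions, len(q))
    return -sum(pa * math.log(qa) for pa, qa in zip(p_hat, q) if pa > 0)


def brier(q, actions):
    """Average squared distance between q and the one-hot vector of each observation."""
    total = 0.0
    for y in actions:
        total += sum((qa - (1.0 if a == y else 0.0)) ** 2 for a, qa in enumerate(q))
    return total / len(actions)


def squared_l2(q, actions):
    """Squared L2 distance between q and the empirical distribution."""
    p_hat = empirical(actions, len(q))
    return sum((qa - pa) ** 2 for qa, pa in zip(q, p_hat))


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]     # predicted distribution over three actions
    actions = [0, 0, 1, 2]  # observed choices
    for loss in (error_rate, nll, cross_entropy, brier, squared_l2):
        print(loss.__name__, loss(q, actions))
```

Under these definitions, average NLL coincides with cross-entropy against the empirical distribution $\hat{p}$, and the Brier score equals squared L2 error plus $1 - \lVert \hat{p} \rVert^2$, a term that depends only on the data. On a fixed dataset the two therefore rank predictions identically, while their absolute values differ by a data-dependent offset.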

Paper Structure (22 sections, 11 theorems, 53 equations, 1 figure, 1 table)


Key Result

theorem 1

For any $n$, under mild technical conditions, a loss function $L$ that satisfies DP must be of the form

$$L(q, y) = -B(q) - \langle dB(q), \rho(y) - q \rangle + c(y)$$

for every prediction $q \in \Delta(A)$ and data $y \in A^n$, for some closed and proper strictly convex function $B$, subgradient $dB$ of $B$, translation $c: A^n \to \mathbb{R}$, and summary statistic $\rho: A^n \to \Delta(A)$, where $\mathbb{E}_{y\sim p^n} \rho(y) = p$ for all $p$.
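A minimal Python sketch can make this characterization concrete. It assumes a loss of the form $L(q, y) = -B(q) - \langle dB(q), \rho(y) - q \rangle + c(y)$ for a prediction $q \in \Delta(A)$ and data $y \in A^n$, and picks illustrative ingredients: $B(x) = \lVert x \rVert^2$ (so $dB(q) = 2q$), $\rho(y)$ the empirical frequency of each action in $y$ (which satisfies $\mathbb{E}_{y\sim p^n} \rho(y) = p$), and $c(y) = B(\rho(y))$. With these choices the loss simplifies algebraically to the squared L2 distance $\lVert q - \rho(y) \rVert^2$. All function names are hypothetical.

```python
import itertools
import math


def B(x):
    """Strictly convex potential: squared Euclidean norm (an illustrative choice)."""
    return sum(v * v for v in x)


def dB(x):
    """Gradient of B."""
    return [2 * v for v in x]


def rho(y, k):
    """Summary statistic: empirical frequencies of the k actions in sample y."""
    return [y.count(a) / len(y) for a in range(k)]


def c(y, k):
    """Data-dependent translation; choosing B(rho(y)) is an assumption of this sketch."""
    return B(rho(y, k))


def theorem_form_loss(q, y):
    """L(q, y) = -B(q) - <dB(q), rho(y) - q> + c(y)."""
    k = len(q)
    r = rho(y, k)
    inner = sum(g * (ri - qi) for g, ri, qi in zip(dB(q), r, q))
    return -B(q) - inner + c(y, k)


def expected_loss(q, p, n):
    """Exact E_{y ~ p^n} L(q, y), enumerating every sequence y of length n."""
    k = len(p)
    total = 0.0
    for y in itertools.product(range(k), repeat=n):
        prob = math.prod(p[a] for a in y)
        total += prob * theorem_form_loss(q, y)
    return total


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]  # true action distribution
    for q in ([0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [1 / 3, 1 / 3, 1 / 3]):
        print(q, expected_loss(q, p, n=3))
```

Because $c(y)$ does not depend on $q$ and the rest of the loss is affine in $\rho(y)$, the expected loss exceeds its value at $q = p$ by exactly $B(p) - B(q) - \langle dB(q), p - q \rangle$, the Bregman divergence of $B$, which is positive for $q \neq p$ when $B$ is strictly convex. Exact enumeration over all $|A|^n$ samples, as in `expected_loss`, is only practical for tiny $n$.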

Figures (1)

  • Figure 1: The losses of four predictions on two traveler's dilemma games (Goeree and Holt, 2001).

Theorems & Definitions (26)

  • definition 1
  • definition 2: Pareto improvement
  • definition 3
  • definition 4
  • theorem 1: Corollary of Theorem 11 of Abernethy and Frongillo (2012), informal
  • theorem 2: Informal
  • definition 5: Diagonal bounded Bregman divergence (DBBD)
  • theorem 3
  • proposition 1
  • theorem 4
  • ...and 16 more
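One way to build intuition for definition 5 is a coordinate-wise (separable) Bregman divergence, $D(p, q) = \sum_a [\phi(p_a) - \phi(q_a) - \phi'(q_a)(p_a - q_a)]$ for a strictly convex scalar function $\phi$. The Python sketch below assumes that "diagonal" means such a separable potential and that "bounded" means the divergence has a finite maximum over the simplex; the paper's definition may state these conditions differently, and the helper names are hypothetical. With $\phi(t) = t^2$ it recovers squared L2 error, which is at most 2 between two distributions, whereas $\phi(t) = t \log t$ recovers KL divergence, which grows without bound as $q$ puts vanishing probability on an action that $p$ supports.

```python
import math


def separable_bregman(phi, dphi, p, q):
    """Coordinate-wise Bregman divergence: sum_a phi(p_a) - phi(q_a) - phi'(q_a) * (p_a - q_a)."""
    return sum(phi(pa) - phi(qa) - dphi(qa) * (pa - qa) for pa, qa in zip(p, q))


def square(t):
    """phi(t) = t^2; the resulting divergence is squared L2 distance."""
    return t * t


def d_square(t):
    return 2 * t


def neg_entropy(t):
    """phi(t) = t log t with the convention 0 log 0 = 0; yields KL divergence on the simplex."""
    return t * math.log(t) if t > 0 else 0.0


def d_neg_entropy(t):
    """Derivative log t + 1; requires t > 0, so every q_a must be positive."""
    return math.log(t) + 1


if __name__ == "__main__":
    p = [1.0, 0.0]
    for eps in (1e-2, 1e-4, 1e-8):
        q = [eps, 1 - eps]  # q puts vanishing mass on the action p supports
        print(eps,
              separable_bregman(square, d_square, p, q),
              separable_bregman(neg_entropy, d_neg_entropy, p, q))
```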