Hedging and Approximate Truthfulness in Traditional Forecasting Competitions

Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo

TL;DR

This paper analyzes the traditional Simple Max mechanism for forecasting contests across multiple events and formalizes incentive issues. Using a geometric view with the quadratic score $S(r,y)=1-(r-y)^2$, it shows that long-run truthfulness can fail: even a leading forecaster may benefit from hedging toward others’ reports. It then develops a positive result in a two-forecaster regime with sufficient uncertainty, proving approximate truthfulness via approximate affineness of the utility and Edgeworth expansions, yielding a rate $\gamma=O(m^{-1/4})$ for the distance to truth. The results have practical implications for leaderboard design and suggest avenues such as truncated/soft-max alternatives to mitigate extremizing incentives, along with directions for extending the theory to more general settings.
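The geometric view follows directly from the score. Summing $S(r_t,y_t)=1-(r_t-y_t)^2$ over $m$ events gives a total of $m - \|\boldsymbol{r}-\boldsymbol{y}\|_2^2$, so under Simple Max the winner is whoever reported closest to the outcome vector $\boldsymbol{y}$ in Euclidean distance. Below is a minimal sketch, assuming reports are lists of probabilities and outcomes are 0/1. The function names are hypothetical, and ties are returned rather than broken because the tie-breaking rule is not specified here.

```python
def quadratic_score(r: float, y: int) -> float:
    """Per-event quadratic score S(r, y) = 1 - (r - y)^2."""
    return 1.0 - (r - y) ** 2


def total_score(report: list[float], outcomes: list[int]) -> float:
    """Sum of per-event quadratic scores over all m events."""
    return sum(quadratic_score(r, y) for r, y in zip(report, outcomes))


def squared_distance(report: list[float], outcomes: list[int]) -> float:
    """Squared Euclidean distance ||r - y||_2^2; total_score == m - this."""
    return sum((r - y) ** 2 for r, y in zip(report, outcomes))


def simple_max_winners(reports: list[list[float]], outcomes: list[int],
                       tol: float = 1e-12) -> list[int]:
    """Indices of every forecaster tied for the highest total score.

    Equivalently, the forecasters whose reports are closest to the outcomes.
    """
    totals = [total_score(rep, outcomes) for rep in reports]
    best = max(totals)
    return [k for k, s in enumerate(totals) if s >= best - tol]
```

For example, with reports `[0.7, 0.7, 0.7]` and `[0.5, 0.5, 0.5]` and outcomes `[1, 0, 1]`, the totals are 2.33 and 2.25, so `simple_max_winners` returns `[0]`.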

Abstract

In forecasting competitions, the traditional mechanism scores the predictions of each contestant against the outcome of each event, and the contestant with the highest total score wins. While it is well-known that this traditional mechanism can suffer from incentive issues, it is folklore that contestants will still be roughly truthful as the number of events grows. Yet thus far the literature lacks a formal analysis of this traditional mechanism. This paper gives the first such analysis. We first demonstrate that the "long-run truthfulness" folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice.
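Why hedging can raise a leader's win probability is a mean-versus-variance trade-off, as the Figure 1 intuition describes. The sketch below makes that concrete under a loud simplifying assumption: the total score difference $D$ between a forecaster and their opponent is treated as Gaussian, so $P(D>0)=\Phi(\mu/\sigma)$. The function name and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def win_probability(mean_diff: float, sd_diff: float) -> float:
    """P(D > 0) when the score difference D ~ Normal(mean_diff, sd_diff**2).

    The Gaussian form is an illustrative assumption; the result equals
    Phi(mean_diff / sd_diff).
    """
    return 1.0 - NormalDist(mu=mean_diff, sigma=sd_diff).cdf(0.0)


# Hypothetical numbers. A leader who hedges gives up some mean but more
# variance; a trailer who extremizes gives up some mean but adds variance.
leader_truthful = win_probability(2.0, 4.0)    # Phi(0.5)   ~ 0.691
leader_hedged = win_probability(1.8, 3.0)      # Phi(0.6)   ~ 0.726
trailer_truthful = win_probability(-2.0, 4.0)  # Phi(-0.5)  ~ 0.309
trailer_extreme = win_probability(-2.2, 5.0)   # Phi(-0.44) ~ 0.330
```

In both cases the expected score difference gets worse, yet the win probability rises, because $\mu/\sigma$ moves in the forecaster's favor.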

Paper Structure (30 sections, 12 theorems, 39 equations, 4 figures)

Key Result

Lemma 1

Suppose $m$, $p$, and $\epsilon$ satisfy Condition cond:hedging-bounds. Then for any $\boldsymbol{y} \in \mathcal{Y}$ with $\|\boldsymbol{y}\|_1 \leq p^* m$, if $\|\boldsymbol{r}_j - \boldsymbol{c}\|_2 < \epsilon \sqrt{m}$, then $d^*(\boldsymbol{y}) < d_j(\boldsymbol{y}) - 2$.
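Lemma 1 compares forecasters outcome by outcome, and under the quadratic score comparing total scores amounts to comparing squared distances $\|\boldsymbol{r}-\boldsymbol{y}\|_2^2$. A brute-force way to sanity-check statements like this on tiny instances is to enumerate every $\boldsymbol{y} \in \{0,1\}^m$ and compute $i$'s exact win probability against $j$. This is a hypothetical checking tool, not the paper's proof technique. It assumes the events are independent with $P(y_t=1)$ given by $i$'s belief, splits exact ties evenly (an assumed rule), and uses the midpoint report $\boldsymbol{r}^* = (\boldsymbol{p}+\boldsymbol{c})/2$ from the Figure 3 construction as the hedged report.

```python
from itertools import product


def win_probability_exact(report_i: list[float], report_j: list[float],
                          belief: list[float], tol: float = 1e-12) -> float:
    """Exact P(i beats j) under Simple Max, enumerating y in {0,1}^m.

    Assumes independent events with P(y_t = 1) = belief[t], and splits
    exact ties evenly between the two forecasters.
    """
    total = 0.0
    for y in product((0, 1), repeat=len(belief)):
        # Probability of this outcome vector under i's (independent) belief.
        prob = 1.0
        for b, yt in zip(belief, y):
            prob *= b if yt == 1 else 1.0 - b
        # Higher total quadratic score <=> smaller squared distance to y.
        d_i = sum((r - o) ** 2 for r, o in zip(report_i, y))
        d_j = sum((r - o) ** 2 for r, o in zip(report_j, y))
        if d_i < d_j - tol:
            total += prob
        elif abs(d_i - d_j) <= tol:
            total += 0.5 * prob
    return total


def midpoint(p: list[float], c: list[float]) -> list[float]:
    """Hedged report r* = (p + c) / 2, halfway across the information gap."""
    return [(a + b) / 2 for a, b in zip(p, c)]
```

Comparing `win_probability_exact(p, c, p)` with `win_probability_exact(midpoint(p, c), c, p)` for small $m$ checks truthful reporting against hedging numerically; the cost grows as $2^m$, so this only scales to a handful of events.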

Figures (4)

  • Figure 1: The score distributions of a (relatively) good and a (relatively) bad forecaster. Their chance of winning is proportional to the region where their distributions overlap. On the left, the bad forecaster can extremize to the dashed distribution: despite lowering her mean, this increases her variance, and thus her win share. On the right, the good forecaster can similarly increase her win share by hedging to the dashed distribution, which, despite lowering her mean, also lowers her variance. Intuitively, the good forecaster benefits from hedging because it decreases the variance of her score, "locking in" her lead even while decreasing her expected score. To draw a familiar analogy from sports: a team that is behind will start making long-shot attempts to score, increasing its variance, while the team that is ahead will try to slow down the game, decreasing its variance as well as its chance of scoring more.
  • Figure 2: The equilibrium strategies of $i$ and $j$ for $m=n=2$ and $p \in (1/3, 1/2)$. The blue circles denote points $i$ plays and the red squares denote $j$'s. Notably, both players sometimes play at the center $\boldsymbol{c}$, but $i$ always plays bounded away from their belief $\boldsymbol{p}$ (orange point).
  • Figure 3: The geometry of the plane containing $\boldsymbol{p}$, $\boldsymbol{c}$, $\boldsymbol{r}^*$, and some $\boldsymbol{y}$. The bottom line is the main diagonal of the $m$-dimensional hypercube, running from $\{0\}^m$ to $\{1\}^m$. Note that $\boldsymbol{r}^*$ is the midpoint of $i$'s belief $\boldsymbol{p}$ and the belief $\boldsymbol{c}$ of the other forecasters, so $i$ is moving to the middle of the "information gap." The radii of the balls around $\boldsymbol{p}$ and $\boldsymbol{c}$ are chosen so that they are bounded away from the ball of radius $d^*(\boldsymbol{y})$ around a $\boldsymbol{y}$ that lies perpendicular to the diagonal at $\boldsymbol{r}^*$ (red). Lemma lem:counterexample-lower-ineq considers $\|\boldsymbol{y}\|_1 \leq p^* m$ (blue), while Lemma lem:counterexample-upper-ineq considers $\|\boldsymbol{y}\|_1 \geq p^* m$ (orange). The high-level idea is that forecaster $i$ can shift from winning roughly only the $\boldsymbol{y}$ to the left of $\boldsymbol{r}^*$ when reporting $\boldsymbol{r}_i$ to also winning some fraction of the $\boldsymbol{y}$ between $\boldsymbol{r}^*$ and $\boldsymbol{c}$ by hedging to $\boldsymbol{r}^*$.
  • Figure 4: Two examples of Normal CDFs approximating the expected utility $G_{it}$. If the score difference at event $t$ (which lies in $[-1, 1]$) falls at (a) or (c), approximate affineness fails because the CDF flattens out. Around (b), the mean, the affine approximation has small error; thus we want the score difference at each event to be small in magnitude. Note also that the region where the CDF resembles an affine function is much narrower for the distribution with lower variance (red).
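The Figure 4 point about approximate affineness can be checked numerically. The sketch below compares a Normal CDF with its first-order (tangent-line) approximation at the mean, one natural affine approximation that is assumed here rather than taken from the paper; the function names and parameter values are hypothetical. The error is zero at the mean, grows as the CDF flattens out, and grows much faster when the variance is small.

```python
from math import erf, pi, sqrt


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF Phi((x - mu) / sigma), computed via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def tangent_at_mean(x: float, mu: float, sigma: float) -> float:
    """Affine approximation of the CDF: its tangent line at the mean."""
    slope = 1.0 / (sigma * sqrt(2.0 * pi))  # Normal density at the mean
    return 0.5 + slope * (x - mu)


def affine_error(x: float, mu: float, sigma: float) -> float:
    """Absolute error of the tangent-line approximation at x."""
    return abs(normal_cdf(x, mu, sigma) - tangent_at_mean(x, mu, sigma))


# Same offset of 0.5 from the mean, two different spreads.
wide = affine_error(0.5, 0.0, 1.0)     # about 0.008
narrow = affine_error(0.5, 0.0, 0.25)  # about 0.32: the tangent even exceeds 1
```

With $\sigma = 1$ an offset of $0.5$ stays in the near-affine region, whereas with $\sigma = 0.25$ the same offset is two standard deviations out and the approximation breaks down, matching the narrower affine region of the low-variance (red) curve.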

Theorems & Definitions (26)

  • Definition 1: Strategy
  • Definition 2: Simple Max
  • Definition 3: Approximately truthful
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Theorem 1
  • ...and 16 more