Decomposing Crowd Wisdom: Domain-Specific Calibration Dynamics in Prediction Markets

Nam Anh Le

Abstract

Prediction markets are increasingly used as probability forecasting tools, yet their usefulness depends on calibration, specifically whether a contract trading at 70 cents truly implies a 70% probability. Using 292 million trades across 327,000 binary contracts on Kalshi and Polymarket, this paper shows that calibration is a structured, multidimensional phenomenon. On Kalshi, calibration decomposes into four components (a universal horizon effect, domain-specific biases, domain-by-horizon interactions and a trade-size scale effect) that together explain 87.3% of calibration variance. The dominant pattern is persistent underconfidence in political markets, where prices are chronically compressed toward 50%, and this bias generalises across both exchanges. However, the trade-size scale effect, whereby large trades are associated with amplified underconfidence in politics on Kalshi ($\Delta = 0.53$, 95% confidence interval [0.29, 0.75]), does not replicate on Polymarket ($\Delta = 0.11$, 95% confidence interval [-0.15, 0.39]), suggesting platform-specific microstructure. A Bayesian hierarchical model confirms the frequentist decomposition with 96.3% posterior predictive coverage. Consumers of prediction market prices who treat them as face-value probabilities will systematically misinterpret them, and the direction of misinterpretation depends on what is being predicted, when and by whom.

Paper Structure (35 sections, 13 equations, 7 figures, 16 tables)

Figures (7)

  • Figure 1: Calibration slope $b$ versus time-to-resolution, one line per domain. Slopes above 1 indicate underconfidence (prices compressed toward 50%); slopes below 1 indicate overconfidence (prices too extreme).
  • Figure 2: Cross-platform calibration slope trajectories: (A) Kalshi vs (B) Polymarket. The dominant pattern, political underconfidence, replicates across exchanges. Finance is shown on Polymarket for visual context but is excluded from formal cross-platform analysis due to thin coverage (2,516 markets).
  • Figure 3: Four-panel decomposition of calibration slopes. (A) Universal horizon effect $\mu(\tau)$. (B) Domain intercepts $\alpha_d$. (C) Domain-by-horizon interactions $\beta_d(\tau)$. (D) Domain-by-size effects $\gamma_d(s)$.
  • Figure 4: Observed versus fitted calibration slopes ($R^2 = 0.873$). Each point is one of 216 analysis cells.
  • Figure 5: Calibration slopes by trade size: (A) Politics versus (B) Sports. In Politics, larger trades are associated with more compressed prices (higher slopes). In Sports, no such gradient exists.
  • ...and 2 more figures
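
To make the reading of the calibration slope $b$ in Figure 1 concrete, the sketch below estimates a slope by logistic recalibration, regressing resolved outcomes on the log-odds of prices. This estimator is an assumption made for illustration and is not confirmed here as the paper's exact procedure; the function name `calibration_slope` and the synthetic data are hypothetical. Prices are deliberately compressed toward 50% by a factor of 1.5 in log-odds space, so the fit should recover a slope close to 1.5, the underconfident regime described for political markets.

```python
import numpy as np


def calibration_slope(prices, outcomes, n_iter=50, eps=1e-6):
    """Fit logit(P(y=1)) = a + b * logit(price) by Newton's method.

    Illustrative estimator (an assumption, not necessarily the paper's).
    Returns (a, b); b > 1 means prices are compressed toward 50%.
    """
    p = np.clip(np.asarray(prices, dtype=float), eps, 1 - eps)
    y = np.asarray(outcomes, dtype=float)
    x = np.log(p / (1 - p))                      # log-odds of the price
    X = np.column_stack([np.ones_like(x), x])    # intercept and slope columns
    beta = np.array([0.0, 1.0])                  # start at perfect calibration
    for _ in range(n_iter):
        q = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        grad = X.T @ (y - q)                     # gradient of the log-likelihood
        hess = X.T @ (X * (q * (1 - q))[:, None])  # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta[0], beta[1]


# Synthetic example: hypothetical contracts whose prices understate the evidence.
rng = np.random.default_rng(0)
true_logit = rng.normal(0.0, 1.5, size=20_000)
true_prob = 1.0 / (1.0 + np.exp(-true_logit))
outcomes = rng.random(20_000) < true_prob
# Prices compressed toward 0.5: log-odds shrunk by a factor of 1.5.
prices = 1.0 / (1.0 + np.exp(-true_logit / 1.5))

a, b = calibration_slope(prices, outcomes)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

Under this reading, a consumer would recalibrate a quoted price by multiplying its log-odds by $b$ before converting back to a probability, which pushes compressed prices away from 50%.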