Decomposing Crowd Wisdom: Domain-Specific Calibration Dynamics in Prediction Markets

Nam Anh Le

Abstract

Prediction markets are increasingly used as probability forecasting tools, yet their usefulness depends on calibration, specifically whether a contract trading at 70 cents truly implies a 70% probability. Using 292 million trades across 327,000 binary contracts on Kalshi and Polymarket, this paper shows that calibration is a structured, multidimensional phenomenon. On Kalshi, calibration decomposes into four components (a universal horizon effect, domain-specific biases, domain-by-horizon interactions and a trade-size scale effect) that together explain 87.3% of calibration variance. The dominant pattern is persistent underconfidence in political markets, where prices are chronically compressed toward 50%, and this bias generalises across both exchanges. However, the trade-size scale effect, whereby large trades are associated with amplified underconfidence in politics on Kalshi ($\Delta = 0.53$, 95% confidence interval [0.29, 0.75]), does not replicate on Polymarket ($\Delta = 0.11$, [-0.15, 0.39]), suggesting platform-specific microstructure. A Bayesian hierarchical model confirms the frequentist decomposition with 96.3% posterior predictive coverage. Consumers of prediction market prices who treat them as face-value probabilities will systematically misinterpret them, and the direction of misinterpretation depends on what is being predicted, when and by whom.

Paper Structure (35 sections, 13 equations, 7 figures, 16 tables)


Introduction
Summary of findings
Related work
Plan of the paper
Data
Data source
Domain classification
Measuring calibration
Analysis dimensions
The calibration landscape
Three stylised facts
Diagnosing potential artefacts
Is political underconfidence driven by a subset of markets?
Composition effects in the 1--3 hour bin
Weighting sensitivity and the scale effect
...and 20 more sections

Figures (7)

Figure 1: Calibration slope $b$ versus time-to-resolution, one line per domain. Slopes above 1 indicate underconfidence (prices compressed toward 50%); slopes below 1 indicate overconfidence (prices too extreme).
Figure 2: Cross-platform calibration slope trajectories: (A) Kalshi vs (B) Polymarket. The dominant pattern, political underconfidence, replicates across exchanges. Finance is shown on Polymarket for visual context but is excluded from formal cross-platform analysis due to thin coverage (2,516 markets).
Figure 3: Four-panel decomposition of calibration slopes. (A) Universal horizon effect $\mu(\tau)$. (B) Domain intercepts $\alpha_d$. (C) Domain-by-horizon interactions $\beta_d(\tau)$. (D) Domain-by-size effects $\gamma_d(s)$.
Figure 4: Observed versus fitted calibration slopes ($R^2 = 0.873$). Each point is one of 216 analysis cells.
Figure 5: Calibration slopes by trade size: (A) Politics versus (B) Sports. In Politics, larger trades are associated with more compressed prices (higher slopes). In Sports, no such gradient exists.
...and 2 more figures
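
The captions above interpret a calibration slope $b$ above 1 as underconfidence and below 1 as overconfidence. This excerpt does not give the paper's estimator, so the sketch below assumes a standard logistic recalibration: resolved outcomes are regressed on the log-odds of the traded price, and $b$ is the fitted coefficient. The function names `calibration_slope` and `simulate_compressed_prices`, and the simulated data, are hypothetical illustrations rather than the author's code or data.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return np.log(p / (1.0 - p))


def calibration_slope(prices, outcomes, n_iter=50, tol=1e-10):
    """Fit outcome ~ sigmoid(a + b * logit(price)) by Newton's method.

    Assumed estimator (not confirmed by the source). Returns (a, b).
    b > 1: prices are compressed toward 50% (underconfidence).
    b < 1: prices are too extreme (overconfidence).
    """
    p = np.clip(np.asarray(prices, dtype=float), 1e-6, 1.0 - 1e-6)
    y = np.asarray(outcomes, dtype=float)
    X = np.column_stack([np.ones_like(p), logit(p)])
    beta = np.array([0.0, 1.0])  # start from perfect calibration
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = mu * (1.0 - mu)                      # Bernoulli variances
        grad = X.T @ (y - mu)                    # score vector
        hess = X.T @ (X * w[:, None])            # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return float(beta[0]), float(beta[1])


def simulate_compressed_prices(n, compression, seed=0):
    """Hypothetical data: true log-odds are divided by `compression`
    before being quoted as prices, so the recovered slope should be
    close to `compression`."""
    rng = np.random.default_rng(seed)
    true_logit = rng.normal(0.0, 1.5, size=n)
    true_prob = 1.0 / (1.0 + np.exp(-true_logit))
    outcomes = (rng.random(n) < true_prob).astype(float)
    prices = 1.0 / (1.0 + np.exp(-true_logit / compression))
    return prices, outcomes


if __name__ == "__main__":
    prices, outcomes = simulate_compressed_prices(50_000, compression=2.0)
    a, b = calibration_slope(prices, outcomes)
    print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```

In this simulation, prices sit closer to 50% than the true probabilities, so the fitted slope lands near the chosen compression factor of 2, which the captions would read as underconfidence. Applying the same fit within each domain, horizon and trade-size cell would yield one slope per cell, under the same assumption about the estimator.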
