A principled framework for uncertainty decomposition in TabPFN

Sandra Fortini, Kenyon Ng, Sonia Petrone, Judith Rousseau, Susan Wei

TL;DR

This work develops a principled uncertainty decomposition framework for TabPFN by casting uncertainty quantification as Bayesian predictive inference (BPI) in a supervised, in-context setting. It derives a predictive CLT under quasi-martingale conditions, which yields fast, black-box estimates of epistemic and aleatoric uncertainty from the history of TabPFN's predictive updates, along with asymptotically valid credible bands. The framework extends to entropy-based decomposition for classification via Beta/Dirichlet approximations and demonstrates near-nominal frequentist coverage on synthetic benchmarks and on real data such as PSID labor-force participation. Taken together, the approach provides scalable uncertainty quantification for transformer foundation models in tabular domains, offers practical uncertainty diagnostics, and opens avenues for supervised-BPI extensions beyond TabPFN.

Abstract

TabPFN is a transformer that achieves state-of-the-art performance on supervised tabular tasks by amortizing Bayesian prediction into a single forward pass. However, there is currently no method for uncertainty decomposition in TabPFN. Because it behaves, in an idealised limit, as a Bayesian in-context learner, we cast the decomposition challenge as a Bayesian predictive inference (BPI) problem. The main computational tool in BPI, predictive Monte Carlo, is challenging to apply here as it requires simulating unmodeled covariates. We therefore pursue the asymptotic alternative, filling a gap in the theory for supervised settings by proving a predictive CLT under quasi-martingale conditions. We derive variance estimators determined by the volatility of predictive updates along the context. The resulting credible bands are fast to compute, target epistemic uncertainty, and achieve near-nominal frequentist coverage. For classification, we further obtain an entropy-based uncertainty decomposition.
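
The idea of a variance estimator driven by the volatility of predictive updates can be illustrated with a minimal sketch. It borrows the form commonly used in predictive CLTs for exchangeable sequences, $V_n = \frac{1}{n}\sum_{k=1}^{n} k^2 (g_k(x) - g_{k-1}(x))^2$, with band $g_n(x) \pm z\sqrt{V_n/n}$. This form is an assumption made for illustration: the abstract does not state the paper's supervised estimator, which may differ. The names `volatility_band` and `g_path` are hypothetical, and the path is assumed to have been recorded already (for example by re-querying the model as the context grows).

```python
import numpy as np

def volatility_band(g_path, z=1.96):
    """Band for a prediction from the path of its updates (illustrative).

    g_path[k] is assumed to be the prediction at a fixed test point x
    after the first k context points; g_path[0] is the starting value.
    """
    g = np.asarray(g_path, dtype=float)
    n = g.size - 1                       # number of updates
    if n < 1:
        raise ValueError("need at least two predictions")
    k = np.arange(1, n + 1)
    delta = np.diff(g)                   # g_k - g_{k-1}
    # Assumed estimator: V_n = (1/n) * sum_k k^2 * delta_k^2
    v_n = float(np.sum(k**2 * delta**2) / n)
    half_width = z * np.sqrt(v_n / n)
    return float(g[-1] - half_width), float(g[-1] + half_width), v_n
```

A path that stops moving gives a zero-width band, while large late updates widen it. For probabilities the band is not clipped to $[0, 1]$ in this sketch.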

Paper Structure

This paper contains 62 sections, 8 theorems, 98 equations, 20 figures, 1 table.

Key Result

Theorem 4.1

Under the conditions stated in the paper (omitted from this summary), there exists a kernel $\tilde{F}(x,t)$ such that $F_n(x,\cdot)$ converges weakly to $\tilde{F}(x,\cdot)$ for every $x\in\mathcal{X}$, $\mathbb{P}$-a.s.

Figures (20)

  • Figure 1: Entropic UD for TabPFN on contexts drawn from logistic regression, shown for different test covariates $x$ against varying context length. Solid and dotted lines indicate in-distribution and out-of-distribution $x$, respectively. Epistemic uncertainty decreases for most $x$ values as context length increases. In addition, the largest epistemic uncertainty occurs at out-of-distribution test covariates, while aleatoric uncertainty is highest for $x=0$ at the decision boundary. See Appendix \ref{app:entropic_ud} for experimental details.
  • Figure 2: Entropic uncertainty decomposition for two moons (top) and 3-class spirals (bottom) classification tasks. Near the data, the decomposition behaves intuitively: total uncertainty peaks where classes overlap and is correctly attributed to aleatoric uncertainty. In the background regions far from the data, the decomposition diagnoses TabPFN’s tendency to revert to a stable, maximum-entropy prior. Our method faithfully captures this stability as low epistemic uncertainty (reflecting low variance across the ensemble) while assigning the resulting high total entropy to aleatoric uncertainty.
  • Figure 3: PSID labour-force participation data: TabPFN predicted participation probability $\mathbf{g}_n(\mathbf{x})$ versus family income (in \$1000s), with 95% pointwise (left) and simultaneous (right) credible bands. Tick marks indicate observed covariate values.
  • Figure 4: ($\gamma_r$ summary) A histogram of the fitted $\gamma_{r}$ over 100 rollouts. The TabPFN ensemble size is 8 (left) and 16 (right).
  • Figure 5: ($\gamma_r$ summary) 95% confidence intervals of the fitted $\gamma_r$ over 100 rollouts. Ideally, the intervals should contain 2 (red dotted line). The TabPFN ensemble size is 8 (left) and 16 (right).
  • ...and 15 more figures
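
The entropic decomposition shown in Figures 1 and 2 can be illustrated with the generic sample-based version: total uncertainty is the entropy of the averaged class probabilities, aleatoric uncertainty is the average entropy of the individual probability vectors, and epistemic uncertainty is their difference. This is a sketch under assumptions, not the paper's procedure, which according to the TL;DR relies on Beta/Dirichlet approximations. The function name `entropic_decomposition` and the input array of sampled probability vectors are hypothetical.

```python
import numpy as np

def entropic_decomposition(probs, eps=1e-12):
    """Split predictive entropy into aleatoric and epistemic parts.

    probs: array of shape (n_samples, n_classes); each row is one
    plausible class-probability vector at a fixed test point
    (assumed given, e.g. draws from an approximating distribution).
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    mean_p = p.mean(axis=0)
    # Total: entropy of the averaged prediction
    total = float(-np.sum(mean_p * np.log(mean_p)))
    # Aleatoric: average entropy of the individual predictions
    aleatoric = float(-np.mean(np.sum(p * np.log(p), axis=1)))
    # Epistemic: the remainder (disagreement between the samples)
    epistemic = total - aleatoric
    return total, aleatoric, epistemic
```

When every sampled vector equals the uniform distribution, all of the entropy is attributed to the aleatoric part, which matches the behaviour described for the background regions in Figure 2. When the samples are confident but disagree, the entropy is attributed to the epistemic part.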

Theorems & Definitions (10)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Definition 7.1
  • Definition 7.2
  • Theorem 7.3
  • Theorem 7.4: Adapted from Proposition 1 in berti11central
  • Lemma 7.5: Lemma 2 in berti11central