Valid Feature-Level Inference for Tabular Foundation Models via the Conditional Randomization Test

Mohamed Salem

TL;DR

This article presents a practical approach to feature-level hypothesis testing that combines the Conditional Randomization Test (CRT) with TabPFN, a probabilistic foundation model for tabular data, and yields finite-sample valid p-values for conditional feature relevance, even in nonlinear and correlated settings.

Abstract

Modern machine learning models are highly expressive but notoriously difficult to analyze statistically. In particular, while black-box predictors can achieve strong empirical performance, they rarely provide valid hypothesis tests or p-values for assessing whether individual features contain information about a target variable. This article presents a practical approach to feature-level hypothesis testing that combines the Conditional Randomization Test (CRT) with TabPFN, a probabilistic foundation model for tabular data. The resulting procedure yields finite-sample valid p-values for conditional feature relevance, even in nonlinear and correlated settings, without requiring model retraining or parametric assumptions.
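To make the procedure concrete, the following is a minimal sketch of the generic Monte Carlo CRT p-value, not the article's implementation. The helper names `crt_pvalue`, `sample_null`, and `statistic` are hypothetical. The toy example assumes the conditional distribution of the tested feature given the others is a known Gaussian, and it uses a simple covariance score as a stand-in for whatever TabPFN-based sampler and statistic the article uses.

```python
import numpy as np


def crt_pvalue(x_j, y, sample_null, statistic, n_draws=200, seed=None):
    """Monte Carlo CRT p-value for H0: X_j is independent of Y given X_{-j}.

    sample_null(rng) returns a fresh copy of feature j drawn from the
    conditional law of X_j given the other features (hypothetical callable).
    statistic(x_j, y) is any relevance score; larger means more relevant.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(x_j, y)
    t_null = np.array([statistic(sample_null(rng), y) for _ in range(n_draws)])
    # The +1 terms count the observed statistic among the resampled ones.
    return (1.0 + np.sum(t_null >= t_obs)) / (n_draws + 1.0)


# Toy data: z plays the role of X_{-j}; both features follow N(0.5 * z, 1).
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
x_relevant = 0.5 * z + rng.normal(size=n)
x_irrelevant = 0.5 * z + rng.normal(size=n)
y = z + x_relevant + rng.normal(size=n)  # only x_relevant enters y


def sample_null(rng):
    # ASSUMPTION: the conditional law of X_j given X_{-j} is known exactly.
    return 0.5 * z + rng.normal(size=n)


def statistic(x_j, y):
    # ASSUMPTION: simple stand-in score, not a TabPFN-based statistic.
    return abs(np.mean((x_j - 0.5 * z) * y))


p_relevant = crt_pvalue(x_relevant, y, sample_null, statistic, seed=1)
p_irrelevant = crt_pvalue(x_irrelevant, y, sample_null, statistic, seed=2)
print(f"relevant: {p_relevant:.3f}, irrelevant: {p_irrelevant:.3f}")
```

In this sketch the resampled copies keep the dependence on `z` but carry no extra information about `y`, so a feature that is irrelevant given `z` is exchangeable with its copies. That exchangeability is what makes the `(1 + count) / (n_draws + 1)` p-value valid when the conditional sampler is exact.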

Paper Structure

This paper contains 27 sections, 19 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Empirical cumulative distribution functions of CRT p-values for conditionally relevant and irrelevant features. Null p-values closely follow the Uniform$(0,1)$ distribution, while relevant features exhibit strong concentration near zero, indicating both valid calibration and high power.
  • Figure 2: Quantile–quantile plot of empirical null p-values versus the Uniform$(0,1)$ distribution. The close alignment with the diagonal is consistent with finite-sample calibration in these runs of the TabPFN-based Conditional Randomization Test across heterogeneous datasets.
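The diagnostics behind both figures can be sketched in a few lines: an empirical CDF of p-values, quantile-quantile pairs against Uniform(0,1), and the largest gap between the two CDFs as a single summary number. The function names `ecdf`, `qq_points`, and `ks_distance` are hypothetical, and the p-values below are synthetic draws used only to exercise the code. They are not results from the paper.

```python
import numpy as np


def ecdf(pvalues, grid):
    """Fraction of p-values less than or equal to each grid point."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    return np.searchsorted(p, grid, side="right") / p.size


def qq_points(pvalues):
    """Pairs of (Uniform(0,1) quantile, sorted empirical p-value)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    theoretical = (np.arange(1, p.size + 1) - 0.5) / p.size
    return theoretical, p


def ks_distance(pvalues):
    """Largest gap between the ECDF and the Uniform(0,1) CDF."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    above = np.arange(1, n + 1) / n - p  # ECDF just after each jump
    below = p - np.arange(0, n) / n      # ECDF just before each jump
    return float(max(above.max(), below.max()))


# SYNTHETIC p-values for illustration only; not results from the paper.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=500)             # calibrated null behaviour
relevant_p = rng.beta(0.1, 1.0, size=500)  # mass concentrated near zero

grid = np.array([0.05, 0.5])
print("ECDF at 0.05 and 0.5 (null):    ", ecdf(null_p, grid))
print("ECDF at 0.05 and 0.5 (relevant):", ecdf(relevant_p, grid))
print("KS distance (null, relevant):   ",
      ks_distance(null_p), ks_distance(relevant_p))
```

For calibrated null p-values the ECDF should track the diagonal and the quantile-quantile pairs should lie close to the line y = x. A large distance, as for the synthetic relevant draws here, reflects concentration near zero.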