EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks

Michael Arbel, David Salinas, Frank Hutter

TL;DR

EquiTabPFN introduces a target-permutation equivariant architecture for prior-fitted networks to handle arbitrary class counts in tabular data. By integrating a target-equivariant encoder, alternating bi-attention across features and samples, and a non-parametric equivariant decoder, the model achieves robust in-context learning without fixed target dimensionality. Theoretical results show that the optimal pre-training objective naturally favors target-equivariant functions, and empirical results demonstrate strong performance and runtime efficiency on unseen-class benchmarks compared to existing PFN variants. This work provides a principled approach to leveraging symmetry in tabular data, reducing the need for costly ensembling while enabling scalable classification across diverse task sizes.

Abstract

Recent foundational models for tabular data, such as TabPFN, excel at adapting to new tasks via in-context learning, but remain constrained to a fixed, pre-defined number of target dimensions, often necessitating costly ensembling strategies. We trace this constraint to a deeper architectural shortcoming: these models lack target equivariance, so that permuting the order of the target dimensions alters their predictions. This deficiency gives rise to an irreducible "equivariance gap", an error term that introduces instability in predictions. We eliminate this gap by designing a fully target-equivariant architecture that ensures permutation invariance via equivariant encoders, decoders, and a bi-attention mechanism. Empirical evaluation on standard classification benchmarks shows that, on datasets with more classes than those seen during pre-training, our model matches or surpasses existing methods while incurring lower computational overhead.
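
To make the notion of target equivariance concrete, the following is a minimal sketch, not the paper's model. It compares two toy in-context predictors: a kernel-weighted average of one-hot training labels, which is non-parametric in the class dimension, and the same predictor followed by a fixed dense head. The names `kernel_predict`, `fixed_head_predict`, and `equivariance_error`, the Gaussian kernel, and the random head `W` are all assumptions introduced for illustration.

```python
import numpy as np


def kernel_predict(X_train, Y_train, X_test, temperature=1.0):
    """Softmax-kernel average of one-hot training targets (toy predictor).

    The class dimension is only touched by the final matrix product, so
    permuting the columns of Y_train permutes the output columns the same way.
    """
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ Y_train


def fixed_head_predict(X_train, Y_train, X_test, W):
    """Same predictor followed by a fixed dense head W that mixes class slots.

    A fixed head ties each output slot to a position, so this is not
    target equivariant in general.
    """
    return kernel_predict(X_train, Y_train, X_test) @ W


def equivariance_error(predict, X_train, Y_train, X_test, perm):
    """Largest gap between 'permute targets, then predict' and 'predict, then permute'."""
    permute_then_predict = predict(X_train, Y_train[:, perm], X_test)
    predict_then_permute = predict(X_train, Y_train, X_test)[:, perm]
    return float(np.abs(permute_then_predict - predict_then_permute).max())


# Hypothetical toy data: 9 training points, 3 classes, 5 test points.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(9, 2))
Y_train = np.eye(3)[np.arange(9) % 3]  # one-hot labels, every class present
X_test = rng.normal(size=(5, 2))
W = rng.normal(size=(3, 3))            # arbitrary fixed head (assumption)
perm = np.array([2, 0, 1])

err_equi = equivariance_error(kernel_predict, X_train, Y_train, X_test, perm)
err_fixed = equivariance_error(
    lambda a, b, c: fixed_head_predict(a, b, c, W), X_train, Y_train, X_test, perm
)
print("kernel predictor error:", err_equi)
print("fixed-head predictor error:", err_fixed)
```

In this sketch the first error is zero up to floating-point rounding, while the second is generally not, which is the kind of order dependence the abstract attributes to non-equivariant models.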

Paper Structure

This paper contains 27 sections, 1 theorem, 12 equations, 8 figures, 6 tables.

Key Result

Proposition 5.3

Under the convex-loss and invariant-distribution assumptions, the equivariance gap $\mathcal{E}^\text{equi}(f)$ is always non-negative and equals $0$ only when $f$ is equivariant to permutations, so that for any $f$: [equation not preserved in the source]. Moreover, if $f^{\star}$ is a minimizer of $\mathcal{L}$ over all measurable functions, then $f^{\star}$ must be target equivariant.

Figures (8)

  • Figure 1: Overview of EquiTabPFN's architecture. Data is tokenized via an encoder, processed using self-attention, and decoded to obtain predictions. The encoder maps each covariate to a single token and embeds target components into tokens via a $1\times 1$ convolution. Missing test tokens are replaced by prediction tokens. Self-attention alternates between (1) feature-wise attention, with target tokens attending only to covariate tokens (gray arrows) while covariate tokens attend to all tokens (blue arrows); and (2) data-wise attention, where test tokens attend only to training tokens (blue arrows), and training tokens attend to themselves (gray arrows).
  • Figure 2: Prediction comparison of TabPFN, TabPFN-v2, and our model on the same datasets with three different class orderings (one per column). Models predict on a dense grid using 9 distinct training points, marked with dark crosses, each having a distinct class.
  • Figure 3: Equivariance error for TabPFN observed during training (top) and at inference with different numbers of classes and ensemble members (bottom).
  • Figure 4: Relative improvement over KNN for datasets with fewer than 10 classes (left) and more than 10 classes (right). Red lines are the median metric over datasets after averaging each dataset over $10$ splits. The runtime is displayed with color on a log scale and is reported on a V100 GPU for PFNs.
  • Figure 5: Left: Scatter plot of runtime vs. $\%$ AUC improvement over KNN for different methods. Right: Bar plot of AUC ratio relative to KNN (blue) and speedup of EquiTabPFN (green). In both figures, we increase the number of ensemble members for the TabPFN variants.
  • ...and 3 more figures
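
The attention pattern described in the Figure 1 caption can be sketched as boolean masks. This is an illustrative reading of the caption only: the token ordering, the function names, and the interpretation of "training tokens attend to themselves" as "training tokens attend to training tokens" are assumptions, and the paper's implementation may differ.

```python
import numpy as np


def feature_wise_mask(n_covariates, n_targets):
    """Mask over the tokens of one sample, ordered [covariates..., targets...].

    mask[i, j] is True when query token i may attend to key token j.
    """
    n = n_covariates + n_targets
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_covariates, :] = True              # covariate tokens attend to all tokens
    mask[n_covariates:, :n_covariates] = True  # target tokens attend only to covariates
    return mask


def data_wise_mask(n_train, n_test):
    """Mask over the samples of one feature slot, ordered [train..., test...].

    Assumption: training tokens attend to training tokens, and test tokens
    attend only to training tokens, so no query ever reads a test token.
    """
    n = n_train + n_test
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_train] = True
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax that gives exactly zero weight to masked-out keys."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


# Hypothetical sizes: 3 covariates, 2 target components, 4 train and 2 test rows.
rng = np.random.default_rng(0)
feat_mask = feature_wise_mask(3, 2)
data_mask = data_wise_mask(4, 2)
feat_weights = masked_softmax(rng.normal(size=feat_mask.shape), feat_mask)
data_weights = masked_softmax(rng.normal(size=data_mask.shape), data_mask)
print(feat_mask.astype(int))
print(data_mask.astype(int))
```

Because no query reads a test token in the data-wise mask, predictions for one test row do not depend on which other test rows are in the batch under this reading.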

Theorems & Definitions (5)

  • Definition 5.1: Target permutation equivariance
  • Definition 5.2: Target-equivariance gap
  • Proposition 5.3
  • Proof: Proof sketch.
  • Proof: Proof of \ref{prop:error_decomposition}
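
As an illustration of the objects named in Definitions 5.1 and 5.2, one standard way to obtain an equivariant function from an arbitrary one is to average over all target permutations. The sketch below applies that generic group-averaging construction to a deliberately non-equivariant toy predictor; whether it coincides with the paper's own construction is an assumption. The names `base_predict`, `symmetrize`, and `equivariance_error`, the fixed head `W`, and the squared-distance proxy printed at the end are hypothetical and are not the paper's definition of $\mathcal{E}^\text{equi}(f)$.

```python
import itertools

import numpy as np

# Fixed, asymmetric head (assumption) that makes the toy predictor order dependent.
W = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.0],
              [0.3, 0.0, 1.0]])


def base_predict(X_train, Y_train, X_test):
    """Toy non-equivariant predictor: kernel-averaged labels through the head W."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-d2)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights @ Y_train) @ W


def symmetrize(predict, n_classes):
    """Average 'permute targets, predict, undo the permutation' over all permutations.

    The cost grows as n_classes!, so this is only practical for very few classes.
    """
    perms = [np.array(p) for p in itertools.permutations(range(n_classes))]

    def predict_equi(X_train, Y_train, X_test):
        out = np.zeros((X_test.shape[0], n_classes))
        for perm in perms:
            inverse = np.argsort(perm)  # undoes the column permutation
            out += predict(X_train, Y_train[:, perm], X_test)[:, inverse]
        return out / len(perms)

    return predict_equi


def equivariance_error(predict, X_train, Y_train, X_test, perm):
    """Largest gap between 'permute then predict' and 'predict then permute'."""
    a = predict(X_train, Y_train[:, perm], X_test)
    b = predict(X_train, Y_train, X_test)[:, perm]
    return float(np.abs(a - b).max())


# Hypothetical toy data: 9 training points, 3 classes, 5 test points.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(9, 2))
Y_train = np.eye(3)[np.arange(9) % 3]
X_test = rng.normal(size=(5, 2))
perm = np.array([1, 0, 2])

equi_predict = symmetrize(base_predict, n_classes=3)
base_out = base_predict(X_train, Y_train, X_test)
equi_out = equi_predict(X_train, Y_train, X_test)

print("error before averaging:", equivariance_error(base_predict, X_train, Y_train, X_test, perm))
print("error after averaging:", equivariance_error(equi_predict, X_train, Y_train, X_test, perm))
print("mean squared distance to averaged predictor:", float(np.mean((base_out - equi_out) ** 2)))
```

The averaged predictor is equivariant by construction, up to floating-point rounding, because composing any fixed permutation with all permutations only reorders the terms of the average.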