Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data

Xiaochen Zhang, Haoyi Xiong

TL;DR

Spofe introduces a statistically principled, self-supervised framework for interpretable feature interactions on tabular data by uniting kernel PCA with a sparse polynomial representation. It provides a rigorous theory for error control and false discovery rate (FDR) via a multi-objective knockoff procedure, coupled with p-value based feature significance testing. Empirically, Spofe outperforms KPCA, SKPCA, and several baselines in regression and classification tasks, and its interpretability is demonstrated through Higgs and superconductivity case studies. The approach enables reliable identification of interacting features with tangible, physics-aligned insights, making it suitable for high-stakes domains where transparency and statistical validity are essential.

Abstract

In high-dimensional and high-stakes contexts, ensuring both rigorous statistical guarantees and interpretability in feature extraction from complex tabular data remains a formidable challenge. Traditional methods such as Principal Component Analysis (PCA) reduce dimensionality and identify key features that explain the most variance, but are constrained by their reliance on linear assumptions. In contrast, neural networks offer assumption-free feature extraction through self-supervised learning techniques such as autoencoders, though their interpretability remains a challenge in fields requiring transparency. To address this gap, this paper introduces Spofe, a novel self-supervised machine learning pipeline that marries the power of kernel principal components for capturing nonlinear dependencies with a sparse and principled polynomial representation to achieve clear interpretability with statistical rigor. Underpinning our approach is a robust theoretical framework that delivers precise error bounds and rigorous false discovery rate (FDR) control via a multi-objective knockoff selection procedure; it effectively bridges the gap between data-driven complexity and statistical reliability via three stages: (1) generating self-supervised signals using kernel principal components to model complex patterns, (2) distilling these signals into sparse polynomial functions for improved interpretability, and (3) applying a multi-objective knockoff selection procedure with significance testing to rigorously identify important features. Extensive experiments on diverse real-world datasets demonstrate the effectiveness of Spofe, consistently surpassing KPCA, SKPCA, and other methods in feature selection for regression and classification tasks. Visualization and case studies highlight its ability to uncover key insights, enhancing interpretability and practical utility.
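The first two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the RBF kernel, the bandwidth `gamma`, the degree-2 monomial dictionary, the Lasso penalty `lam`, and all function names are assumptions chosen for illustration, and the Lasso solver is a plain coordinate-descent stand-in for whatever sparse estimator the paper uses. Stage (3), knockoff selection, is omitted here.

```python
import itertools
import numpy as np

def rbf_kernel_pca(X, n_components=1, gamma=1.0):
    """Stage 1 (assumed RBF kernel): training scores of the leading kernel PCs."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = H @ K @ H                            # kernel centered in feature space
    vals, vecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

def polynomial_features(X, degree=2):
    """Stage 2a: monomials of total degree 1..degree, plus their index tuples."""
    p = X.shape[1]
    terms = [c for d in range(1, degree + 1)
             for c in itertools.combinations_with_replacement(range(p), d)]
    Psi = np.column_stack([np.prod(X[:, list(c)], axis=1) for c in terms])
    return Psi, terms

def lasso_cd(Psi, z, lam=0.1, n_iter=200):
    """Stage 2b: Lasso via cyclic coordinate descent on standardized columns.

    Assumes no column of Psi is constant (otherwise its std is zero).
    """
    Psi = (Psi - Psi.mean(0)) / Psi.std(0)
    z = z - z.mean()
    n, D = Psi.shape
    beta = np.zeros(D)
    r = z.copy()                              # current residual
    for _ in range(n_iter):
        for d in range(D):
            r += Psi[:, d] * beta[d]          # remove coordinate d from the fit
            rho = Psi[:, d] @ r / n
            beta[d] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft threshold
            r -= Psi[:, d] * beta[d]
    return beta

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Z = rbf_kernel_pca(X, n_components=1, gamma=0.1)   # self-supervised signal
Psi, terms = polynomial_features(X, degree=2)      # candidate interactions
beta = lasso_cd(Psi, Z[:, 0], lam=0.05)            # sparse polynomial fit
selected_terms = [terms[d] for d in np.flatnonzero(beta)]
```

Each tuple in `selected_terms` names the input columns multiplied together in a retained monomial, so `(0, 2)` would denote the interaction of features 0 and 2. Reading interactions off index tuples like this is what makes the distilled representation inspectable, in contrast to the kernel principal component itself.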

Paper Structure

This paper contains 40 sections, 2 theorems, 37 equations, 2 figures, 12 tables, 4 algorithms.

Key Result

Theorem 1

Suppose the signal index $j$ is fixed. Let $q \in [0, 1]$ be a target false discovery rate. Applying the Knockoff procedure with the threshold $\tau$ defined in the knockoff threshold equation to select significant polynomial features $\psi_d$, where $z_{ij}$ are the response variables and $\psi_d$ […] where $\hat{A}_j = \{ d : W^{(j)}_d \geq \tau \}$ is the set of selected indices and $\beta_d^{*(j)}$ […]
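To make the selection rule $\hat{A}_j = \{ d : W^{(j)}_d \geq \tau \}$ concrete, here is a small pure-Python sketch. The paper's threshold equation is not reproduced on this page, so the code assumes the standard knockoff+ threshold, $\tau = \min\{ t > 0 : (1 + \#\{d : W_d \leq -t\}) / \max(1, \#\{d : W_d \geq t\}) \leq q \}$; the paper's definition may differ. The function names and the statistics in `W` are hypothetical, and how the $W^{(j)}_d$ are computed is left outside the sketch.

```python
def knockoff_plus_threshold(W, q):
    """Smallest candidate t with (1 + #{W_d <= -t}) / max(1, #{W_d >= t}) <= q.

    Candidates are the distinct nonzero magnitudes |W_d|. Returns infinity
    when no candidate meets the target level q (nothing is selected).
    """
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)   # estimated false discoveries
        pos = sum(1 for w in W if w >= t)    # discoveries at threshold t
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")

def select_features(W, q):
    """Indices d with W_d >= tau, mirroring the selected set in Theorem 1."""
    tau = knockoff_plus_threshold(W, q)
    return [d for d, w in enumerate(W) if w >= tau]

# Hypothetical knockoff statistics for ten polynomial features.
W = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.4, -0.3, 2.0, 1.5]
tau = knockoff_plus_threshold(W, q=0.2)      # 1.5
chosen = select_features(W, q=0.2)           # [0, 1, 2, 3, 4, 8, 9]
```

In this example the ratio first drops to $1/7 \approx 0.14 \leq 0.2$ at $t = 1.5$, where no statistic is at or below $-1.5$ and seven are at or above $1.5$. Large negative statistics signal that a knockoff copy looked more important than the real feature, which is why they inflate the numerator and push the threshold up.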

Figures (2)

  • Figure 1: Prediction accuracy using features selected by Spofe based on the first $m$ kernel principal components ($m = 1, 2, 3, 50$) with different prediction models
  • Figure 2: Comparison of Dimensionality Reduction: Top Polynomial Features Identified by Spofe Versus Top Two Principal Components by KPCA and Linear PCA

Theorems & Definitions (8)

  • Definition 1: Sparse Polynomial Function Space
  • Definition 2: Sparse Polynomial Representation of Kernel PCA
  • Definition 3: Significant Polynomial Terms
  • Definition 4: Hypotheses for Feature Selection
  • Theorem 1: FDR Control for Significant Polynomial Features
  • proof
  • Theorem 2: Error Bounds with FDR Control for Kernel PCA Approximation
  • proof