Nonparametric Linear Feature Learning in Regression Through Regularisation

Bertille Follain, Francis Bach

TL;DR

The paper addresses high-dimensional regression under the multi-index model f*(x) = g*(P^T x), in which the response depends on the input only through a low-dimensional linear projection. It proposes RegFeaL, a regularised empirical risk minimisation approach that jointly learns the nonparametric function and the low-dimensional subspace via derivative-based penalties. The method leverages Hermite polynomials for their rotation invariance and develops a variational, kernel-based framework with an alternating minimisation procedure that estimates the subspace, the regression function, and the subspace dimension. The authors provide high-probability convergence guarantees with explicit rates under minimal assumptions and report results on simulated datasets, where RegFeaL compares favourably to state-of-the-art methods when hidden linear structure is present. The approach also covers estimation of the effective dimension reduction (e.d.r.) space, applies beyond the square loss, and comes with a practical kernel-based implementation that relies on sampling tricks for scalability.

Abstract

Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for joint linear feature learning and non-parametric function estimation, aimed at more effectively leveraging hidden features for learning. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. By using alternating minimisation, we iteratively rotate the data to improve alignment with leading directions. We establish that the expected risk of our method converges with high probability to the minimal risk under minimal assumptions and with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.
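To make the multi-index model concrete, the following is a minimal sketch of data in which the response depends on the input only through a low-dimensional projection, y = g(P^T x) + noise. It is not the authors' code or their experimental setup: the function names, the link function `g_example`, the Gaussian inputs, the noise level, and the dimensions are all assumptions chosen for illustration.

```python
import numpy as np

def sample_multi_index(n, d, s, g, noise_std=0.1, seed=0):
    """Draw (X, y, P) with y = g(P^T x) + noise, where P has s orthonormal columns."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a random s-dimensional subspace of R^d (QR of a Gaussian matrix).
    P, _ = np.linalg.qr(rng.standard_normal((d, s)))
    X = rng.standard_normal((n, d))
    y = g(X @ P) + noise_std * rng.standard_normal(n)
    return X, y, P

def g_example(z):
    # Hypothetical link function acting on the s projected coordinates.
    return np.sin(z[:, 0]) + z[:, 1] ** 2

X, y, P = sample_multi_index(n=200, d=10, s=2, g=g_example)

# Shifting x along a direction orthogonal to span(P) leaves g(P^T x) unchanged.
rng = np.random.default_rng(1)
v = rng.standard_normal(10)
v_perp = v - P @ (P.T @ v)  # component of v orthogonal to the subspace
same = np.allclose(g_example(X @ P), g_example((X + v_perp) @ P))
```

Here `same` checks that the noiseless response is invariant to movement outside the subspace spanned by the columns of `P`, which is the structure a feature learning method would aim to recover from `(X, y)` alone.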

Paper Structure

This paper contains 42 sections, 14 theorems, 98 equations, 5 figures, 1 algorithm.

Key Result

Lemma 2.1

Let $f \in L^2(q)$ and express it as $f = \sum_{\alpha \in \mathbb{N}^d} \hat{f}(\alpha) H_\alpha$. Then for any $b \in [d]$,
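The Hermite expansion used in this lemma can be illustrated numerically in one dimension. The sketch below assumes that $q$ is the standard Gaussian and that the basis consists of normalised probabilists' Hermite polynomials $h_k = \mathrm{He}_k / \sqrt{k!}$ (with $H_\alpha$ a product of such terms across coordinates); both are assumptions, as is the test function $f(x) = x^2$. It only checks orthonormality and computes coefficients $\hat{f}(k) = \mathbb{E}[f(X) h_k(X)]$, and does not reproduce the lemma's conclusion.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def hermite_normalised(k, x):
    """Normalised probabilists' Hermite polynomial h_k = He_k / sqrt(k!)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0  # select He_k in the HermiteE series
    return hermeval(x, coeffs) / math.sqrt(math.factorial(k))

# Gauss-Hermite quadrature for the weight exp(-x^2 / 2),
# rescaled so that sums approximate expectations under N(0, 1).
nodes, weights = hermegauss(40)
weights = weights / math.sqrt(2.0 * math.pi)

K = 6
H = np.stack([hermite_normalised(k, nodes) for k in range(K)])  # shape (K, 40)

# gram[j, k] approximates E[h_j(X) h_k(X)]; orthonormality means it is the identity.
gram = (H * weights) @ H.T

# Coefficients of f(x) = x^2 in this basis: f_hat[k] = E[f(X) h_k(X)].
f_hat = (H * weights) @ nodes**2
```

Since $x^2 = \mathrm{He}_2(x) + \mathrm{He}_0(x)$, the only non-zero entries of `f_hat` are at indices 0 and 2, with values 1 and $\sqrt{2}$ respectively.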

Figures (5)

  • Figure 1: Performance dependency on $d$ and $n$ for the sinus dataset in the variable selection setting.
  • Figure 2: Performance dependency on $d$ and $n$ for the sinus dataset in the feature learning setting.
  • Figure 3: Performance dependency on $d$ and $n$ for the polynomial dataset in the feature learning setting.
  • Figure 4: Influence of the number of random features.
  • Figure 5: Training behaviour.

Theorems & Definitions (27)

  • Lemma 2.1: Equivalence for dependency on variables
  • Proof of Lemma 2.1
  • Lemma 2.2: Rotational invariance property of Hermite polynomials
  • Lemma 2.3: Properties of the regularisation
  • Proof of Lemma 2.3
  • Lemma 3.1: Variational formulation
  • Lemma 3.2: Variational formulation of variable selection penalty
  • Proof of Lemma \ref{lemma:variational}
  • Lemma 3.3: Variational formulation of feature learning penalty
  • Lemma 4.1: Use of Gaussian complexity
  • ...and 17 more