Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens

TL;DR

This work addresses the limited flexibility of kernel ridgeless regression by introducing Locally-Adaptive-Bandwidth (LAB) RBF kernels and a learning framework that treats the hypothesis space as an integral space of RKHSs. The authors develop an asymmetric kernel ridgeless model, derive solvable forms for coefficient optimization, and propose a two-stage LAB kernel learning procedure with a dynamic strategy for selecting support data. Theoretical contributions include an integral-RKHS interpretation, a sparsity analysis, and an $\ell_q$-regularization-based approximation analysis yielding learning rates close to $N^{-\beta/4}$; these are complemented by a detailed error decomposition and Rademacher chaos-based bounds. Empirically, LAB RBF regression achieves state-of-the-art or competitive accuracy on synthetic and real datasets while using far fewer support vectors, illustrating both high representational capacity and strong generalization in a ridgeless setting. The work thus provides a principled way to enhance kernel methods with data-adaptive bandwidths, together with a rigorous account of the resulting generalization behavior.

Abstract

Ridgeless regression has garnered attention among researchers, particularly in light of the "benign overfitting" phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to its lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidth (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.
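
To make the idea of a locally adaptive bandwidth concrete, here is a minimal NumPy sketch. It rests on assumptions not stated in this summary: the kernel form $\exp(-\|\theta_j \odot (t - x_j)\|^2)$ with one bandwidth vector $\theta_j$ per support point, the helper name `lab_rbf_kernel`, the support set, and the bandwidth values are all illustrative. The bandwidths are fixed by hand here, whereas the paper learns them from data.

```python
import numpy as np

def lab_rbf_kernel(T, X, theta):
    """Assumed LAB RBF form: K[i, j] = exp(-||theta[j] * (T[i] - X[j])||^2).

    T: (n_T, d) evaluation points, X: (n_X, d) support points,
    theta: (n_X, d) one bandwidth vector per support point (hypothetical layout).
    """
    diff = T[:, None, :] - X[None, :, :]      # pairwise differences, (n_T, n_X, d)
    scaled = diff * theta[None, :, :]         # scale by the bandwidth of column point j
    return np.exp(-np.sum(scaled ** 2, axis=-1))

# Hypothetical support data sampled from the toy signal y = sin(2 x^3).
X_sv = np.linspace(-1.5, 1.5, 8)[:, None]
y_sv = np.sin(2 * X_sv[:, 0] ** 3)

# Hand-picked bandwidths that differ per support point (learned in the paper).
theta = np.linspace(3.0, 5.0, 8)[:, None]

# Ridgeless fit: solve K alpha = y with no regularization term.
K = lab_rbf_kernel(X_sv, X_sv, theta)
alpha = np.linalg.solve(K, y_sv)

def predict(T):
    """Evaluate f(t) = sum_j alpha[j] * K(t, x_j) on the rows of T."""
    return lab_rbf_kernel(T, X_sv, theta) @ alpha
```

Because each column of `K` uses its own bandwidth, `K` is generally not symmetric, which is the sense in which the kernel is asymmetric in this sketch. The fitted function interpolates the support data exactly, up to numerical precision.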

Paper Structure

This paper contains 27 sections, 14 theorems, 107 equations, 10 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

One of the stationary points of (equ: asy krr) is

Figures (10)

  • Figure 1: A toy example illustrating kernel ridgeless regression applied to a one-dimensional signal $y=\sin(2x^3)$. In (a), the traditional RBF kernel is utilized, directly interpolating all data points. In (b), asymmetric kernel learning is applied, where a small subset is used as support data and the LAB RBF kernel is learnt from the remaining data.
  • Figure 2: Optimization for evaluating $f$ in our kernel ridgeless regression framework. To enhance the model's flexibility, we introduce trainable bandwidths, which in turn reduce the number of support data required.
  • Figure 3: Optimal subspace selection when learning kernels.
  • Figure 4: Coefficient matrix of $f_{\mathcal{Z},{\bm \Theta}}$, exhibiting sparsity.
  • Figure 5: Effect of the number of support data on the performance of Algorithm 1. Three synthetic datasets are used. Results of Algorithm 1 are shown as solid lines, and results of traditional kernel interpolation models are shown as dashed lines. Various levels of noise are introduced into the training data.
  • ...and 5 more figures
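
The baseline in Figure 1(a), a traditional RBF kernel interpolating all data points, can be sketched as follows. This is an illustrative reconstruction rather than the authors' experimental code: the sampling interval, sample count, noise level, and bandwidth `sigma` are assumed values chosen so the kernel matrix stays well conditioned.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Standard RBF kernel with a single global bandwidth sigma (1-D inputs)."""
    sq_dist = (A[:, None] - B[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)

# Noisy samples of the toy signal y = sin(2 x^3); all settings are assumptions.
x_train = np.linspace(0.0, 2.0, 20)
y_train = np.sin(2 * x_train ** 3) + 0.05 * rng.standard_normal(x_train.size)

sigma = 0.1  # hypothetical global bandwidth

# Kernel ridgeless regression: solve K alpha = y with no ridge term,
# so the fitted function passes through every training point.
K = rbf_kernel(x_train, x_train, sigma)
alpha = np.linalg.solve(K, y_train)

# Evaluate the interpolant on a dense grid.
x_test = np.linspace(0.0, 2.0, 200)
y_pred = rbf_kernel(x_test, x_train, sigma) @ alpha

# Largest deviation from the training targets; near zero for an interpolant.
train_residual = np.max(np.abs(K @ alpha - y_train))
```

With a single global bandwidth, every training point becomes a support point and the noise is interpolated along with the signal. That is the behavior the figure contrasts with the LAB RBF model, which uses only a small subset as support data.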

Theorems & Definitions (16)

  • Theorem 1
  • Corollary 2
  • Proposition 3
  • Definition 4
  • Theorem 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Definition 9
  • Lemma 10
  • ...and 6 more