Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning
Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens
TL;DR
This work addresses the limited flexibility of kernel ridgeless regression by introducing Locally-Adaptive-Bandwidths (LAB) RBF kernels and a learning framework that treats the hypothesis space as an integral space of RKHSs. The authors develop an asymmetric kernel ridgeless model, derive solvable forms for coefficient optimization, and propose a two-stage LAB kernel learning procedure with a dynamic strategy for the support data. Theoretical contributions include an integral-RKHS interpretation, a sparsity analysis, and an $\ell_q$-regularization-based approximation analysis yielding learning rates close to $N^{-\beta/4}$; these are complemented by a detailed error decomposition and bounds based on Rademacher chaos complexity. Empirically, LAB RBF regression achieves state-of-the-art or competitive accuracy on synthetic and real datasets while using far fewer support vectors, illustrating both high representational capacity and strong generalization in a ridgeless setting. This work thus provides a principled path to enhancing kernel methods with data-adaptive bandwidths, together with a rigorous understanding of the resulting generalization behavior.
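The asymmetric ridgeless model can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the LAB RBF kernel takes the form $K(x, z_j) = \exp(-\|\theta_j \odot (x - z_j)\|_2^2)$, where each support point $z_j$ carries its own bandwidth vector $\theta_j$, and it treats the support set `Z` and the bandwidths `theta` as given rather than learned by the two-stage procedure. The function names `lab_rbf_kernel`, `fit_ridgeless`, and `predict` are hypothetical. Because the bandwidth depends only on the second argument, the kernel matrix is generally asymmetric, so the ridgeless coefficients come from a general linear solve rather than a Cholesky factorization.

```python
import numpy as np


def lab_rbf_kernel(X, Z, theta):
    """Assumed LAB RBF kernel: K[i, j] = exp(-||theta[j] * (X[i] - Z[j])||^2).

    theta[j] is a per-dimension bandwidth vector attached to support point Z[j]
    (an assumption for illustration). Since the bandwidth follows the second
    argument, K(x, z) != K(z, x) in general.
    """
    diff = X[:, None, :] - Z[None, :, :]          # shape (n, m, d)
    scaled = diff * theta[None, :, :]             # apply each support point's bandwidths
    return np.exp(-np.sum(scaled ** 2, axis=-1))  # shape (n, m)


def fit_ridgeless(Z, y, theta):
    """Ridgeless fit: solve K(Z, Z) alpha = y exactly, with no ridge term added."""
    K = lab_rbf_kernel(Z, Z, theta)
    # K is square but not symmetric, so use a general (LU-based) solver.
    return np.linalg.solve(K, y)


def predict(X, Z, theta, alpha):
    """Evaluate f(x) = sum_j alpha_j * K(x, z_j) at each row of X."""
    return lab_rbf_kernel(X, Z, theta) @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.uniform(-1.0, 1.0, size=(20, 2))      # hypothetical support points
    y = np.sin(3.0 * Z[:, 0]) + Z[:, 1] ** 2      # hypothetical targets
    theta = rng.uniform(3.0, 6.0, size=Z.shape)   # hypothetical bandwidths
    alpha = fit_ridgeless(Z, y, theta)
    err = np.max(np.abs(predict(Z, Z, theta, alpha) - y))
    print("max interpolation error on support points:", err)
```

With no ridge term, the fitted function reproduces the targets at the support points; the extra flexibility relative to a single-bandwidth RBF kernel comes from the per-point bandwidth vectors.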
Abstract
Ridgeless regression has garnered attention among researchers, particularly in light of the "Benign Overfitting" phenomenon, in which models that interpolate noisy samples still generalize well. However, kernel ridgeless regression does not always perform well because of its limited flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve its performance both empirically and theoretically. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, which explains the origin of its generalization ability. Taking an approximation-analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate of the proposed model under mild conditions. This result deepens our theoretical understanding: the algorithm's strong approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, which is controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.
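The connection between the $\ell_0$ view and the $\ell_q$ analysis ($0<q<1$) can be illustrated numerically. The sketch below is not the paper's proof technique; it only demonstrates the standard fact that $\sum_i |\alpha_i|^q$ tends to the number of nonzero coefficients as $q \to 0$, so an $\ell_q$ quantity with small $q$ behaves like a count of support vectors. The helper names `lq_quasi_norm` and `l0_count` and the coefficient vector are hypothetical.

```python
import numpy as np


def lq_quasi_norm(alpha, q):
    """Return sum_i |alpha_i|**q, the q-th power of the l_q quasi-norm (0 < q < 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    a = np.abs(np.asarray(alpha, dtype=float))
    # For q > 0, 0**q == 0, so zero coefficients contribute nothing.
    return float(np.sum(a ** q))


def l0_count(alpha):
    """Number of nonzero coefficients, i.e. the number of support vectors."""
    return int(np.count_nonzero(alpha))


if __name__ == "__main__":
    alpha = np.array([0.0, 2.5, 0.0, -0.3, 1.0, 0.0])  # hypothetical coefficients
    for q in (0.9, 0.5, 0.1, 0.01):
        print(f"q={q}: sum |alpha_i|^q = {lq_quasi_norm(alpha, q):.4f}")
    print("l0 count:", l0_count(alpha))
```

As `q` shrinks, the printed sums approach 3, the number of nonzero entries in `alpha`, which is why sparsity in the $\ell_0$ sense can be studied through $\ell_q$ quantities.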
