Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

Xiaotong Liu; Yunwen Lei; Xiangyu Chang; Shao-Bo Lin

TL;DR

The paper rigorously demonstrates that kernel-based gradient descent (KGD), equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts to different kernels, target functions, and error metrics.

Abstract

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify iteration increments in KGD, deriving an adaptive parameter selection strategy that is implementable. Theoretical verifications are provided within the framework of learning theory. Utilizing the recently developed integral operator approach, we rigorously demonstrate that KGD, equipped with the proposed adaptive parameter selection strategy, achieves the optimal generalization error bound and adapts effectively to different kernels, target functions, and error metrics. Consequently, this strategy showcases significant advantages over existing parameter selection methods for KGD.
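
To make the objects named in the abstract concrete, the following is a minimal sketch of a KGD iteration for least squares, together with one common form of the empirical effective dimension, $\mathrm{Tr}[(\lambda|D| I + \mathbb{K})^{-1}\mathbb{K}]$. Several things here are assumptions for illustration and are not taken from the paper: the Gaussian kernel and its width, the step size, the target function $\sin(2\pi x)$, the sample size, and the function names. The paper's own stopping rule and its exact definition of the empirical effective dimension are not reproduced here.

```python
import numpy as np


def gaussian_kernel(X, Z, width=0.5):
    """Gram matrix of a Gaussian kernel (an illustrative choice, not the paper's)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def kgd_path(K, y, step, n_iter):
    """Coefficients alpha_0, ..., alpha_T of kernel gradient descent for least squares.

    The iterate is f_t(x) = sum_i alpha_t[i] * K(x_i, x), updated by
    alpha_{t+1} = alpha_t - (step / n) * (K @ alpha_t - y), starting from zero.
    """
    n = len(y)
    alpha = np.zeros(n)
    path = [alpha.copy()]
    for _ in range(n_iter):
        alpha = alpha - (step / n) * (K @ alpha - y)
        path.append(alpha.copy())
    return np.array(path)


def empirical_effective_dimension(K, lam):
    """Tr[(lam * n * I + K)^{-1} K], computed from the eigenvalues of K.

    Assumption: this standard form is used only as an illustration.
    """
    n = K.shape[0]
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip round-off negatives
    return float(np.sum(eig / (eig + lam * n)))


# Hypothetical data: uniform inputs on [0, 1] and a made-up target function.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + rng.normal(0.0, 0.6, size=n)

K = gaussian_kernel(X, X)
path = kgd_path(K, y, step=1.0, n_iter=500)

# Training error of every iterate; row t of (path @ K) holds f_t(x_1), ..., f_t(x_n).
train_mse = np.mean((path @ K - y) ** 2, axis=1)

print("training MSE at t = 0, 10, 500:", train_mse[[0, 10, 500]])
print("effective dimension at lambda = 1e-3:", empirical_effective_dimension(K, 1e-3))
```

The number of iterations plays the role of the regularization parameter here: the training error keeps shrinking as $t$ grows, so a selection rule is needed to decide where to stop. The step size of 1 is safe in this sketch because the Gaussian kernel satisfies $K(x,x)=1$, which bounds the largest eigenvalue of $\mathbb{K}/|D|$ by 1.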

Paper Structure (15 sections, 13 theorems, 114 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
KGD with hybrid selection strategy
KGD with hybrid selection strategy
Related works
Theoretical behaviors
Bias and variance analysis of KGD
Semi-adaptive stopping rules and upper bound of iterations
Optimal generalization error bounds for KGD with HSS
Numerical analysis
Simulation experiments
Simulation 1: Feasibility and power of BSP
Simulation 2: Effectiveness and superior performance of HSS
Simulation 3: HSS overcoming covariate shift
Real data examples
Further discussion

Key Result

Lemma 1

Let $f\in\mathcal{H}_K$. Then where and $\|A\|$ denotes the spectral norm of the operator $A$.

Figures (10)

Figure 1: Role of the number of iterations in controlling the bias, variance and total error of KGD under the $L_2$ norm. The training samples $\{x_i\}_{i=1}^{1000}$ are independently drawn according to the uniform distribution on the (hyper-)cube $[0,1]^d$ with $d=1, 3$. The corresponding outputs are generated by the model $y_i=g_j(x_i)+\varepsilon_i$, $j=1,2$, where $g_1$ and $g_2$ are defined by (\ref{g1}) and (\ref{g2}), respectively, and $\varepsilon_i$ is independent Gaussian noise $\mathcal{N}(0, \sigma^2)$ with $\sigma=0.6$. The bias and variance are defined in (\ref{Error-dec.1}) below.
Figure 2: Step $t_{\tilde{C}}$ determined by different values of the constant $\tilde{C}$ under the BSP. The experimental setting is the same as in Figure \ref{fig:bias--variance}.
Figure 3: Relation between the $L_2$ norm/$L_{\infty}$ norm and the constant $\tilde{C}$.
Figure 4: Range of $\hat{C}_{j^*}$ under BSP and range of $\hat{t}^*$ under BS (results from a single experiment).
Figure 5: Generalization performance of BS, HO, and HSS (where $L$ in HSS is fixed at $0.6|D|$).
...and 5 more figures

Theorems & Definitions (14)

Definition 1: Backward Selection Principle (BSP)
Lemma 1
Proposition 2
Corollary 3
Proposition 4
Proposition 5
Theorem 6
Corollary 7
Corollary 8
Lemma A.1
...and 4 more
