Lepskii Principle for Distributed Kernel Ridge Regression

Shao-Bo Lin

TL;DR

This work tackles adaptive parameter selection in distributed kernel ridge regression (DKRR) when local data cannot be shared. By combining the Lepskii balancing principle with a communication protocol that exchanges no raw data, it yields Lep-AdaDKRR, which uses double weighted averaging to synthesize local estimates. The authors prove rate-optimal adaptive guarantees: the global estimator achieves $\mathbb{E}\|\overline{f}_{D,\boldsymbol{\lambda}^*}-f_\rho\|_\rho^2 \lesssim |D|^{-\frac{2r}{2r+s}}$ and $\mathbb{E}\|\overline{f}_{D,\boldsymbol{\lambda}^*}-f_\rho\|_K^2 \lesssim |D|^{-\frac{2r-1}{2r+s}}$ for $r\in[\tfrac{1}{2},1]$ and $s\in[0,1]$, provided the block sizes $|D_j|$ satisfy a mild growth condition. The method keeps data private, does not require equal block sizes, and adapts to both the regularity of the regression function and the capacity of the kernel. The paper also discusses a saturation phenomenon and outlines extensions to other kernel-based algorithms. Overall, Lep-AdaDKRR offers a principled, adaptively optimal approach to distributed kernel learning without exchanging private data.
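As a quick arithmetic check of how the two exponents behave, take $r=1$ and $s=\tfrac{1}{2}$ (values chosen here purely for illustration, not taken from the paper):

$$\frac{2r}{2r+s}=\frac{2}{5/2}=\frac{4}{5},\qquad \frac{2r-1}{2r+s}=\frac{1}{5/2}=\frac{2}{5}.$$

With these values the $\|\cdot\|_\rho$ bound decays like $|D|^{-4/5}$ and the $\|\cdot\|_K$ bound like $|D|^{-2/5}$. At the other end of the range, $r=\tfrac{1}{2}$ gives a $K$-norm exponent of $0$, so the stated bound guarantees boundedness but no decay in that norm.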

Abstract

Parameter selection without communicating local data is quite challenging in distributed learning, which creates an inconsistency between the theoretical analysis of distributed learning and its practical application to distributively stored data. Motivated by the recently developed Lepskii principle and non-privacy communication protocol for kernel learning, we propose a Lepskii principle for distributed kernel ridge regression (DKRR) and develop an adaptive DKRR with Lepskii principle (Lep-AdaDKRR for short) using a double weighted averaging synthesization scheme. We derive optimal learning rates for Lep-AdaDKRR and show theoretically that it adapts to the regularity of regression functions, the decay rate of the effective dimension of kernels, and different metrics of generalization, which closes the aforementioned gap between theory and application.
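To make the baseline concrete, here is a minimal sketch of plain DKRR with size-weighted averaging on unequal blocks. It is not the Lep-AdaDKRR procedure: it uses one regularization parameter shared by all agents, fixed by hand, whereas the paper selects parameters adaptively. The Gaussian kernel, its width, the synthetic data, and the function names `gaussian_kernel`, `local_krr`, and `dkrr_predict` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, width=0.2):
    # Pairwise Gaussian kernel matrix between rows of A and rows of B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))

def local_krr(X, y, lam, width=0.2):
    # Kernel ridge regression on one agent's block:
    # solve (K + lam * n * I) alpha = y and return the fitted predictor.
    n = X.shape[0]
    K = gaussian_kernel(X, X, width)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda Xq: gaussian_kernel(Xq, X, width) @ alpha

def dkrr_predict(blocks, lam, Xq, width=0.2):
    # Average local predictions with weights |D_j| / |D|.
    # Only predictions at the query points are combined; raw data stay local.
    total = sum(X.shape[0] for X, _ in blocks)
    pred = np.zeros(Xq.shape[0])
    for X, y in blocks:
        pred += (X.shape[0] / total) * local_krr(X, y, lam, width)(Xq)
    return pred

# Synthetic regression problem (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(600)

# Three agents with unequal block sizes: 100, 200, and 300 samples.
cuts = [0, 100, 300, 600]
blocks = [(X[a:b], y[a:b]) for a, b in zip(cuts[:-1], cuts[1:])]

Xq = np.linspace(0.05, 0.95, 50)[:, None]
pred = dkrr_predict(blocks, lam=1e-4, Xq=Xq)
rmse = float(np.sqrt(np.mean((pred - np.sin(2.0 * np.pi * Xq[:, 0])) ** 2)))
print(f"RMSE of the averaged estimator on the query grid: {rmse:.4f}")
```

The hand-picked `lam=1e-4` is exactly the quantity that is hard to choose in practice without pooling data, which is the gap the adaptive selection rule is meant to close.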

Paper Structure (11 sections, 20 theorems, 162 equations)

Introduction
Adaptive DKRR with Lepskii Principle
DKRR and parameter selection
Lepskii principle for DKRR
Adaptive DKRR with Lepskii principle
Theoretical Verifications
Proofs
Operator perturbation and operator representation
Generalization error analysis on under-estimated agents
Generalization error analysis on over-estimated agents
Proof of Theorem \ref{Theorem:Optimal-Rate-adaptive}

Key Result

Lemma

Let $\overline{f}_{D,\lambda}$ be defined by DKRR-1. Then [displayed bound missing from this extract], where $\|\cdot\|_*$ denotes either $\|\cdot\|_\rho$ or $\|\cdot\|_K$ and $f^\diamond_{D_j,\lambda}$ is the noise-free version of $f_{D_j,\lambda}$ given by [displayed definition missing from this extract].
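For orientation, error decompositions of this kind in the distributed KRR literature usually take the shape sketched here. The sketch assumes that DKRR-1 is the size-weighted average $\overline{f}_{D,\lambda}=\sum_{j=1}^m \frac{|D_j|}{|D|} f_{D_j,\lambda}$ over $m$ agents and that $L_{K,D_j}$ denotes the empirical integral operator on block $D_j$. Neither the notation nor the displays can be recovered from the text here, so the paper's exact statement may differ:

$$\mathbb{E}\|\overline{f}_{D,\lambda}-f_\rho\|_*^2 \le \sum_{j=1}^m \frac{|D_j|^2}{|D|^2}\,\mathbb{E}\|f_{D_j,\lambda}-f_\rho\|_*^2 + \sum_{j=1}^m \frac{|D_j|}{|D|}\,\mathbb{E}\|f^\diamond_{D_j,\lambda}-f_\rho\|_*^2, \qquad f^\diamond_{D_j,\lambda}=(L_{K,D_j}+\lambda I)^{-1}L_{K,D_j}f_\rho.$$

Read this way, the first sum carries squared weights, so the noise contribution of each local estimator shrinks under averaging, while the second sum shows that the noise-free (approximation and sampling) error of each block is not reduced by averaging.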

Theorems & Definitions (37)

Lemma
Theorem 1
Remark
Remark
Lemma
Lemma
Lemma
Lemma
Proof
Lemma
...and 27 more
