Lepskii Principle for Distributed Kernel Ridge Regression
Shao-Bo Lin
TL;DR
This work tackles adaptive parameter selection in distributed kernel ridge regression under data privacy constraints. By integrating the Lepskii balancing principle with a non-privacy communication protocol, it yields Lep-AdaDKRR, which uses double-weighted averaging to synthesize local estimates without sharing raw data. The authors prove rate-optimal adaptive guarantees: the global estimator achieves $\mathbb{E}\|\overline{f}_{D,\boldsymbol{\lambda}^*}-f_\rho\|_\rho^2 \lesssim |D|^{-\frac{2r}{2r+s}}$ and $\mathbb{E}\|\overline{f}_{D,\boldsymbol{\lambda}^*}-f_\rho\|_K^2 \lesssim |D|^{-\frac{2r-1}{2r+s}}$ for $r\in[\tfrac{1}{2},1]$, $s\in[0,1]$, and block sizes $|D_j|$ satisfying a mild growth condition. The method remains data-private, does not require equal block sizes, and adapts to both the regularity of the regression function and the capacity of the kernel, bridging theory and privacy-preserving practice in distributed learning. The paper also discusses a saturation phenomenon and outlines extensions to broader kernel-based algorithms. Overall, Lep-AdaDKRR provides a principled, adaptively optimal approach for distributed kernel learning without exchanging private data.
Abstract
Parameter selection without communicating local data is quite challenging in distributed learning, exhibiting an inconsistency between the theoretical analysis of distributed learning and its practical application to distributively stored data. Motivated by the recently developed Lepskii principle and non-privacy communication protocol for kernel learning, we propose a Lepskii principle to equip distributed kernel ridge regression (DKRR) and consequently develop an adaptive DKRR with Lepskii principle (Lep-AdaDKRR for short) by using a double weighted averaging synthesis scheme. We derive optimal learning rates for Lep-AdaDKRR and theoretically show that Lep-AdaDKRR succeeds in adapting to the regularity of regression functions, the decay rate of the effective dimension of kernels, and different metrics of generalization, which closes the aforementioned gap between theory and application.
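
To make the setting concrete, the following is a minimal sketch of plain DKRR with size-weighted averaging, the baseline that the abstract builds on. It is not Lep-AdaDKRR: the regularization parameter `lam` is fixed by hand rather than chosen by a Lepskii principle, and the double weighted averaging scheme is not reproduced. The Gaussian kernel, its `width`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kernel(A, B, width=0.5):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def fit_local_krr(X, y, lam):
    """Solve (K + lam * n * I) alpha = y on one local block."""
    n = X.shape[0]
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + lam * n * np.eye(n), y)


def predict_local(X_train, alpha, X_query):
    """Evaluate the local KRR estimator at the query points."""
    return gaussian_kernel(X_query, X_train) @ alpha


def dkrr_predict(blocks, lam, X_query):
    """Average local KRR predictions with weights |D_j| / |D|.

    Only predictions at X_query leave each block; raw (X, y) pairs
    are never pooled. Blocks may have unequal sizes.
    """
    total = sum(X.shape[0] for X, _ in blocks)
    out = np.zeros(X_query.shape[0])
    for X, y in blocks:
        alpha = fit_local_krr(X, y, lam)
        out += (X.shape[0] / total) * predict_local(X, alpha, X_query)
    return out
```

Each block would run `fit_local_krr` on its own machine and send only its predictions (or function values on a shared grid) to the server. In the adaptive method described above, the hand-picked `lam` would be replaced by a data-driven choice made without exchanging local data.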
