Distributed Gradient Descent for Functional Learning

Zhan Yu; Jun Fan; Zhongjie Shi; Ding-Xuan Zhou


TL;DR

The paper addresses scalable learning from massively distributed functional data by introducing DGDFL, a divide-and-conquer gradient-descent framework operating in a reproducing kernel Hilbert space. It first establishes a single-machine gradient-descent functional learning (GDFL) procedure, then extends to distributed DGDFL with finite local machines and a central fusion step, plus a semi-supervised variant that leverages unlabeled data. Under capacity and regularity assumptions and two noise regimes, it proves confidence-based learning rates for all estimators and demonstrates that the gradient-descent approach overcomes saturation phenomena seen in prior functional-learning work. The results show that DGDFL achieves optimal or near-optimal rates with computational complexity on the order of $\mathcal{O}(|D|^2/m^2)$, while preserving privacy and enabling scalability. Numerical experiments corroborate the theoretical findings, showing substantial speedups over kernel-based baselines with comparable predictive performance, particularly for large-scale functional data.
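One way to read the stated $\mathcal{O}(|D|^2/m^2)$ complexity is as a per-machine, per-iteration cost. This reading assumes that the $|D|$ samples are split evenly, so that each of the $m$ local machines holds $|D_j| = |D|/m$ samples, and that one gradient step on machine $j$ is dominated by multiplying a $|D_j| \times |D_j|$ Gram matrix by a coefficient vector:

$$
\underbrace{\mathcal{O}\big(|D_j|^2\big)}_{\text{machine } j} = \mathcal{O}\!\left(\frac{|D|^2}{m^2}\right)
\qquad \text{versus} \qquad
\underbrace{\mathcal{O}\big(|D|^2\big)}_{\text{single machine}} .
$$

For example, with $|D| = 10^4$ and $m = 10$, each machine performs on the order of $(10^3)^2 = 10^6$ operations per step instead of $(10^4)^2 = 10^8$.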

Abstract

In recent years, distributed and parallel learning schemes of various types have received increasing attention for their strong advantages in handling large-scale data. To address the big-data challenges that have recently emerged in functional data analysis, we propose a novel distributed gradient descent functional learning (DGDFL) algorithm. It processes functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert spaces. Using integral operator approaches, we provide the first theoretical understanding of the DGDFL algorithm from several different aspects. As a first step toward understanding DGDFL, we propose and comprehensively study a data-based gradient descent functional learning (GDFL) algorithm associated with a single-machine model. Under mild conditions, we obtain confidence-based optimal learning rates for DGDFL without the saturation boundary on the regularity index that previous works in functional regression suffer from. We further provide a semi-supervised DGDFL approach that relaxes the restriction on the maximal number of local machines that still ensures optimal rates. To the best of our knowledge, DGDFL is the first divide-and-conquer iterative training approach to functional learning based on data samples of intrinsically infinite-dimensional random functions (functional covariates), and it enriches the methodologies for functional data analysis.
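The single-machine GDFL iteration and its divide-and-conquer extension described above can be sketched as follows. The sketch rests on several assumptions that do not come from this page. It uses a functional linear model $y \approx \int \beta(s)X(s)\,ds + \varepsilon$ on $[0,1]$, curves sampled on a grid with Riemann-sum quadrature, and a Gaussian kernel. It also uses the kernel gradient step $\beta_{k+1} = \beta_k - \frac{\gamma_k}{|D|}\sum_i (\langle \beta_k, X_i\rangle_{L^2} - y_i)\, L_K X_i$ with $(L_K X)(s) = \int K(s,u)X(u)\,du$, and a fusion step that averages local estimators with weights $|D_j|/|D|$. The function names (`gaussian_kernel`, `gdfl`, `dgdfl`, `empirical_risk`) and the synthetic data are hypothetical, so this is an illustration rather than the authors' implementation.

```python
import numpy as np

def gaussian_kernel(s, width=0.2):
    # Hypothetical kernel choice: K(s, u) = exp(-(s - u)^2 / (2 width^2)) on grid s
    return np.exp(-(s[:, None] - s[None, :]) ** 2 / (2 * width ** 2))

def gdfl(X, y, K, w, t, gamma=0.5, mu=0.0):
    """Single-machine functional gradient descent (illustrative sketch).

    X: (n, p) curves sampled on a grid with quadrature weights w of shape (p,).
    The iterate is kept as beta_k = sum_i c[i] * L_K X_i, so only c is updated.
    """
    n = len(y)
    Xw = X * w                       # quadrature-weighted curves
    G = Xw @ K @ Xw.T                # G[i, j] ~ <L_K X_i, X_j>_{L^2}
    c = np.zeros(n)
    for k in range(t):
        gamma_k = gamma / (k + 1) ** mu
        residual = G @ c - y         # <beta_k, X_i>_{L^2} - y_i, O(n^2) per step
        c -= gamma_k / n * residual
    return K @ (Xw.T @ c)            # beta_t evaluated on the grid

def dgdfl(X, y, K, w, t, m, gamma=0.5, mu=0.0):
    # Divide and conquer: run GDFL on m disjoint blocks, then fuse the local
    # estimators with weights |D_j| / |D| (assumed fusion rule).
    n = len(y)
    beta = np.zeros(K.shape[0])
    for idx in np.array_split(np.arange(n), m):
        beta += len(idx) / n * gdfl(X[idx], y[idx], K, w, t, gamma, mu)
    return beta

def empirical_risk(beta, X, y, w):
    # (1 / 2n) * sum_i (<beta, X_i>_{L^2} - y_i)^2 with Riemann-sum inner products
    return 0.5 * np.mean(((X * w) @ beta - y) ** 2)

# Synthetic example (hypothetical data): X_i(s) = sum_j xi_ij sin(j pi s)
rng = np.random.default_rng(0)
p, n = 50, 200
s = (np.arange(p) + 0.5) / p
w = np.full(p, 1.0 / p)
X = rng.standard_normal((n, 3)) @ np.sin(np.outer(np.arange(1, 4), np.pi * s))
beta0 = np.sin(np.pi * s)
y = (X * w) @ beta0 + 0.1 * rng.standard_normal(n)
K = gaussian_kernel(s)

beta_single = gdfl(X, y, K, w, t=200)
beta_dist = dgdfl(X, y, K, w, t=200, m=4)
```

Keeping the iterate in coefficient form means each local step costs one Gram-matrix product, which is $\mathcal{O}(|D_j|^2)$ on machine $j$. With $m=1$ the fusion step reduces to plain GDFL.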

Paper Structure (26 sections, 18 theorems, 206 equations, 4 figures, 1 table)

Introduction
Main results and discussions
Gradient descent functional learning algorithm
Distributed gradient descent functional learning algorithm
Semi-supervised DGDFL algorithm
Some further remarks
Remarks on the notion "distributed learning"
Remarks on decentralized kernel-based distributed learning
Some advantages of DGDFL in privacy protection and discussion
Remarks on scalability and possible future kernel approximation approaches
Remarks on essential differences from conventional (regularized) linear regression
Preliminary results
Approximation error of a data-free iterative GDFL algorithm
Empirical operator and basic lemmas
Analysis of GDFL algorithm
...and 11 more sections

Key Result

Theorem 1

Assume that the capacity and regularity conditions hold. Let the stepsizes be $\gamma_k=\frac{\gamma}{(k+1)^{\mu}}$ with $0\leq\mu<1$ and $0<\gamma\leq \frac{1}{(1+\kappa)^2(1+B_C^2B_K^2)}$, and let the total number of iterations be $t=\left\lfloor|D|^{\frac{1}{(2\theta+\alpha+1)(1-\mu)}}\right\rfloor$. If noise condition (2momc) holds, and if noise condition (pmomc) holds, then with probability at least $1-\delta$, … $C_1^*$ and $C_…$
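The parameter choices in Theorem 1 can be computed directly: the polynomially decaying stepsizes, the admissible upper bound on $\gamma$, and the stopping time $t$. This minimal sketch assumes only the formulas stated in the theorem. The symbols $\theta$, $\alpha$, $\kappa$, $B_C$ and $B_K$ are taken as given numbers, because their meanings come from the capacity and regularity conditions, which are not reproduced here. The helper names are hypothetical.

```python
import math

def stepsizes(gamma, mu, t):
    # gamma_k = gamma / (k + 1)^mu for k = 0, ..., t - 1
    return [gamma / (k + 1) ** mu for k in range(t)]

def max_gamma(kappa, B_C, B_K):
    # Largest gamma allowed by Theorem 1: 1 / ((1 + kappa)^2 (1 + B_C^2 B_K^2))
    return 1.0 / ((1 + kappa) ** 2 * (1 + B_C ** 2 * B_K ** 2))

def stopping_time(n_samples, theta, alpha, mu):
    # t = floor(|D|^{1 / ((2 theta + alpha + 1)(1 - mu))}), valid for 0 <= mu < 1
    if not 0 <= mu < 1:
        raise ValueError("mu must satisfy 0 <= mu < 1")
    exponent = 1.0 / ((2 * theta + alpha + 1) * (1 - mu))
    return math.floor(n_samples ** exponent)

# Example with illustrative values: |D| = 10000, theta = alpha = 0.5
t_const = stopping_time(10000, theta=0.5, alpha=0.5, mu=0.0)   # 10000**0.4 -> 39
t_decay = stopping_time(10000, theta=0.5, alpha=0.5, mu=0.5)   # 10000**0.8 -> 1584
gamma = max_gamma(kappa=1.0, B_C=1.0, B_K=1.0)                 # 1 / 8 = 0.125
schedule = stepsizes(gamma, mu=0.5, t=4)
```

A larger $\mu$ makes the stepsizes decay faster, and the theorem compensates with more iterations through the factor $1/(1-\mu)$ in the exponent of $t$.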

Figures (4)

Figure 1: The excess risk w.r.t. the sample size for the DGDFL algorithm, with the number of local machines being $m=1, 10, 50, 100$ respectively, and $\sigma=1$. The experiments are repeated 20 times.
Figure 2: The excess risk w.r.t. the sample size for the DGDFL algorithm, with the number of local machines being $m=1, 10, 50, 100$ respectively, and $\sigma=1.5$. The experiments are repeated 20 times.
Figure 3: The excess risk w.r.t. the sample size for the GDFL, DGDFL, RK, and DRK algorithms, with the number in the label indicating the number of local machines. The experiments are repeated 50 times.
Figure 4: The computation time w.r.t. the sample size for the GDFL, DGDFL, RK, and DRK algorithms, with the number in the label indicating the number of local machines. The y-axis is on a log scale. The experiments are repeated 50 times.

Theorems & Definitions (29)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Corollary 1
Corollary 2
Theorem 5
Lemma 1
Theorem 6
proof
...and 19 more
