High-dimensional online learning via asynchronous decomposition: Non-divergent results, dynamic regularization, and beyond

Shixiang Liu, Zhifan Li, Hanming Yang, Jianxin Yin

Abstract

Existing high-dimensional online learning methods often face the challenge that their error bounds, or per-batch sample sizes, diverge as the number of data batches increases. To address this issue, we propose an asynchronous decomposition framework that leverages summary statistics to construct a surrogate score function for current-batch learning. This framework is implemented via a dynamic-regularized iterative hard thresholding algorithm, providing a computationally and memory-efficient solution for sparse online optimization. We provide a unified theoretical analysis that accounts for both the streaming computational error and statistical accuracy, establishing that our estimator maintains non-divergent error bounds and $\ell_0$ sparsity across all batches. Furthermore, the proposed estimator adaptively achieves additional gains as batches accumulate, attaining the oracle accuracy as if the entire historical dataset were accessible and the true support were known. These theoretical properties are further illustrated through an example of the generalized linear model.
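As a rough illustration of the iterative hard thresholding (IHT) idea the abstract refers to, here is a minimal sketch of plain IHT for sparse least squares. It is not the paper's dynamic-regularized algorithm and does not construct the surrogate score from summary statistics. The function names, the squared-error loss, the fixed step size `eta`, and the warm-start argument `beta0` are all assumptions made for this example.

```python
import random


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    if s >= len(v):
        return list(v)
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    keep = set(order[:s])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def gradient(X, y, beta):
    """Gradient of the least-squares loss (1 / 2n) * ||y - X beta||^2."""
    n, p = len(X), len(beta)
    resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
    return [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]


def iht(X, y, s, eta=0.5, n_iter=200, beta0=None):
    """Plain IHT: a gradient step followed by projection onto s-sparse vectors.

    beta0 allows a warm start, e.g. from the previous batch's estimate
    (an illustrative choice, not the paper's update rule).
    """
    p = len(X[0])
    beta = list(beta0) if beta0 is not None else [0.0] * p
    for _ in range(n_iter):
        g = gradient(X, y, beta)
        beta = hard_threshold([b - eta * gj for b, gj in zip(beta, g)], s)
    return beta


if __name__ == "__main__":
    # Synthetic single-batch example with a 3-sparse true coefficient vector.
    random.seed(0)
    n, p, s = 100, 20, 3
    truth = [2.0, -1.5, 1.0] + [0.0] * (p - 3)
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(xj * bj for xj, bj in zip(row, truth)) for row in X]
    estimate = iht(X, y, s)
    print([j for j, b in enumerate(estimate) if b != 0.0])
```

The projection step guarantees that every iterate has at most `s` nonzero entries, which is the $\ell_0$ control the abstract mentions. In a streaming setting one could pass the previous batch's estimate as `beta0`, but how the paper combines batches is not reproduced here.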

Paper Structure (40 sections, 8 theorems, 107 equations, 1 figure, 1 algorithm)

Key Result

Proposition 1

Suppose $f_b$ satisfies RSS$(m, M, (2C_s + 1)s )$ for $b=1,2$, and $f_1$ satisfies RGS$\left( L_1, C_e \sqrt s \lambda_\beta^{(1,\infty)}, (C_s +1)s \right)$. Assume Assumption assump: N1 holds with $C_p = \frac{4C_e^2 C_\beta}{m+M}$. For $b = 1, 2$, let the learning rate $\eta_b \in \left[ \frac{1} then the following $\ell_2$ and $\ell_0$ bounds hold: where $C_\beta := \frac{8}{m+M} \cdot \frac{

Figures (1)

  • Figure 1: Relationship between the $\ell_2$ error and the batch number $b$ in online learning of high-dimensional GLMs, where for ease of display we assume each batch contains $n$ samples, so that $N_b = \sum_{j=1}^b n_j = nb$. The $\log b$ term is introduced to ensure the error bounds hold uniformly for all batches $b \ge 1$.

Theorems & Definitions (12)

  • Remark 1
  • Proposition 1: Burn-in
  • Theorem 1: General batch
  • Theorem 2: Sharper bound
  • Theorem 3: Streaming GLM
  • Theorem 4: Oracle accuracy and support recovery
  • Lemma 1: Designs
  • Proof of Lemma 1
  • Lemma 2: Sub-Gaussian errors
  • Proof of Lemma 2
  • ...and 2 more