Step-Size Decay and Structural Stagnation in Greedy Sparse Learning

Pablo M. Berná

TL;DR

For realizable regression problems with controlled feature coherence, the paper derives explicit lower bounds on the residual norm, showing that over-decaying step-size schedules induce structural stagnation even in low-dimensional sparse settings.

Abstract

Greedy algorithms are central to sparse approximation and stage-wise learning methods such as matching pursuit and boosting. It is known that the Power-Relaxed Greedy Algorithm with step sizes $m^{-\alpha}$ may fail to converge when $\alpha>1$ in general Hilbert spaces. In this work, we revisit this phenomenon from a sparse learning perspective. We study realizable regression problems with controlled feature coherence and derive explicit lower bounds on the residual norm, showing that over-decaying step-size schedules induce structural stagnation even in low-dimensional sparse settings. Numerical experiments confirm the theoretical predictions and illustrate the role of feature coherence. Our results provide insight into step-size design in greedy sparse learning.
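The stagnation described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: it assumes that the Power-Relaxed Greedy Algorithm updates the residual as $r_m = r_{m-1} - m^{-\alpha}\langle r_{m-1}, g_m\rangle g_m$, where $g_m$ is the dictionary atom with the largest inner product with $r_{m-1}$. The helper names, the two-atom dictionary in $\mathbb R^2$, the coherence value 0.3, and the target coefficients are all hypothetical choices made for illustration.

```python
import numpy as np

def prga_residual_norms(x1, x2, y, alpha, n_steps):
    """Run the ASSUMED update r_m = r_{m-1} - m^{-alpha} <r_{m-1}, g_m> g_m
    over the symmetric dictionary {+x1, -x1, +x2, -x2} and return the list
    [||r_1||, ..., ||r_{n_steps}||]."""
    atoms = np.stack([x1, -x1, x2, -x2])
    r = np.asarray(y, dtype=float).copy()
    norms = []
    for m in range(1, n_steps + 1):
        inner = atoms @ r                  # inner products <r, g> for every atom
        k = int(np.argmax(inner))          # greedy choice: largest inner product
        lam = m ** (-alpha)                # step size lambda_m = m^{-alpha}
        r = r - lam * inner[k] * atoms[k]  # relaxed greedy step
        norms.append(float(np.linalg.norm(r)))
    return norms

def two_unit_atoms(mu):
    """Two unit vectors in R^2 with inner product mu (hypothetical construction)."""
    x1 = np.array([1.0, 0.0])
    x2 = np.array([mu, np.sqrt(1.0 - mu ** 2)])
    return x1, x2

x1, x2 = two_unit_atoms(0.3)
y = 1.0 * x1 + 0.5 * x2                    # hypothetical realizable target

fast = prga_residual_norms(x1, x2, y, alpha=1.5, n_steps=5000)  # over-decaying
slow = prga_residual_norms(x1, x2, y, alpha=1.0, n_steps=5000)  # harmonic steps

print("alpha=1.5, final residual norm:", fast[-1])
print("alpha=1.0, final residual norm:", slow[-1])
```

Under these assumptions the residual norm for $\alpha=1.5$ levels off at a strictly positive value, while the $\alpha=1$ run keeps decreasing, even though the target lies exactly in the span of the two atoms.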

Paper Structure (15 sections, 4 theorems, 72 equations, 2 figures)

Introduction
Main theoretical result
Relation to other greedy learning methods
Boosting and stage-wise additive models.
Frank-Wolfe and projection-free optimization.
Greedy sparse approximation.
Discussion and implications
Step-size schedules.
Interaction with stochastic noise.
Numerical experiments
Experimental setup
Stagnation as a function of coherence
Dependence of stagnation on $\alpha$
Conclusion
Auxiliary results on the infinite product

Key Result

Theorem 2.1

Consider the Euclidean space $(\mathbb R^n, \|\cdot\|_2)$. Let $\alpha>1$ and define $\lambda_m = m^{-\alpha}$. Let $x_1,x_2\in\mathbb R^n$ be unit vectors, $\|x_1\|_2=\|x_2\|_2=1$, with coherence $\mu$ […]. Consider the symmetric dictionary $\mathcal{D}$ […] and the realizable target $y$ […]. Run the Power-Relaxed Greedy Algorithm (PRGA) over $\mathcal{D}$ with initialization $f_0=0$ and residual $r_0=y$. Then the residual cannot […]
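The mechanism behind a lower bound of this type can be sketched in a few lines. The estimate below is a generic one and not the paper's proof: it assumes that PRGA updates the residual as $r_m = r_{m-1} - \lambda_m \langle r_{m-1}, g_m\rangle g_m$ with a unit-norm atom $g_m\in\mathcal{D}$, and it does not recover the coherence-dependent constant of the theorem.

```latex
% Assumption: r_m = r_{m-1} - \lambda_m <r_{m-1}, g_m> g_m, with \|g_m\|_2 = 1
% and \lambda_m = m^{-\alpha} \in (0, 1].
\begin{align*}
\|r_m\|_2^2
  &= \|r_{m-1}\|_2^2 - \lambda_m (2 - \lambda_m)\,\langle r_{m-1}, g_m\rangle^2 \\
  &\ge \bigl(1 - \lambda_m (2 - \lambda_m)\bigr)\,\|r_{m-1}\|_2^2
     && \text{(Cauchy--Schwarz: } \langle r_{m-1}, g_m\rangle^2 \le \|r_{m-1}\|_2^2\text{)} \\
  &= (1 - \lambda_m)^2\,\|r_{m-1}\|_2^2 .
\end{align*}
% Iterating from m = 2 (the factor for m = 1 vanishes because \lambda_1 = 1):
\[
\|r_M\|_2 \;\ge\; \|r_1\|_2 \prod_{m=2}^{M} \bigl(1 - m^{-\alpha}\bigr),
\qquad
\lim_{M\to\infty} \prod_{m=2}^{M} \bigl(1 - m^{-\alpha}\bigr) > 0
\iff \sum_{m\ge 2} m^{-\alpha} < \infty
\iff \alpha > 1 .
\]
```

Under this assumption, whenever $r_1 \neq 0$ and $\alpha>1$, the residual norm stays above a positive constant times $\|r_1\|_2$ no matter how many iterations are run.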

Figures (2)

Figure 1: Minimum residual norm $\min_{1\le m \le M}\|r_m\|_2$ as a function of the coherence $\mu$ for $\alpha=1.1$ and $\alpha=1.5$. Solid lines correspond to the empirical PRGA performance, while dashed lines indicate the theoretical lower bound $b(1-\mu)\sqrt{\frac{1+\mu}{2}}\,P_\alpha$.
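The caption of Figure 1 involves a constant $P_\alpha$ whose definition is not reproduced here. As a hedged illustration, the sketch below assumes that $P_\alpha$ is the limit of the partial products $\prod_{m=2}^{M}(1-m^{-\alpha})$ and evaluates those partial products numerically for the two values of $\alpha$ used in the figure, plus $\alpha=1$ for contrast. The function name `partial_product` is hypothetical.

```python
import math

def partial_product(alpha, n_terms):
    """Return prod_{m=2}^{n_terms} (1 - m**(-alpha)).

    The product is accumulated in log space with log1p for numerical stability.
    ASSUMPTION: this is the partial product whose limit the text calls P_alpha.
    """
    log_sum = sum(math.log1p(-(m ** (-alpha))) for m in range(2, n_terms + 1))
    return math.exp(log_sum)

for alpha in (1.0, 1.1, 1.5):
    values = [partial_product(alpha, n) for n in (10**2, 10**3, 10**5)]
    print(f"alpha={alpha}: {values}")
```

For $\alpha=1$ the product telescopes to $1/M$ and therefore tends to zero, whereas for $\alpha>1$ the partial products decrease to a strictly positive limit. The closer $\alpha$ is to 1, the smaller that limit and the more slowly it is approached.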

Theorems & Definitions (9)

Theorem 2.1
proof
Proposition 2.2
proof
Example 2.3: Orthogonal $s$-sparse target
Lemma A.1
proof
Lemma A.2
proof
