Efficient Continual Finite-Sum Minimization

Ioannis Mavrothalassitis; Stratis Skoulakis; Leello Tadesse Dadi; Volkan Cevher

TL;DR

Continual finite-sum minimization extends standard finite-sum optimization to the prefix objectives $g_i(x)=\frac{1}{i}\sum_{j=1}^i f_j(x)$ and seeks an $\epsilon$-accurate sequence $\hat{x}_1,\ldots,\hat{x}_n$, one point per prefix. The paper introduces CSVRG, a first-order variance-reduction method that uses a sparsified full-gradient direction and a 3-FO gradient estimator to achieve a total FO complexity of $\tilde{O}\left(\frac{n}{\epsilon^{1/3}}+\frac{\log n}{\sqrt{\epsilon}}\right)$ in the strongly convex setting. It proves a nearly matching lower bound for the class of natural first-order methods (none attains $O(n/\epsilon^{\alpha})$ FO complexity for $\alpha<1/4$), analyzes the estimator's unbiasedness and variance, and reports experiments on ridge regression and MNIST. Together, these results give provably efficient continual optimization methods suited to streaming data and lifelong learning.
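To make the problem statement concrete, here is a minimal sketch of the prefix objectives $g_i$ and of what an $\epsilon$-accurate sequence has to satisfy. It is not CSVRG. It assumes ridge-regression terms $f_j(x)=\tfrac12(a_j^\top x-b_j)^2+\tfrac{\lambda}{2}\|x\|^2$, for which every prefix minimizer has a closed form, and it reads "$\epsilon$-accurate" as a suboptimality gap $g_i(\hat{x}_i)-g_i(x_i^\star)\le\epsilon$ for every $i$. The function names, the data, and that reading of accuracy are illustrative assumptions.

```python
import numpy as np

def prefix_objective(A, b, lam, i, x):
    """g_i(x) = (1/i) * sum_{j<=i} f_j(x), with ridge terms f_j (assumed for illustration)."""
    residual = A[:i] @ x - b[:i]
    return 0.5 * np.mean(residual ** 2) + 0.5 * lam * (x @ x)

def prefix_minimizers(A, b, lam):
    """Exact minimizer of each g_1, ..., g_n, using running sums of a_j a_j^T and b_j a_j."""
    n, d = A.shape
    H = np.zeros((d, d))   # running sum of a_j a_j^T
    c = np.zeros(d)        # running sum of b_j a_j
    minimizers = []
    for i in range(1, n + 1):
        a = A[i - 1]
        H += np.outer(a, a)
        c += b[i - 1] * a
        # Stationarity of g_i: (H / i + lam * I) x = c / i
        minimizers.append(np.linalg.solve(H / i + lam * np.eye(d), c / i))
    return np.array(minimizers)

def is_eps_accurate(A, b, lam, x_hat, x_star, eps):
    """Check g_i(x_hat_i) - g_i(x_star_i) <= eps for every prefix i (assumed accuracy notion)."""
    n = A.shape[0]
    gaps = [
        prefix_objective(A, b, lam, i, x_hat[i - 1])
        - prefix_objective(A, b, lam, i, x_star[i - 1])
        for i in range(1, n + 1)
    ]
    return max(gaps) <= eps

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
b = rng.normal(size=50)
lam = 0.1
x_star = prefix_minimizers(A, b, lam)
```

The closed form is specific to quadratic losses. For general strongly convex $f_j$ each prefix has to be minimized iteratively, which is where the total number of gradient evaluations becomes the quantity of interest.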

Abstract

Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\to \mathbb{R}$, finite-sum minimization seeks a point $x^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we introduce a variant of finite-sum minimization, dubbed continual finite-sum minimization, that asks for a sequence of points $x_1^\star,\ldots,x_n^\star \in \mathcal{D}$ such that each $x^\star_i$ minimizes the prefix-sum $\sum_{j=1}^i f_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method ($\mathrm{CSVRG}$) that produces an $\epsilon$-optimal sequence with $\tilde{\mathcal{O}}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ first-order oracle (FO) calls overall. An FO call corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in [n]$. Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires and the $\mathcal{O}(n^2 \log (1/\epsilon))$ FOs that state-of-the-art variance reduction methods such as $\mathrm{Katyusha}$ require. We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
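The FO accounting can be illustrated with a short sketch. It uses the textbook SVRG estimator $\nabla f_j(x)-\nabla f_j(\tilde{x})+\nabla g_i(\tilde{x})$, not the paper's 3-FO estimator or its sparsified full-gradient direction, neither of which is specified in this summary. The ridge-regression terms, the `GradientOracle` class, and the function names are assumptions made for illustration.

```python
import numpy as np

class GradientOracle:
    """Counts FO calls: one call computes the gradient of a single f_j at a point x."""
    def __init__(self, A, b, lam):
        self.A, self.b, self.lam = A, b, lam
        self.calls = 0

    def grad(self, j, x):
        self.calls += 1
        a = self.A[j]
        # Gradient of the assumed term f_j(x) = 0.5*(a_j.x - b_j)^2 + 0.5*lam*||x||^2
        return (a @ x - self.b[j]) * a + self.lam * x

def full_prefix_gradient(oracle, i, x):
    """Gradient of g_i at x; costs i FO calls."""
    return sum(oracle.grad(j, x) for j in range(i)) / i

def svrg_estimate(oracle, j, x, anchor, anchor_grad):
    """Textbook SVRG estimator of the gradient of g_i at x; costs 2 FO calls."""
    return oracle.grad(j, x) - oracle.grad(j, anchor) + anchor_grad

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 4))
b = rng.normal(size=20)
oracle = GradientOracle(A, b, lam=0.1)

i = 20
anchor = rng.normal(size=4)
x = rng.normal(size=4)

anchor_grad = full_prefix_gradient(oracle, i, anchor)      # i FO calls
calls_before = oracle.calls
estimate = svrg_estimate(oracle, 7, x, anchor, anchor_grad)
calls_per_estimate = oracle.calls - calls_before           # 2 FO calls

# Averaging the estimator over a uniformly random j recovers the true gradient.
mean_estimate = np.mean(
    [svrg_estimate(oracle, j, x, anchor, anchor_grad) for j in range(i)], axis=0
)
true_grad = full_prefix_gradient(oracle, i, x)
```

In this textbook form, the expensive part is the anchor gradient, which costs $i$ FO calls for prefix $i$. Recomputing it from scratch at every prefix is what drives the quadratic-in-$n$ cost that the abstract attributes to standard variance-reduction methods.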

Paper Structure (28 sections, 44 theorems, 91 equations, 9 tables, 4 algorithms)

Introduction
Related Work
Preliminaries
Our Results
Natural First-Order methods
CSVRG and Convergence Results
Analyzing the Estimator
Experiments
Conclusions
Acknowledgements
Further Related Work
Proof of Lemma \ref{lemm:distbound2}
Proof of Lemma \ref{l:unbias}
Proof of Lemma \ref{l:bounded_variance}
Proof of Lemma \ref{lemm:convergebound}
...and 13 more sections

Key Result

Theorem 1

There exists a first-order method, $\mathrm{CSVRG}$ (Algorithm alg:1), for continual finite-sum minimization d:instance-optimal with ${\mathcal{O}}\left(\frac{L^{2/3}G^{2/3}}{\mu} \cdot \frac{n \log n}{\epsilon^{1/3}} +\frac{L^2G}{\mu^{5/2}} \cdot \frac{\log n}{\sqrt{\epsilon}} \right)$ FO complexity.
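As a rough numerical illustration, the sketch below plugs values into the bound of Theorem 1 and into the $\mathcal{O}(n/\epsilon)$ and $\mathcal{O}(n^2\log(1/\epsilon))$ rates quoted in the abstract. Big-O notation hides constants, so only the relative scaling is meaningful. Setting $L=G=\mu=1$, using natural logarithms, and choosing $n=10^4$, $\epsilon=10^{-6}$ are all illustrative assumptions.

```python
import math

def csvrg_fo_bound(n, eps, L=1.0, G=1.0, mu=1.0):
    """Expression inside the O(.) of Theorem 1, with all hidden constants set to 1."""
    first = (L ** (2 / 3) * G ** (2 / 3) / mu) * n * math.log(n) / eps ** (1 / 3)
    second = (L ** 2 * G / mu ** 2.5) * math.log(n) / math.sqrt(eps)
    return first + second

def sgd_fo_bound(n, eps):
    """n / eps, the SGD rate quoted in the abstract (constants dropped)."""
    return n / eps

def katyusha_fo_bound(n, eps):
    """n^2 * log(1/eps), the Katyusha rate quoted in the abstract (constants dropped)."""
    return n ** 2 * math.log(1 / eps)

n, eps = 10_000, 1e-6
rates = {
    "CSVRG": csvrg_fo_bound(n, eps),
    "SGD": sgd_fo_bound(n, eps),
    "Katyusha": katyusha_fo_bound(n, eps),
}
for name, value in rates.items():
    print(f"{name:>8}: {value:.2e} FO calls (up to constants)")
```

At these values the first term of the CSVRG bound dominates the second, and the bound is smaller than both baseline expressions. The ordering can change with other choices of $n$, $\epsilon$, and the problem constants.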

Theorems & Definitions (72)

Remark 1
Definition 1: Strong Convexity
Theorem 1
Theorem 2
Theorem 3
Remark 2
Definition 2
Example 1
Definition 3
Remark 3
...and 62 more
