Oblivious Stochastic Composite Optimization

Clément Lezane; Alexandre d'Aspremont

TL;DR

This work tackles stochastic convex optimization when regularity parameters and diameter bounds are unknown. It introduces a unified Composite Mirror Descent framework with Oblivious Step-sizes that blends mirror descent with dual averaging, using polynomial step-sizes $\alpha_t \propto t^n$, $\beta_t \propto \mu t^n$, and $\gamma_t \propto \mu t^{n+1}/(n+1)$ to achieve convergence without prior knowledge of these parameters. The authors develop three oblivious algorithms for the relative-scale, standard smooth, and relatively-smooth regimes, extend the framework to large-scale semidefinite programs via stochastic smoothing and power-method oracles, and provide a Grid Search variant that selects regularization parameters. Theoretical guarantees are paired with empirical results on SDP-like problems and high-dimensional settings, which show robust performance and scalability. Overall, the paper delivers parameter-free, broadly applicable stochastic optimization methods with convergence guarantees for large-scale SDP applications.
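To make the polynomial step-size schedules above concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `oblivious_step_sizes` is hypothetical, all proportionality constants are assumed to be 1, and $n$ and $\mu$ are treated as user-supplied inputs. The point is only that the schedules depend on the iteration counter $t$ and not on a diameter, Lipschitz, or smoothness constant.

```python
def oblivious_step_sizes(t, n=1, mu=1.0):
    """Return (alpha_t, beta_t, gamma_t) for iteration t >= 1.

    Assumption: every proportionality constant is set to 1, so
    alpha_t = t^n, beta_t = mu * t^n, gamma_t = mu * t^(n+1) / (n+1).
    """
    if t < 1:
        raise ValueError("t must be a positive iteration index")
    alpha = float(t) ** n
    beta = mu * float(t) ** n
    gamma = mu * float(t) ** (n + 1) / (n + 1)
    return alpha, beta, gamma


if __name__ == "__main__":
    # Print the first few schedules for an illustrative choice of n and mu.
    for t in (1, 2, 3):
        print(t, oblivious_step_sizes(t, n=1, mu=0.5))
```

Note that $t^{n+1}/(n+1)$ is the antiderivative of $t^n$, so under these assumed constants $\gamma_t$ grows like the accumulated sum of the $\beta_s$ for $s \le t$.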

Abstract

In stochastic convex optimization problems, most existing adaptive methods rely on prior knowledge of the diameter bound $D$ when the smoothness or the Lipschitz constant is unknown. This often significantly affects performance, as only a rough approximation of $D$ is usually known in practice. Here, we bypass this limitation by combining mirror descent with dual averaging techniques, and we show that, under an oblivious step-size regime, our algorithms converge without any prior knowledge of the parameters of the problem. We introduce three oblivious stochastic algorithms to address different settings. The first algorithm is designed for objectives in relative scale, the second is an accelerated version tailored to smooth objectives, and the third is for relatively-smooth objectives. All three algorithms work without prior knowledge of the diameter of the feasible set, or of the Lipschitz constant or smoothness of the objective function. We use these results to revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.

Paper Structure (22 sections, 14 theorems, 144 equations, 2 figures, 5 tables, 4 algorithms)

Introduction
Problem settings
Related work
Polyak step-size and Line search.
Grid search.
Adagrad and online algorithms.
Our contributions
Composite Mirror Descent with Oblivious Step-sizes.
Applications Across Different Problem Classes.
Practical Impact and Theoretical Foundations.
Applications to semidefinite programming
Preliminaries
Useful Lemmas
Composite Mirror Descent
Relative scale setting
...and 7 more sections

Key Result

Lemma 2.1

Let $f$ be a convex function and $\nu$ continuously differentiable. If we consider with the Bregman divergence $D^{\nu}(a,b) := \nu(a) - \nu(b) -\langle \nabla \nu (b),a-b\rangle$ then for all $u$,
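The Bregman divergence $D^{\nu}(a,b)$ defined in the lemma can be evaluated directly from $\nu$ and its gradient. The sketch below is illustrative only: the helper names (`bregman`, `sq_norm`, `neg_entropy`) and the two choices of $\nu$ are assumptions for the example, not taken from the paper. It relies on two standard facts: $\nu(x) = \tfrac12\|x\|_2^2$ yields half the squared Euclidean distance, and the negative entropy yields the Kullback-Leibler divergence when both points lie on the probability simplex.

```python
import math


def bregman(nu, grad_nu, a, b):
    """D^nu(a, b) = nu(a) - nu(b) - <grad nu(b), a - b>."""
    inner = sum(g * (ai - bi) for g, ai, bi in zip(grad_nu(b), a, b))
    return nu(a) - nu(b) - inner


def sq_norm(x):
    # nu(x) = 0.5 * ||x||_2^2
    return 0.5 * sum(xi * xi for xi in x)


def sq_norm_grad(x):
    return list(x)


def neg_entropy(x):
    # nu(x) = sum_i x_i log x_i, defined for strictly positive entries
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]


if __name__ == "__main__":
    a, b = [0.2, 0.8], [0.5, 0.5]
    print("Euclidean case:", bregman(sq_norm, sq_norm_grad, a, b))
    print("Entropy case:  ", bregman(neg_entropy, neg_entropy_grad, a, b))
```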

Figures (2)

Figure 1: Performance on synthetic data, using stochastic smoothing in normal/sparse cases. Upper figures show performance under theoretical step-sizes, bottom figures use hyper-tuned step-sizes.
Figure 2: Performance of algorithms on synthetic data, using the power method in the normal case and in the sparse case. The upper figures show performance with the parameters $L_\star$ and $D$ provided by the theory; the bottom figures use hyper-tuned step-sizes.

Theorems & Definitions (28)

Lemma 2.1
proof
Lemma 2.2
proof
Lemma 2.3
proof
Theorem 3.1
proof
Corollary 3.1
proof
...and 18 more
