How Free is Parameter-Free Stochastic Optimization?

Amit Attia; Tomer Koren

TL;DR

A lower bound is established that renders fully parameter-free stochastic convex optimization infeasible, and a method is provided which is (partially) parameter-free up to the limit indicated by the lower bound.

Abstract

We study the problem of parameter-free stochastic optimization, asking whether, and under what conditions, fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered "partially" parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms or a bound on the distance to a minimizer. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming access only to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.
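The hyperparameter search mentioned for the non-convex setting could look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: the geometric step-size grid, the even split of the query budget, the random-iterate selection rule, and every function name (`sgd`, `step_size_grid`, `grid_search_sgd`, `noisy_quadratic_grad`) are hypothetical choices made here. Only the inputs `w1`, `T`, `eta_min` and `eta_max` mirror quantities named in the source.

```python
import math
import random


def noisy_quadratic_grad(w, rng, sigma=0.1):
    """Toy oracle (assumption): gradient of 0.5*||w||^2 plus Gaussian noise."""
    return [wi + rng.gauss(0.0, sigma) for wi in w]


def sgd(grad_oracle, w1, eta, steps):
    """Run SGD with a fixed step size eta and return all iterates."""
    w = list(w1)
    iterates = [list(w)]
    for _ in range(steps):
        g = grad_oracle(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        iterates.append(list(w))
    return iterates


def step_size_grid(eta_min, eta_max):
    """Geometric grid eta_min, 2*eta_min, ... whose last point is >= eta_max."""
    k = math.ceil(math.log2(eta_max / eta_min)) + 1
    return [eta_min * 2 ** i for i in range(max(1, k))]


def grid_search_sgd(grad_oracle, w1, T, eta_min, eta_max, rng):
    """Try each grid step size and keep the candidate with the smallest
    estimated squared gradient norm, using about T oracle queries in total."""
    grid = step_size_grid(eta_min, eta_max)
    budget = T // len(grid)          # queries available per step size
    run_steps = budget // 2          # half for running SGD
    eval_queries = budget - run_steps  # the rest for scoring the candidate
    if run_steps < 1:
        raise ValueError("T is too small for this step-size grid")

    best_w, best_score = list(w1), math.inf
    for eta in grid:
        candidate = rng.choice(sgd(grad_oracle, w1, eta, run_steps))
        # Average stochastic gradients at the candidate to estimate its gradient.
        mean_g = [0.0] * len(candidate)
        for _ in range(eval_queries):
            g = grad_oracle(candidate)
            mean_g = [m + gi / eval_queries for m, gi in zip(mean_g, g)]
        score = sum(m * m for m in mean_g)
        # Diverged runs give inf or nan scores, which never pass this test.
        if score < best_score:
            best_w, best_score = candidate, score
    return best_w
```

A call such as `grid_search_sgd(lambda w: noisy_quadratic_grad(w, rng), [5.0, -3.0], T=4000, eta_min=1e-4, eta_max=10.0, rng=random.Random(0))` exercises the sketch on a toy quadratic. The grid has only logarithmically many points in `eta_max / eta_min`, so a very loose range costs little of the query budget.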

Paper Structure (39 sections, 23 theorems, 137 equations, 1 table)


Introduction
Summary of contributions
Non-convex setting: fully parameter-free algorithm.
Convex setting: fully parameter-free algorithm with noisy function values.
Convex setting: impossibility without function values.
Parameter-free algorithm for the convex non-smooth setting.
Parameter-free algorithm for the convex smooth setting.
Additional related work
Adaptive stochastic non-convex optimization.
Parameter-free and adaptive stochastic convex optimization methods.
Parameter-free deterministic optimization methods.
Parameter-free online convex optimization.
Classical analyses of stochastic gradient descent.
Addendum: concurrent work on parameter-free stochastic optimization.
Preliminaries
...and 24 more sections

Key Result

theorem 1

Assume that $f$ is $\beta_{\star}$-smooth and lower bounded by some $f^\star$ and $\widetilde{g}$ is a $\sigma_{\star}$-bounded unbiased gradient oracle of $f$. Let $\eta_{\min},\eta_{\max} > 0$ such that where $F_{\star} = f(w_1)-f^\star$. Then for any $\delta \in (0,\tfrac{1}{3})$, given $w_1$, $T$, $\delta$, $\eta_{\min}$ and $\eta_{\max}$, alg:non-convex performs $T$ gradient queries and prod

Theorems & Definitions (42)

theorem 1
lemma 1: SGD convergence with high probability
lemma 2
proof: Proof of \ref{thm:non-convex}
theorem 2
lemma 3
lemma 4
proof: Proof of \ref{thm:convex-zero-order}
theorem 3
theorem 4
...and 32 more
