How Free is Parameter-Free Stochastic Optimization?

Amit Attia, Tomer Koren

TL;DR

A lower bound is established that renders fully parameter-free stochastic convex optimization infeasible, and a method is provided which is (partially) parameter-free up to the limit indicated by the lower bound.

Abstract

We study the problem of parameter-free stochastic optimization, asking whether, and under what conditions, fully parameter-free methods exist: methods that achieve convergence rates competitive with optimally tuned methods without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered "partially" parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms or a bound on the distance to a minimizer. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming only access to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.

Paper Structure

This paper contains 39 sections, 23 theorems, 137 equations, and 1 table.

Key Result

theorem 1

Assume that $f$ is $\beta_{\star}$-smooth and lower bounded by some $f^\star$ and $\widetilde{g}$ is a $\sigma_{\star}$-bounded unbiased gradient oracle of $f$. Let $\eta_{\min},\eta_{\max} > 0$ such that where $F_{\star} = f(w_1)-f^\star$. Then for any $\delta \in (0,\tfrac{1}{3})$, given $w_1$, $T$, $\delta$, $\eta_{\min}$ and $\eta_{\max}$, alg:non-convex performs $T$ gradient queries and prod
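The statement above takes a starting point $w_1$, a query budget $T$, and a stepsize range $[\eta_{\min}, \eta_{\max}]$. The paper's algorithm itself is not reproduced in this excerpt, so the following is only a generic sketch of a stepsize search in that spirit, not the authors' method. The geometric grid, the even split of the budget between running SGD and validating a candidate, the selection by estimated gradient norm, and all function names (`geometric_grid`, `sgd_iterates`, `estimate_grad_norm`, `stepsize_search`) are assumptions made for illustration.

```python
import math
import random


def geometric_grid(eta_min, eta_max):
    """Return eta_min * 2**k for k = 0..K, with K minimal such that the last value >= eta_max."""
    num_doublings = max(0, math.ceil(math.log2(eta_max / eta_min)))
    return [eta_min * 2.0 ** k for k in range(num_doublings + 1)]


def sgd_iterates(grad_oracle, w1, eta, steps, rng):
    """Run plain SGD with a fixed stepsize and return every iterate, starting with w1."""
    w = list(w1)
    iterates = [list(w)]
    for _ in range(steps):
        g = grad_oracle(w, rng)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        iterates.append(list(w))
    return iterates


def estimate_grad_norm(grad_oracle, w, num_queries, rng):
    """Average num_queries stochastic gradients at w and return the norm of the average."""
    total = [0.0] * len(w)
    for _ in range(num_queries):
        g = grad_oracle(w, rng)
        total = [ti + gi for ti, gi in zip(total, g)]
    norm = math.sqrt(sum((ti / num_queries) ** 2 for ti in total))
    # A diverged run produces inf/nan; treat it as the worst possible score.
    return norm if math.isfinite(norm) else math.inf


def stepsize_search(grad_oracle, w1, T, eta_min, eta_max, seed=0):
    """Try each stepsize on the grid and keep the candidate with the smallest estimated gradient norm.

    Assumed budget split (illustrative only): each stepsize gets T // len(grid)
    oracle queries, half for SGD steps and the rest for validating one
    uniformly random iterate of that run.
    """
    rng = random.Random(seed)
    grid = geometric_grid(eta_min, eta_max)
    per_candidate = T // len(grid)
    if per_candidate < 2:
        raise ValueError("budget T is too small for this grid")
    run_steps = per_candidate // 2
    val_queries = per_candidate - run_steps

    best_eta, best_w, best_score = grid[0], list(w1), math.inf
    for eta in grid:
        iterates = sgd_iterates(grad_oracle, w1, eta, run_steps, rng)
        candidate = iterates[rng.randrange(len(iterates))]
        score = estimate_grad_norm(grad_oracle, candidate, val_queries, rng)
        if score < best_score:
            best_eta, best_w, best_score = eta, candidate, score
    return best_eta, best_w
```

Here `grad_oracle(w, rng)` stands for any function returning a stochastic gradient at `w` as a list of floats. The total number of oracle calls is at most `T`, since each of the `len(grid)` candidates uses exactly `T // len(grid)` queries.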

Theorems & Definitions (42)

  • theorem 1
  • lemma 1: SGD convergence with high probability
  • lemma 2
  • proof: Proof of \ref{thm:non-convex}
  • theorem 2
  • lemma 3
  • lemma 4
  • proof: Proof of \ref{thm:convex-zero-order}
  • theorem 3
  • theorem 4
  • ...and 32 more