POP: Prior-fitted Optimizer Policies

Jan Kobiolka, Christian Frey, Gresa Shala, Arlind Kadra, Erind Bedalli, Josif Grabocka

TL;DR

POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes conditioned on the contextual information provided in the optimization trajectory, demonstrates strong generalization capabilities without task-specific tuning.

Abstract

Optimization refers to the task of finding extrema of an objective function. Classical gradient-based optimizers are highly sensitive to hyperparameter choices. In highly non-convex settings, their performance relies on carefully tuned learning rates, momentum, and gradient accumulation. To address these limitations, we introduce POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes conditioned on the contextual information provided in the optimization trajectory. Our model is trained on millions of synthetic optimization problems sampled from a novel prior spanning both convex and non-convex objectives. We evaluate POP on an established benchmark of 47 optimization functions of varying complexity, where it consistently outperforms first-order gradient-based methods, non-convex optimization approaches (e.g., evolutionary strategies), Bayesian optimization, and a recent meta-learned competitor under matched budget constraints. Our evaluation demonstrates strong generalization capabilities without task-specific tuning.
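
To make the phrase "coordinate-wise step sizes conditioned on the optimization trajectory" concrete, the following is a minimal sketch of such an update loop. It is not the POP model: the learned policy is replaced by a hypothetical hand-written heuristic (`placeholder_policy`, an Rprop-style sign rule), and the trajectory format, function names, and all constants are assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Assumed trajectory format: one (x, f(x), grad f(x)) record per step.
Trajectory = List[Tuple[Vector, float, Vector]]
Policy = Callable[[Trajectory, Vector], Vector]


def placeholder_policy(trajectory: Trajectory, prev_steps: Vector) -> Vector:
    """Hypothetical stand-in for a learned policy (NOT the POP model).

    Grows a coordinate's step size when its last two gradients agree in
    sign and shrinks it when they disagree (an Rprop-style heuristic).
    """
    if len(trajectory) < 2:
        return list(prev_steps)
    g_prev, g_curr = trajectory[-2][2], trajectory[-1][2]
    return [
        s * (1.2 if gp * gc > 0 else 0.5 if gp * gc < 0 else 1.0)
        for s, gp, gc in zip(prev_steps, g_prev, g_curr)
    ]


def optimize(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    policy: Policy,
    steps: int = 50,
    init_step: float = 0.1,
) -> Tuple[Vector, float]:
    """Gradient descent with per-coordinate step sizes chosen by `policy`."""
    x = list(x0)
    step_sizes = [init_step] * len(x)
    trajectory: Trajectory = []
    best_x, best_f = list(x), f(x)
    for _ in range(steps):
        fx, g = f(x), grad(x)
        trajectory.append((list(x), fx, list(g)))
        if fx < best_f:
            best_x, best_f = list(x), fx
        # The policy sees the whole trajectory and returns one step size
        # per coordinate; the update is then applied coordinate-wise.
        step_sizes = policy(trajectory, step_sizes)
        x = [xi - si * gi for xi, si, gi in zip(x, step_sizes, g)]
    final_f = f(x)
    if final_f < best_f:
        best_x, best_f = list(x), final_f
    return best_x, best_f


def quadratic(x: Vector) -> float:
    # Toy objective with different curvature per coordinate.
    return x[0] ** 2 + 10.0 * x[1] ** 2


def quadratic_grad(x: Vector) -> Vector:
    return [2.0 * x[0], 20.0 * x[1]]


best_x, best_f = optimize(quadratic, quadratic_grad, [3.0, -2.0], placeholder_policy)
```

In this sketch the two coordinates end up with different step sizes because their gradient histories differ. A learned policy such as the one described in the abstract would take the place of `placeholder_policy` and could, in principle, use the full trajectory rather than only the last two gradients.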

Paper Structure (30 sections, 29 equations, 17 figures, 6 tables)

Figures (17)

  • Figure 1: POP (blue) adapts its learning rate based on the optimization landscape, enabling rapid convergence, escape from local minima, and improved global optimization compared to Adam (red). The yellow diamond marks the global minimum, the white cross the start state, and the square the end state.
  • Figure 2: Meta-learning reward and validation performance of our POP agent.
  • Figure 3: In-distribution test set performance vs. baselines. Mean normalized improvement over steps; shading indicates 95% CIs. Dashed line marks the context/optimization boundary.
  • Figure 4: Method rankings on the in-distribution test set at 100% budget. Lower ranks correspond to better performance, while horizontal bars indicate differences that are not statistically significant.
  • Figure 5: In-distribution test set performance vs. baselines at twice the training budget. Mean normalized improvement over steps; shading indicates 95% CIs. Dashed line marks the context/optimization boundary.
  • ...and 12 more figures

Theorems & Definitions (1)

  • Definition 2.1: Optimization trajectory