Improving Learning to Optimize Using Parameter Symmetries

Guy Zamir, Aryan Dokania, Bo Zhao, Rose Yu

TL;DR

This work investigates learning-to-optimize (L2O) with parameter-space symmetry through teleportation. It shows that teleportation can locally emulate Newton-like (second-order) updates and demonstrates that the symmetry transformation can be learned, supported by a simple 2D example; it also introduces a symmetry-rich benchmark and analyzes both successes and failures, including momentum-enhanced teleportation. The findings highlight the potential of symmetry-aware meta-optimization to accelerate convergence in neural networks with large parameter spaces, while also showing that the benefits of teleportation depend on the task distribution. Overall, the work bridges symmetry in neural parameter spaces with meta-learning to advance optimization methods.

Abstract

We analyze a learning-to-optimize (L2O) algorithm that exploits parameter space symmetry to enhance optimization efficiency. Prior work has shown that jointly learning symmetry transformations and local updates improves meta-optimizer performance. Supporting this, our theoretical analysis demonstrates that even without identifying the optimal group element, the method locally resembles Newton's method. We further provide an example where the algorithm provably learns the correct symmetry transformation during training. To empirically evaluate L2O with teleportation, we introduce a benchmark, analyze its success and failure cases, and show that enhancements like momentum further improve performance. Our results highlight the potential of leveraging neural network parameter space symmetry to advance meta-optimization.

Paper Structure

This paper contains 22 sections, 2 theorems, 25 equations, 7 figures, 3 algorithms.

Key Result

Proposition 4.1

For a convex function $\mathcal{L}$, the directional derivative of $\left\| \frac{\partial \mathcal{L}}{\partial {\bm{w}}} \right\|_2^2$ along the direction of ${\bm{v}}_{\bot}$ is non-negative.
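Written out, the claim can be sketched as follows. The notation is an assumption rather than the source's own: ${\bm{g}} = \partial \mathcal{L} / \partial {\bm{w}}$ is the gradient, $H$ is the Hessian (taken to be positive definite so that the Newton direction exists), the descent sign convention ${\bm{v}}_1 = -{\bm{g}}$, ${\bm{v}}_2 = -H^{-1}{\bm{g}}$ is used, and ${\bm{v}}_{\bot}$ is the part of ${\bm{v}}_2$ orthogonal to ${\bm{v}}_1$.

```latex
\begin{align*}
{\bm{v}}_{\bot}
  &= {\bm{v}}_2 - \frac{{\bm{v}}_2^\top {\bm{v}}_1}{\|{\bm{v}}_1\|_2^2}\,{\bm{v}}_1
   = -H^{-1}{\bm{g}} + \frac{{\bm{g}}^\top H^{-1}{\bm{g}}}{\|{\bm{g}}\|_2^2}\,{\bm{g}}, \\
{\bm{v}}_{\bot}^\top \frac{\partial}{\partial {\bm{w}}} \|{\bm{g}}\|_2^2
  &= 2\,{\bm{g}}^\top H\,{\bm{v}}_{\bot}
   = 2\left( \frac{({\bm{g}}^\top H^{-1}{\bm{g}})\,({\bm{g}}^\top H\,{\bm{g}})}{\|{\bm{g}}\|_2^2}
     - \|{\bm{g}}\|_2^2 \right) \;\ge\; 0.
\end{align*}
```

Under these assumptions the final inequality follows from the Cauchy-Schwarz inequality applied to $H^{1/2}{\bm{g}}$ and $H^{-1/2}{\bm{g}}$, which gives $({\bm{g}}^\top {\bm{g}})^2 \le ({\bm{g}}^\top H\,{\bm{g}})({\bm{g}}^\top H^{-1}{\bm{g}})$. This is consistent with the gradient norm increasing along ${\bm{v}}_{\bot}$, as described in the caption of Figure 1.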

Figures (7)

  • Figure 1: The gradient norm increases along ${\bm{v}}_{\bot}$, the component of Newton's direction (${\bm{v}}_2$) that is orthogonal to the gradient (${\bm{v}}_1$).
  • Figure 2: Update direction of $\theta$ during learning to teleport.
  • Figure 3: Comparison of vanilla L2O with and without learned teleportation for fixed objective ellipse functions.
  • Figure 4: Comparison of vanilla L2O with and without learned teleportation for variable objective ellipse functions.
  • Figure 5: Comparison of vanilla L2O with and without learned teleportation for fixed objective Rosenbrock functions.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Proposition 4.1
  • Lemma A.1
  • proof
  • proof