Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance

Yunchuan Guan, Yu Liu, Ke Zhou, Hui Li, Sen Jia, Zhiqi Shen, Ziyang Wang, Xinglin Zhang, Tao Chen, Jenq-Neng Hwang, Lei Li

TL;DR

This work reframes weight generation as optimization policy learning and introduces Lo-Hp, a decoupled two-stage framework comprising weight preparation from offline optimizers and policy learning via Hybrid-Policy Sub-Trajectory Balance. By injecting offline sub-trajectory supervision into online trajectory generation, Lo-Hp captures local optimization policies while still promoting globally optimal weights, addressing both over-coupling and long-horizon inefficiencies. The authors provide theoretical bounds and demonstrate convergence improvements with Sharpness-Aware Minimization, and empirically show superior accuracy and inference speed across transfer learning, few-shot learning, domain generalization, and large-language-model adaptation. The results indicate substantial practical benefits for scenarios needing frequent weight updates, with notable latency reductions and robust generalization.

Abstract

Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by two issues: over-coupling and long horizons. Over-coupling tightly binds weight generation to task-specific objectives, thereby limiting the flexibility of the learned optimizer. Long horizons lead to inefficiency and low accuracy during inference, caused by the lack of local constraints. In this paper, we propose Lo-Hp, a decoupled two-stage weight generation framework that enhances flexibility by learning various optimization policies. It adopts a hybrid-policy sub-trajectory balance objective, which integrates on-policy and off-policy learning to capture local optimization policies. Theoretically, we demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of globally optimal weights. In addition, we validate Lo-Hp's superior accuracy and inference efficiency in tasks that require frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and large language model adaptation.
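
To make the idea of a sub-trajectory balance objective more concrete, the sketch below computes the generic sub-trajectory balance residual known from the GFlowNet literature for a window of states $s_m, \cdots, s_n$. It is an illustration under stated assumptions, not the paper's $\mathcal{L}_{hy}^{sub}$: the function names, the flow estimates, and the forward and backward log-probabilities are all hypothetical inputs, and this summary does not say how Lo-Hp combines its on-policy and off-policy terms.

```python
import math


def subtb_residual(log_flow_start, log_flow_end, log_pf_steps, log_pb_steps):
    """Generic sub-trajectory balance residual for states s_m..s_n.

    log_flow_start, log_flow_end: log F(s_m) and log F(s_n), hypothetical
        flow estimates at the two ends of the window.
    log_pf_steps: log P_F(s_{i+1} | s_i) for each step in the window.
    log_pb_steps: log P_B(s_i | s_{i+1}) for the same steps.
    """
    if len(log_pf_steps) != len(log_pb_steps):
        raise ValueError("forward and backward step lists must have equal length")
    forward = log_flow_start + sum(log_pf_steps)
    backward = log_flow_end + sum(log_pb_steps)
    return forward - backward


def subtb_loss(log_flow_start, log_flow_end, log_pf_steps, log_pb_steps):
    """Squared residual; zero exactly when the window is balanced."""
    r = subtb_residual(log_flow_start, log_flow_end, log_pf_steps, log_pb_steps)
    return r * r


# A one-step window that is balanced by construction:
# F(s_m) * P_F = 2 * 0.5 equals F(s_n) * P_B = 1 * 1.
balanced = subtb_loss(math.log(2.0), math.log(1.0), [math.log(0.5)], [math.log(1.0)])

# The same window with a mismatched forward probability gives a positive loss.
unbalanced = subtb_loss(math.log(2.0), math.log(1.0), [math.log(0.25)], [math.log(1.0)])
```

The same residual can be evaluated on a window sampled from the model itself (on-policy) or on a window recorded from another optimizer (off-policy); only the source of the states changes.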

Paper Structure

This paper contains 23 sections, 3 theorems, 13 equations, 5 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Suppose that $\mathcal{L}_{hy}^{sub} = 0$. Then, the expected cumulative probability of the sub-inference trajectories $\tau^{m':n'}_{on}=\{s_{m'},\cdots, s_{n'}\}$ satisfies

Figures (5)

  • Figure 1: Inference trajectory of generative models in CIFAR-10's 2D weight-reduced space. Darker regions indicate lower downstream task loss, and the red trajectory represents the ground truth generated by the real-world optimizer SGD.
  • Figure 2: Overview of Lo-Hp. It consists of two decoupled stages: weight preparation and policy learning. In the weight preparation stage, offline optimizers such as Adam and SGD update the neural network weights $\theta$, and the offline sub-trajectory $\tau^{m:n}_{off}$ is sampled and recorded. In the policy learning stage, the generative model $f^{G}_{\phi}$ adopts a Gaussian policy to generate the online trajectory. A uniform sub-trajectory matching strategy aligns the online sub-trajectory $\tau^{m':n'}_{on}$ with the offline sub-trajectory $\tau^{m:n}_{off}$, and the proposed hybrid-policy sub-trajectory balance is applied to learn local optimization policies.
  • Figure 3: Similarity statistics between generated online sub-trajectories and target offline sub-trajectories on CIFAR-10.
  • Figure 4: The impact of different offline optimization policies on Lo-Hp's inference curve.
  • Figure 5: The impact of SAM on Lo-Hp's learning curve.
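
The two stages described in the Figure 2 caption can be sketched on a toy problem. The code below is a minimal illustration, not the authors' implementation: the quadratic loss, plain gradient descent standing in for SGD, the window length, and every function name are assumptions. In particular, `uniform_match` encodes one plausible reading of "uniform sub-trajectory matching" (evenly spaced selection of offline states), which this summary does not define.

```python
import math
import random


def record_trajectory(theta0, target, lr=0.1, steps=20):
    """Weight preparation: run gradient descent on 0.5 * ||theta - target||^2
    and record every iterate, including the starting point."""
    theta = list(theta0)
    traj = [list(theta)]
    for _ in range(steps):
        grad = [t - g for t, g in zip(theta, target)]
        theta = [t - lr * d for t, d in zip(theta, grad)]
        traj.append(list(theta))
    return traj


def sample_sub_trajectory(traj, length, rng):
    """Pick a contiguous offline window tau_off^{m:n} with n - m = length."""
    m = rng.randrange(0, len(traj) - length)
    n = m + length
    return m, n, traj[m:n + 1]


def uniform_match(offline_sub, num_online_states):
    """Assumed matching rule: keep evenly spaced offline states so the
    offline window has as many states as the online sub-trajectory."""
    last = len(offline_sub) - 1
    idx = [round(i * last / (num_online_states - 1)) for i in range(num_online_states)]
    return [offline_sub[i] for i in idx]


def gaussian_log_prob(x, mean, sigma):
    """Log-density of x under an isotropic Gaussian policy N(mean, sigma^2 I)."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


rng = random.Random(0)
traj = record_trajectory([2.0, -1.0], [0.0, 0.0])
m, n, tau_off = sample_sub_trajectory(traj, length=8, rng=rng)
matched = uniform_match(tau_off, num_online_states=5)

# Score each matched offline transition under a Gaussian policy whose mean is
# the previous state (a placeholder for the generative model's prediction).
log_probs = [gaussian_log_prob(nxt, cur, sigma=0.5)
             for cur, nxt in zip(matched[:-1], matched[1:])]
```

In a full pipeline the policy mean would come from the generative model $f^{G}_{\phi}$ rather than from the previous state, and the resulting log-probabilities would feed the balance objective.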

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3