Strong convexity-guided hyper-parameter optimization for flatter losses
Rahul Yedida, Snehanshu Saha
TL;DR
This paper tackles hyper-parameter optimization (HPO) by linking the flatness of the loss to generalization through strong convexity. It proposes AHSC, a white-box HPO method that estimates the strong convexity of the loss from mini-batch Hessian information after a short initial training run, then seeks configurations that minimize this estimate and prunes poor configurations before full training. The approach achieves competitive performance across 14 datasets while substantially reducing runtime compared to traditional HPO methods. A theoretical connection between flatness and strong convexity underpins the pruning step. The work includes practical algorithmic details, empirical validation, and public code, offering a scalable path to faster, landscape-aware hyper-parameter tuning.
Abstract
We propose a novel white-box approach to hyper-parameter optimization. Motivated by recent work relating flat minima to generalization, we first establish a relationship between the strong convexity of the loss and its flatness. Based on this, we seek hyper-parameter configurations that improve flatness by minimizing the strong convexity of the loss. Using the structure of the underlying neural network, we derive closed-form equations that approximate the strong convexity parameter, and we use a randomized search to find hyper-parameters that minimize it. Through experiments on 14 classification datasets, we show that our method achieves strong performance at a fraction of the runtime.
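To make the idea concrete, the following is a minimal sketch of ranking candidate configurations by a strong convexity proxy after brief training. It is not the paper's method: the abstract refers to closed-form approximations derived from the network structure, whereas this sketch substitutes the smallest eigenvalue of an exact mini-batch Hessian, which lower-bounds the curvature of a twice-differentiable loss at a point. The model (a toy L2-regularized logistic regression), the hyper-parameters `l2` and `lr`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_hessian(w, X, l2):
    """Hessian of the mean logistic loss plus (l2 / 2) * ||w||^2 at w."""
    p = sigmoid(X @ w)
    d = p * (1.0 - p)  # per-example curvature weights
    return (X.T * d) @ X / X.shape[0] + l2 * np.eye(X.shape[1])


def strong_convexity_proxy(w, X, l2):
    """Smallest Hessian eigenvalue (assumed proxy, not the paper's formula)."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(logistic_hessian(w, X, l2))[0])


def short_train(X, y, l2, lr, steps=20):
    """A few full-batch gradient steps, standing in for short initial training."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / X.shape[0] + l2 * w
        w -= lr * grad
    return w


def prune_configs(X, y, configs, keep=2, batch_size=64, seed=0):
    """Score each config on one mini-batch and keep the `keep` lowest scores."""
    rng = np.random.default_rng(seed)
    size = min(batch_size, X.shape[0])
    Xb = X[rng.choice(X.shape[0], size=size, replace=False)]
    scored = []
    for cfg in configs:
        w = short_train(X, y, cfg["l2"], cfg["lr"])
        scored.append((strong_convexity_proxy(w, Xb, cfg["l2"]), cfg))
    scored.sort(key=lambda pair: pair[0])  # ascending: flattest proxy first
    return scored[:keep]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] > 0).astype(float)
    # Randomly sampled hypothetical configurations.
    configs = [
        {"l2": float(10 ** rng.uniform(-4, 0)), "lr": float(rng.uniform(0.01, 0.5))}
        for _ in range(8)
    ]
    for score, cfg in prune_configs(X, y, configs, keep=3):
        print(f"proxy={score:.5f}  config={cfg}")
```

Only the surviving configurations would then be trained in full. In this toy setting the proxy is dominated by the `l2` term, so the ranking is nearly trivial; for a neural network the curvature depends on the weights and architecture, which is where a structure-aware approximation would matter.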
