Accelerating Learned Image Compression Through Modeling Neural Training Dynamics
Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu
TL;DR
The paper tackles the high training cost of learned image compression (LIC) by modeling neural training dynamics in a low-dimensional modal space using Correlation Mode Decomposition (CMD). It introduces Sensitivity-aware True and Dummy Embedding Training (STDET), which represents most parameters as affine transformations of a small set of reference trajectories, and employs Sampling-then-Moving Average (SMA) to stabilize training and reduce variance. Theoretical analysis on a noisy quadratic model shows that the proposed method achieves lower steady-state variance than standard SGD. Experiments across multiple LIC models demonstrate substantial reductions in training time (roughly 60% of SGD's) and in trainable-parameter counts, with comparable or improved rate–distortion performance. Overall, the approach makes LIC training more efficient and speeds up LIC development, and the authors point to learned video compression as a possible future application.
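The affine-embedding idea behind STDET can be illustrated with a small sketch. It assumes that a mode (a group of parameters with correlated trajectories) and its reference parameter have already been identified, as CMD would do. Each remaining trajectory is fitted to the reference by ordinary least squares. The correlation threshold `min_corr` and the helper names `fit_affine`, `embed_mode`, and `reconstruct` are hypothetical. The paper's sensitivity criterion and gradual embedding schedule are not modeled here, so this is an illustration of the principle, not the authors' implementation.

```python
import math
import random
from statistics import mean


def fit_affine(ref, traj):
    """Least-squares fit of traj[t] ~= a * ref[t] + b over the recorded steps."""
    mr, mt = mean(ref), mean(traj)
    var = sum((r - mr) ** 2 for r in ref)
    cov = sum((r - mr) * (x - mt) for r, x in zip(ref, traj))
    a = cov / var if var > 0 else 0.0
    return a, mt - a * mr


def correlation(u, v):
    """Pearson correlation between two recorded trajectories."""
    mu, mv = mean(u), mean(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def embed_mode(trajectories, ref_idx, min_corr=0.95):
    """Split one mode into embedded parameters and still-trainable ones.

    A parameter whose trajectory is strongly correlated with the reference
    (|corr| >= min_corr, a hypothetical threshold) is embedded: it stops
    receiving gradient updates and is reconstructed as a * ref + b.
    """
    ref = trajectories[ref_idx]
    embedded, trainable = {}, []
    for i, traj in enumerate(trajectories):
        if i == ref_idx:
            continue
        if abs(correlation(ref, traj)) >= min_corr:
            embedded[i] = fit_affine(ref, traj)
        else:
            trainable.append(i)
    return embedded, trainable


def reconstruct(ref_value, embedded):
    """Recover embedded parameter values from the current reference value."""
    return {i: a * ref_value + b for i, (a, b) in embedded.items()}


# Synthetic trajectories recorded over 50 training steps.
rng = random.Random(0)
steps = range(50)
ref = [math.exp(-0.05 * t) for t in steps]
trajectories = [
    ref,                                                   # reference parameter
    [2.0 * r + 0.5 for r in ref],                          # exactly affine in ref
    [-0.5 * r + 1.0 + rng.gauss(0.0, 1e-3) for r in ref],  # affine plus small noise
    [rng.gauss(0.0, 1.0) for _ in steps],                  # unrelated parameter
]
embedded, trainable = embed_mode(trajectories, ref_idx=0)
print(sorted(embedded), trainable)  # [1, 2] [3]
```

Once a parameter is embedded, only the reference keeps receiving gradient updates and the embedded values are recomputed from it with `reconstruct`. This is where the reduction in trainable parameters comes from. The unrelated parameter stays trainable because its trajectory cannot be expressed through the reference.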
Abstract
As learned image compression (LIC) methods become increasingly computationally demanding, improving their training efficiency is crucial. This paper takes a step toward accelerating the training of LIC methods by modeling neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into a few separate modes, where the parameters in each mode are expressed as affine transformations of that mode's reference parameters. By further exploiting parameter sensitivities and the intra-mode correlations that remain stable throughout training, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, which interpolates weights sampled during stochastic gradient descent (SGD) training to obtain moving-average weights, ensuring smooth temporal behavior and minimizing training-state variance. Overall, our method significantly reduces the dimensionality of the training space and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the noisy quadratic model, showing that the proposed method achieves lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.
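The variance argument can be checked numerically on a one-dimensional noisy quadratic model. The sketch below assumes a loss of 0.5 * h * x**2 with additive Gaussian gradient noise. It implements the averaging step as an exponential moving average over SGD weights sampled every `sample_every` steps. The values of `h`, `sigma`, `lr`, `sample_every`, and `beta`, and the helper names, are illustrative assumptions. This is a simulation in the spirit of the paper's analysis, not necessarily its exact SMA update or proof.

```python
import random


def population_variance(xs):
    """Plain-float population variance (fast enough for long runs)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulate_nqm(h=1.0, sigma=1.0, lr=0.1, steps=100_000, burn_in=1_000,
                 sample_every=5, beta=0.05, seed=0):
    """1-D noisy quadratic model: loss = 0.5 * h * x**2, optimum x* = 0.

    Each step draws a noisy gradient h * x + sigma * N(0, 1) and takes a plain
    SGD step. Every `sample_every` steps the current SGD weight is sampled and
    interpolated into a moving average with coefficient `beta`. Returns the
    empirical steady-state variances of the SGD weight and of the average.
    """
    rng = random.Random(seed)
    x = 1.0    # SGD weight, initialized away from the optimum
    avg = x    # moving-average weight
    sgd_hist, avg_hist = [], []
    for t in range(1, steps + 1):
        grad = h * x + sigma * rng.gauss(0.0, 1.0)  # noisy gradient
        x -= lr * grad                               # SGD update
        if t % sample_every == 0:
            avg = (1.0 - beta) * avg + beta * x      # sample, then average
        if t > burn_in:                              # discard the transient
            sgd_hist.append(x)
            avg_hist.append(avg)
    return population_variance(sgd_hist), population_variance(avg_hist)


var_sgd, var_avg = simulate_nqm()
# Stationary SGD variance on this model: lr^2 * sigma^2 / (1 - (1 - lr * h)^2)
theory_sgd = 0.1 ** 2 * 1.0 ** 2 / (1 - (1 - 0.1 * 1.0) ** 2)
print(f"SGD variance:            {var_sgd:.4f} (closed form {theory_sgd:.4f})")
print(f"Moving-average variance: {var_avg:.4f}")
```

With these settings the empirical SGD variance should match the closed form (about 0.0526), while the averaged weights show a variance several times smaller. The trade-off is lag: a smaller `beta` smooths more but makes the average slower to follow the SGD weights during the early, fast-moving phase of training.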
