Accelerating Learned Image Compression Through Modeling Neural Training Dynamics

Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

TL;DR

The paper tackles the high training cost of learned image compression (LIC) by modeling neural training dynamics in a low-dimensional modal space using Correlation Mode Decomposition (CMD). It introduces Sensitivity-aware True and Dummy Embedding Training (STDET) to represent most parameters as affine transformations of a small set of reference trajectories and employs Sampling-then-Moving Average (SMA) to stabilize training and reduce variance. Theoretical analysis on a noisy quadratic model shows the proposed method achieves lower steady-state variance than standard SGD, and experiments across multiple LICs demonstrate substantial reductions in training time (roughly 60% of SGD) and parameter counts with comparable or improved rate–distortion performance. Overall, the approach enables faster LIC development and training efficiency, with potential applicability to learned video compression as future work.
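The variance claim can be illustrated with a toy experiment. The sketch below is not the paper's SMA update: it assumes a one-dimensional noisy quadratic loss, samples the SGD weight at a fixed interval, and interpolates each sample into an exponential moving average. The function name `run_noisy_quadratic` and every hyperparameter (`lr`, `sample_every`, `ema_decay`, and so on) are hypothetical choices made only for illustration.

```python
import random
import statistics


def run_noisy_quadratic(steps=20000, lr=0.1, curvature=1.0, noise_std=1.0,
                        sample_every=5, ema_decay=0.9, seed=0):
    """Run SGD on L(w) = 0.5 * curvature * w**2 with Gaussian gradient noise.

    Returns the sampled SGD weights and the moving-average weights,
    both recorded at every sampling step.
    """
    rng = random.Random(seed)
    w = 1.0      # weight updated by SGD
    w_avg = w    # moving-average weight (assumed EMA form, not the paper's exact rule)
    sgd_trace, avg_trace = [], []
    for step in range(1, steps + 1):
        grad = curvature * w + rng.gauss(0.0, noise_std)  # noisy gradient
        w -= lr * grad
        if step % sample_every == 0:
            # Sample the SGD weight, then interpolate it into the average.
            w_avg = ema_decay * w_avg + (1.0 - ema_decay) * w
            sgd_trace.append(w)
            avg_trace.append(w_avg)
    return sgd_trace, avg_trace


sgd_trace, avg_trace = run_noisy_quadratic()
burn_in = len(sgd_trace) // 2  # drop the transient, keep the steady state
var_sgd = statistics.pvariance(sgd_trace[burn_in:])
var_avg = statistics.pvariance(avg_trace[burn_in:])
print(f"steady-state variance: SGD={var_sgd:.4f}, averaged={var_avg:.4f}")
```

In this toy setting the averaged weight fluctuates less around the optimum than the raw SGD weight, which is the qualitative behavior the theoretical analysis describes.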

Abstract

As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into a few separate modes, where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.
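The affine relation between a parameter and its mode's reference can be sketched as a least-squares fit over recorded training trajectories. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `fit_affine` and `pearson`, the symbols `k` and `d` for the slope and offset, and the toy trajectories are all hypothetical.

```python
import math


def fit_affine(reference, trajectory):
    """Least-squares fit of trajectory[t] ~= k * reference[t] + d."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_w = sum(trajectory) / n
    cov = sum((r - mean_r) * (w - mean_w) for r, w in zip(reference, trajectory))
    var = sum((r - mean_r) ** 2 for r in reference)
    k = cov / var
    d = mean_w - k * mean_r
    return k, d


def pearson(a, b):
    """Pearson correlation between two equally long trajectories."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                     sum((y - mean_b) ** 2 for y in b))
    return cov / norm


# Toy trajectories: one reference weight and one weight that follows it affinely.
reference = [1.00, 0.80, 0.55, 0.40, 0.32, 0.30]
follower = [2.0 * r + 0.5 for r in reference]

k, d = fit_affine(reference, follower)
reconstructed = [k * r + d for r in reference]  # "embedded" weight derived from the reference
max_err = max(abs(a - b) for a, b in zip(follower, reconstructed))
print(f"k={k:.3f}, d={d:.3f}, corr={pearson(reference, follower):.3f}, max_err={max_err:.2e}")
```

Once `k` and `d` are fixed, the follower no longer needs its own gradient update: it can be recomputed from the reference, which is how embedding reduces the number of trainable parameters in this sketch.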

Paper Structure

This paper contains 38 sections, 35 equations, 16 figures, 13 tables, 1 algorithm.

Figures (16)

  • Figure 1: ELIC model (He et al., 2022), $\lambda = 0.0018$. (a) Clustered correlation matrix of 10k sampled parameter trajectories trained on the COCO2017 dataset (Lin et al., 2014), decomposed into 10 modes. The block-diagonal structure indicates high correlations among the parameters within each mode, which supports the accuracy of the proposed representation. (b) The table shows the percentage of affine coefficients $\{k_i\}_{i=1}^N$ relative to the total number of coefficients, grouped by relative change intervals, at different epochs during the training of the ELIC model. These relative changes are measured against the final coefficient values at epoch 120. The rows correspond to the relative change intervals (0% - 1%, 1% - 2%, ..., 50% - 100%), indicating how far each coefficient is from its value at epoch 120. The columns represent specific epochs (0, 20, 40, 60, 80, 100). Each entry is the proportion of coefficients that fall within the given relative change interval at the corresponding epoch. The table reveals that most coefficients either remain stable or undergo only minor changes as training progresses from epoch 20 to 120. Notably, the proportion of coefficients in the 0% to 1% interval increases significantly from 31.64% at epoch 20 to 81.70% at epoch 100, indicating a marked stabilization of the affine coefficients over time.
  • Figure 2: Testing loss comparison of various methods. Please zoom in for more details. The proposed method converges much faster than standard SGD on various LICs. Additionally, as shown in the upper right corner, our method reaches a final loss similar to that of SGD. $\lambda = 0.0018$, Testing R-D loss = $\lambda \cdot 255^2 \cdot \text{MSE} + \text{Bpp}$. Evaluated on the Kodak dataset.
  • Figure 3: A typical pipeline in learned image compression.
  • Figure 4: R-D curves of various methods. Please zoom in for more details.
  • Figure 5: Ablation experiments on proposed methods. ELIC model.
  • ...and 11 more figures
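
The testing R-D loss quoted in the Figure 2 caption can be computed as in the short sketch below. The function name `rd_loss` and the example numbers are hypothetical; the sketch also assumes that the MSE is measured on pixel values scaled to [0, 1], which is the usual reason for the $255^2$ factor but is not stated in the caption.

```python
def rd_loss(mse, bpp, lam=0.0018):
    """Rate-distortion loss: lam * 255^2 * MSE + Bpp.

    mse: mean squared error, assumed to be on pixel values in [0, 1]
    bpp: bits per pixel of the compressed image
    lam: trade-off weight (0.0018 is the value quoted in the caption)
    """
    return lam * 255 ** 2 * mse + bpp


# Hypothetical numbers, chosen only to show the arithmetic.
example = rd_loss(mse=0.001, bpp=0.3)
print(f"R-D loss = {example:.6f}")
```

A larger `lam` puts more weight on distortion, so the same model trained with a larger value would be expected to spend more bits for better reconstruction quality.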