Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization

Hongjun Choi, Jayaraman J. Thiagarajan, Ruben Glatt, Shusen Liu

TL;DR

This work investigates how neural representations can parameterize pre-trained CNN weights to improve both accuracy and predictor-parameter efficiency. It reveals that a reconstruction-only objective can recover or exceed original accuracy and introduces a two-phase, decoupled training scheme (reconstruction followed by distillation) plus an inception-like progressive reconstruction to boost performance and compression. The use of high-capacity teachers during the distillation phase further enhances the compression–accuracy trade-off, enabling significant predictor-size reductions while maintaining or surpassing baseline performance across multiple datasets. These findings offer a practical pathway to deploying weight-predictor schemes that simultaneously enhance model quality and storage efficiency, with potential extensions to diverse architectures and data-free settings.

Abstract

In this work, we investigate the fundamental trade-off between accuracy and parameter efficiency in the parameterization of neural network weights using predictor networks. We present a surprising finding that, when recovering the original model accuracy is the sole objective, it can be achieved effectively through the weight reconstruction objective alone. Additionally, we explore the underlying factors for improving weight reconstruction under parameter-efficiency constraints, and propose a novel training scheme that decouples the reconstruction objective from auxiliary objectives such as knowledge distillation, which leads to significant improvements compared to state-of-the-art approaches. Finally, these results pave the way for more practical scenarios, where one needs to achieve improvements on both model accuracy and predictor network parameter-efficiency simultaneously.
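
The decoupled scheme summarized above (a reconstruction phase followed by a separate distillation phase) can be illustrated with a toy sketch. This is not the authors' implementation: the predictor here is a single linear map over a hypothetical sinusoidal index embedding (`features`), the "network" is one linear layer, the teacher is simply the original weights rather than a higher-capacity model, and the sizes, learning rates, and step counts are arbitrary assumptions chosen for illustration.

```python
import math
import random

def features(k, n, d):
    """Hypothetical positional embedding of weight index k (d features)."""
    t = k / n
    return [1.0] + [math.sin((i + 1) * math.pi * t) for i in range(d - 1)]

def predict_weights(theta, phi):
    """Toy predictor: a linear map from each index embedding to one weight."""
    return [sum(p * th for p, th in zip(row, theta)) for row in phi]

def recon_loss(theta, phi, w_orig):
    """Phase 1 objective: mean squared error between predicted and original weights."""
    w_hat = predict_weights(theta, phi)
    return sum((a - b) ** 2 for a, b in zip(w_hat, w_orig)) / len(w_orig)

def distill_loss(theta, phi, w_teacher, xs):
    """Phase 2 objective: match the teacher's outputs on a set of inputs."""
    w_hat = predict_weights(theta, phi)
    total = 0.0
    for x in xs:
        student = sum(a * b for a, b in zip(w_hat, x))
        teacher = sum(a * b for a, b in zip(w_teacher, x))
        total += (student - teacher) ** 2
    return total / len(xs)

def grad(loss_fn, theta, eps=1e-5):
    """Central finite-difference gradient; adequate for a toy example."""
    g = []
    for i in range(len(theta)):
        up = theta[:i] + [theta[i] + eps] + theta[i + 1:]
        dn = theta[:i] + [theta[i] - eps] + theta[i + 1:]
        g.append((loss_fn(up) - loss_fn(dn)) / (2 * eps))
    return g

def train(loss_fn, theta, steps, lr):
    """Plain gradient descent on a single objective."""
    for _ in range(steps):
        g = grad(loss_fn, theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

random.seed(0)
n, d = 16, 4  # 16 original weights, 4 predictor parameters (assumed sizes)
phi = [features(k, n, d) for k in range(n)]
w_orig = [math.sin(2 * math.pi * k / n) + 0.1 * random.uniform(-1, 1) for k in range(n)]
w_teacher = w_orig  # simplification: the teacher is the original model itself
xs = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(32)]

theta0 = [0.0] * d
# Phase 1: reconstruction only.
theta1 = train(lambda th: recon_loss(th, phi, w_orig), theta0, steps=500, lr=0.05)
# Phase 2: distillation only, starting from the phase 1 solution.
theta2 = train(lambda th: distill_loss(th, phi, w_teacher, xs), theta1, steps=200, lr=0.005)

print("reconstruction loss before/after phase 1:",
      recon_loss(theta0, phi, w_orig), recon_loss(theta1, phi, w_orig))
print("distillation loss before/after phase 2:",
      distill_loss(theta1, phi, w_teacher, xs), distill_loss(theta2, phi, w_teacher, xs))
```

The point of the sketch is only the control flow: each phase optimizes one objective on its own, and the second phase is initialized from the first, instead of summing both losses into a single joint objective.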

Paper Structure (21 sections, 1 equation, 9 figures, 6 tables)

Figures (9)

  • Figure 1: While one expects that the reconstruction error must approach zero to recover the true performance (left), we are able to find networks with non-zero (yet low) errors that not only match the true performance but even surpass it (right).
  • Figure 2: Measuring the dominance of low-frequency components in original and recovered networks.
  • Figure 3: Evaluation of the reconstruction performance for each round of reconstruction.
  • Figure 4: (a) Average difference between the weights predicted by Recon-only and the original weights, for varying hidden sizes. (b) Comparison among Recon-only, the baseline, and ours. (c) Evaluation of the reconstruction performance for each method. $\uparrow$ represents our performance improvement over the baseline.
  • Figure 5: A summary of the explored training schemes and their benefits. We investigate the impact of the reconstruction-only setup in Section \ref{sec:NeRN_Inception_Training}, and introduce decoupled training with separate reconstruction and distillation stages in Section \ref{sec:NeRN_Decoupled_Training}, which improves storage compression via more effective predictor networks. Lastly, we demonstrate that a high-capacity teacher can further facilitate both the compression and accuracy goals in Section \ref{sec:dual_objective_through_high_teacher}.
  • ...and 4 more figures