CPT: Efficient Deep Neural Network Training via Cyclic Precision

Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Celine Lin

TL;DR

The paper tackles the high cost of training deep neural networks by introducing Cyclic Precision Training (CPT), a dynamic scheme that cyclically varies the bit-width of weights and activations to balance exploration (low precision) and convergence (high precision). CPT uses a cosine-based cycle whose bounds are identified automatically by a lightweight Precision Range Test, and it applies precision cycling primarily to forward computations while keeping gradient precision fixed for stability. Empirical results across five datasets and eleven models show that CPT consistently reduces training BitOPs and latency while achieving comparable or improved accuracy, including accuracy gains on ImageNet and perplexity improvements on language models. These findings suggest that dynamic, cyclic precision is a practical and effective knob for simultaneously improving optimization and efficiency in DNN training, with potential for hardware-software co-design to support fast, energy-efficient training.
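To make the cosine-based cycle concrete, here is a minimal sketch of one way such a schedule could be computed. The function name `cyclic_precision`, its parameters, and the example bounds (3 and 8 bits, cycle length 8) are illustrative assumptions rather than values taken from the paper; the authors' implementation is in their repository.

```python
import math


def cyclic_precision(step, cycle_len, b_min, b_max):
    """Return an integer bit-width for a training step (hypothetical helper).

    The bit-width rises from b_min toward b_max along a half-cosine curve
    within each cycle, then restarts at b_min when the next cycle begins.
    """
    # Position inside the current cycle, in [0, 1).
    phase = (step % cycle_len) / cycle_len
    # Half-cosine ramp: 0 at the start of the cycle, approaching 1 at the end.
    ramp = 0.5 * (1.0 - math.cos(math.pi * phase))
    return int(round(b_min + (b_max - b_min) * ramp))


# Assumed example: two cycles of 8 steps between 3 and 8 bits.
schedule = [cyclic_precision(t, cycle_len=8, b_min=3, b_max=8) for t in range(16)]
print(schedule)
```

In this sketch only the forward-pass bit-width of weights and activations would follow `schedule`; the gradient precision would stay fixed, as the summary describes.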

Abstract

Low-precision deep neural network (DNN) training has gained tremendous attention as reducing precision is one of the most effective knobs for boosting DNNs' training time/energy efficiency. In this paper, we attempt to explore low-precision training from a new perspective as inspired by recent findings in understanding DNN training: we conjecture that DNNs' precision might have an effect similar to that of the learning rate during DNN training, and advocate dynamic precision along the training trajectory for further boosting the time/energy efficiency of DNN training. Specifically, we propose Cyclic Precision Training (CPT) to cyclically vary the precision between two boundary values which can be identified using a simple precision range test within the first few training epochs. Extensive simulations and ablation studies on five datasets and eleven models demonstrate that CPT's effectiveness is consistent across various models/tasks (including classification and language modeling). Furthermore, through experiments and visualization we show that CPT helps to (1) converge to wider minima with a lower generalization error and (2) reduce training variance, which we believe opens up a new design knob for simultaneously improving the optimization and efficiency of DNN training. Our code is available at: https://github.com/RICE-EIC/CPT.
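The precision range test mentioned above can be pictured roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes training accuracy is measured at increasing bit-widths and that the lower bound is the first bit-width at which the accuracy gain over the previous one exceeds a preset threshold. The helper `precision_range_test` and the toy measurements are hypothetical and are not results from the paper.

```python
def precision_range_test(measurements, threshold):
    """Pick a lower precision bound from (bit_width, accuracy) pairs.

    `measurements` must be ordered by increasing bit-width. Returns the first
    bit-width whose accuracy gain over the previous bit-width exceeds
    `threshold`, or None if no switching point is found.
    """
    for (_, prev_acc), (bits, acc) in zip(measurements, measurements[1:]):
        if acc - prev_acc > threshold:
            return bits
    return None


# Toy numbers for illustration only (not taken from the paper).
toy_measurements = [(2, 0.010), (3, 0.012), (4, 0.150), (5, 0.210)]
lower_bound = precision_range_test(toy_measurements, threshold=0.05)
print(lower_bound)
```

The returned value would serve as the lower bound of the cycle; how the upper bound is chosen is not specified in this summary.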

Paper Structure

This paper contains 15 sections, 1 equation, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Test accuracy evolution of ResNet-74 on CIFAR-100 under different schedules.
  • Figure 2: Loss landscape visualization after convergence of ResNet-74 on CIFAR-100 trained with different precision schedules, where wider contours with larger intervals indicate a better local minimum and a lower generalization error, as analyzed in Li et al. (2018).
  • Figure 3: Static vs. Cyclic Precision Training (CPT), where CPT cyclically schedules the precision of weights and activations during training.
  • Figure 4: Illustration of the precision range test for ResNet-152 and MobileNetV2 on CIFAR-100, where the switching points that exceed the preset threshold are marked by red circles.
  • Figure 5: Test accuracy vs. the required GBitOPs when training ResNet-38/74/110/152/164 and MobileNetV2 on CIFAR-100 using static precision, static precision plus CLR, and CPT methods.
  • ...and 4 more figures