Low-Resolution Neural Networks

Eduardo Lobo Lustosa Cabral, Larissa Driemeier

TL;DR

The paper investigates the trade-off between memory efficiency and accuracy for low-bit weight quantization across fully connected, convolutional, and Vision Transformer architectures on CIFAR-10. It introduces a quantization scheme that restricts weights to $N_{values}$ discrete levels while keeping biases at 32-bit, and uses a VQ-VAE-inspired method to keep the quantized weights learnable during training. Across FCNN, CVNN, and ViT models, 2.32-bit weights ($N_{values}=5$) deliver the best balance of memory reduction (~12x) and performance. Very low-bit models require more epochs and sometimes exhibit instability, particularly under data augmentation. The findings suggest a potential path to deploying larger models on memory-constrained devices, contingent on future hardware advances and on validation with broader datasets and larger parameter counts.
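The fractional bit widths quoted above follow from the number of levels: a weight that can take $N_{values}$ distinct values carries $\log_2 N_{values}$ bits of information, so $N_{values}=5$ gives about 2.32 bits. The short Python sketch below only illustrates that arithmetic. The function names are hypothetical, and the compression ratio it prints is an idealized weight-only figure that ignores the 32-bit biases and any packing overhead, so it need not match the ~12x reported in the summary.

```python
import math


def bits_per_weight(n_values: int) -> float:
    """Information content, in bits, of a weight with n_values possible levels."""
    if n_values < 2:
        raise ValueError("a quantized weight needs at least two levels")
    return math.log2(n_values)


def ideal_weight_compression(n_values: int, baseline_bits: float = 32.0) -> float:
    """Idealized weight-only compression ratio relative to a 32-bit baseline.

    Assumption: weights are packed at exactly log2(n_values) bits each;
    biases and storage overhead are ignored.
    """
    return baseline_bits / bits_per_weight(n_values)


if __name__ == "__main__":
    for n in (2, 5):
        print(f"N_values={n}: {bits_per_weight(n):.2f} bits per weight, "
              f"ideal ratio {ideal_weight_compression(n):.1f}x")
```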

Abstract

The expanding scale of large neural network models introduces significant challenges, driving efforts to reduce memory usage and enhance computational efficiency. Such measures are crucial to ensure the practical implementation and effective application of these sophisticated models across a wide array of use cases. This study examines the impact of parameter bit precision on model performance compared to standard 32-bit models, with a focus on multiclass object classification in images. The models analyzed include those with fully connected layers, convolutional layers, and transformer blocks, with model weight resolution ranging from 1 bit to 4.08 bits. The findings indicate that models with lower parameter bit precision achieve results comparable to 32-bit models, showing promise for use in memory-constrained devices. While low-resolution models with a small number of parameters require more training epochs to achieve accuracy comparable to 32-bit models, those with a large number of parameters achieve similar performance within the same number of epochs. Additionally, data augmentation can destabilize training in low-resolution models, but including zero as a potential value in the weight parameters helps maintain stability and prevents performance degradation. Overall, 2.32-bit weights offer the optimal balance of memory reduction, performance, and efficiency. However, further research should explore other dataset types and more complex and larger models. These findings suggest a potential new era for optimized neural network models with reduced memory requirements and improved computational efficiency, though advancements in dedicated hardware are necessary to fully realize this potential.
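To make the idea of restricting weights to a few discrete values concrete, here is a minimal Python sketch of nearest-level quantization. It is not the authors' scheme, whose details are not given in this summary. The level placement (evenly spaced on $[-1, 1]$), the clipping bound `w_max`, and the function names are all assumptions chosen for illustration. Under this assumed placement, an odd number of levels includes zero and an even number does not, which is one way to see the distinction the abstract draws. The sketch covers only the forward rounding step; VQ-VAE-style training typically also passes gradients through the rounding with a straight-through estimator, which is omitted here.

```python
def make_levels(n_values: int, w_max: float = 1.0) -> list[float]:
    """Evenly spaced levels on [-w_max, w_max] (an assumed placement)."""
    if n_values < 2:
        raise ValueError("need at least two levels")
    step = 2.0 * w_max / (n_values - 1)
    return [-w_max + i * step for i in range(n_values)]


def quantize(weights: list[float], levels: list[float]) -> list[float]:
    """Map each weight to its nearest level (forward pass only)."""
    return [min(levels, key=lambda q: abs(q - w)) for w in weights]


if __name__ == "__main__":
    levels = make_levels(5)  # five levels, i.e. about 2.32 bits per weight
    print(levels)
    print(quantize([0.1, -0.7, 0.9, 0.3], levels))
```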

Paper Structure

This paper contains 14 sections, 5 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Training and validation results for the simpler models with only fully connected layers (FCNN1) for various resolutions used in the weights.
  • Figure 2: Training and validation results for the more complex models with only fully connected layers (FCNN2) for various resolutions used in the weights.
  • Figure 3: Training and validation results for the simpler models with only fully connected layers (FCNN1) using data augmentation for various resolutions used in the weights.
  • Figure 4: Training and validation results for the more complex models with only fully connected layers (FCNN2) using data augmentation for various resolutions used in the weights.
  • Figure 5: Training and validation results for the simpler models with convolutional layers (CVNN1) for various resolutions used in the weights.
  • ...and 8 more figures