Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation

Hanqiu Chen, Hang Yang, Stephen Fitzmeyer, Cong Hao

TL;DR

Rapid-INR tackles CPU-to-GPU data-transfer bottlenecks in CNN training by encoding entire image datasets as Implicit Neural Representation (INR) weights stored on the GPU, enabling end-to-end GPU training with on-the-fly decoding. The approach combines an encoder-decoder INR pipeline with iterative and dynamic pruning and layer-wise quantization to compress the data further. It achieves up to a 6$\times$ speedup over the PyTorch training pipeline and up to 1.2$\times$ over DALI, while using roughly 5% of the original RGB storage and incurring only a marginal accuracy loss. Evaluations with ResNet-18 on CIFAR-10, 102flowers, and Mini-ImageNet compare INR against JPEG at equivalent memory in terms of reconstruction quality (measured by PSNR) and backbone training accuracy. The authors state that the method can be applied to other CV tasks and backbones with reasonable engineering effort. Code for Rapid-INR is publicly available.

Abstract

Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only about 5% of the original dataset size in RGB format and achieves a maximum 6$\times$ speedup over the PyTorch training pipeline, as well as a maximum 1.2$\times$ speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering efforts. Our implementation code is publicly available at https://github.com/sharc-lab/Rapid-INR.
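To make the "decode from INR to RGB" step concrete, here is a minimal sketch of how a single image could be reconstructed from the weights of a small MLP. It is not the authors' implementation. The sine activation with frequency `omega`, the layer sizes, the coordinate normalization to [-1, 1], and all function names are assumptions for illustration, and random weights stand in for weights that would normally be fitted to an image offline.

```python
import math
import random


def make_mlp(layer_sizes, seed=0):
    """Build random (W, b) pairs; a stand-in for weights fitted to one image."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        W = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
             for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def forward(layers, xy, omega=30.0):
    """Map one (x, y) coordinate to three outputs (assumed sine activations)."""
    h = list(xy)
    for i, (W, b) in enumerate(layers):
        z = [sum(w * x for w, x in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        is_last = i == len(layers) - 1
        h = z if is_last else [math.sin(omega * v) for v in z]
    return h


def decode_inr(layers, height, width, omega=30.0):
    """Evaluate the MLP at every pixel coordinate to rebuild an RGB image."""
    image = []
    for r in range(height):
        y = 2.0 * r / (height - 1) - 1.0  # normalize row index to [-1, 1]
        row = []
        for c in range(width):
            x = 2.0 * c / (width - 1) - 1.0  # normalize column index to [-1, 1]
            rgb = forward(layers, (x, y), omega)
            # Map outputs from roughly [-1, 1] to 8-bit values, with clamping.
            row.append(tuple(min(255, max(0, round(127.5 * (v + 1.0))))
                             for v in rgb))
        image.append(row)
    return image


def num_params(layers):
    """Count stored values: this is what would be kept instead of the pixels."""
    return sum(len(W) * len(W[0]) + len(b) for W, b in layers)


layers = make_mlp([2, 16, 16, 3])      # hypothetical architecture
image = decode_inr(layers, 8, 8)       # 8x8 image of (R, G, B) tuples
print(num_params(layers), len(image), len(image[0]))
```

Every pixel is computed independently from its coordinate, which is why this kind of decoding lends itself to parallel execution. The nested loops here are only for readability; a GPU version would evaluate all coordinates of all images in a batch as tensor operations.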

Paper Structure (23 sections, 4 equations, 8 figures)

Figures (8)

  • Figure 1: A high-level overview of three different training pipelines. PyTorch pipeline: batches of JPEG images are repeatedly fetched from disk to the CPU and decoded to RGB format, then resized and augmented to prepare them for training. DALI pipeline: similar to the PyTorch pipeline, except that decoding runs in hybrid mode, using the CPU and GPU together for acceleration. INR pipeline: the whole dataset is transferred from disk to the GPU only once, as INR MLP weights, before training starts; the images are then decoded to RGB format on-the-fly.
  • Figure 2: The encoder-decoder architecture of Rapid-INR. In the INR encoding part, each image is encoded into INR weights using a separate MLP. Encoding is done offline, and the whole dataset is stored on disk as INR weights. In the decoding part, the INR weights of the whole dataset are first transferred to CUDA memory; once training starts, each batch needed for backbone training is decoded on-the-fly.
  • Figure 3: The relationship between PSNR and average image size for INR under different compression techniques and for JPEG. (Note: the average pruning ratio is 15% for the CIFAR-10 dataset and 25% for 102flowers and Mini-ImageNet. The 16-bit and 8-bit INR variants combine quantization and pruning.)
  • Figure 4: The normalized weight distributions of three different types of layers. The first-layer and last-layer weights are not concentrated around 0 and have a relatively sparse distribution compared with the hidden layers.
  • Figure 5: Design space exploration and hyper-parameter selection. (a) The relationship between PSNR and the number of layers, also showing the influence of different MLP architectures and different numbers of training iterations. (b) The relationship between PSNR and the activation function frequency under different MLP hidden dimensions. (c) The influence of different learning rates on PSNR when training for different numbers of iterations with different activation frequencies.
  • ...and 3 more figures
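
The captions above refer to PSNR as the reconstruction-quality metric and to layer-wise quantization of INR weights. The sketch below illustrates both under stated assumptions: PSNR uses its standard definition for 8-bit data, while the uniform min/max quantizer applied separately to each layer is only one plausible reading of "layer-wise" and is not confirmed by the text here. The function names and the example weight values are hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def quantize_layer(weights, bits):
    """Uniformly quantize one layer using that layer's own min/max range."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize_layer(codes, lo, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [lo + c * scale for c in codes]


# Hypothetical weights: a wide-range first layer and a narrow hidden layer.
model = {
    "first": [-2.4, -1.1, 0.9, 2.7, 1.8, -0.6],
    "hidden": [-0.03, 0.01, 0.02, -0.01, 0.00, 0.04],
}

restored = {}
scales = {}
for name, weights in model.items():
    codes, lo, scale = quantize_layer(weights, bits=8)
    restored[name] = dequantize_layer(codes, lo, scale)
    scales[name] = scale
    worst = max(abs(a - b) for a, b in zip(weights, restored[name]))
    print(name, "step:", scale, "max error:", worst)
```

Because each layer gets its own range, the narrow hidden layer is quantized with a much finer step than the wide first layer. A single shared range would spend most of its levels on values the hidden layer never uses, which is consistent with the differing per-layer weight distributions described for Figure 4.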