Activation Map Compression through Tensor Decomposition for Deep Learning

Le-Trung Nguyen, Aël Quélennec, Enzo Tartaglione, Samuel Tardieu, Van-Tam Nguyen

TL;DR

Experimental results on mainstream architectures and tasks demonstrate Pareto superiority over other state-of-the-art solutions in the trade-off between generalization and memory footprint.

Abstract

The Internet of Things and Deep Learning are synergistic, rapidly growing industrial fields, and there is strong demand to unify them into a common framework called Edge AI. While on-device inference is well explored in recent research, backpropagation remains an open challenge because its computational and memory costs are prohibitive under the extreme resource constraints of embedded devices. Drawing on tensor decomposition research, we tackle the main bottleneck of backpropagation, namely the memory footprint of activation map storage. We investigate and compare the effects of activation compression using Singular Value Decomposition and its tensor variant, High-Order Singular Value Decomposition. Applying low-order decomposition yields considerable memory savings while preserving the features essential for learning, and also offers theoretical guarantees of convergence. Experimental results on mainstream architectures and tasks demonstrate Pareto superiority over other state-of-the-art solutions in the trade-off between generalization and memory footprint.
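To make the idea of compressing an activation map with High-Order Singular Value Decomposition more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 4-D activation tensor laid out as (batch, channels, height, width) and assumes that the rank kept along each mode is the smallest one whose cumulative squared singular values reach a threshold `eps`. The function names `unfold`, `mode_product`, `hosvd_compress`, and `hosvd_reconstruct` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-`mode` unfolding: bring that axis to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (J, I_mode)."""
    moved = np.moveaxis(tensor, mode, -1)   # shape (..., I_mode)
    result = moved @ matrix.T               # shape (..., J)
    return np.moveaxis(result, -1, mode)


def hosvd_compress(activation, eps=0.9):
    """Truncated HOSVD: returns a small core tensor and one factor per mode.

    Assumption: the rank per mode is the smallest k whose leading singular
    values explain at least a fraction `eps` of that unfolding's energy.
    """
    core = activation
    factors = []
    for mode in range(activation.ndim):
        u, s, _ = np.linalg.svd(unfold(activation, mode), full_matrices=False)
        energy = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = min(int(np.searchsorted(energy, eps)) + 1, len(s))
        factors.append(u[:, :k])
        # Project the running core onto the retained subspace of this mode.
        core = mode_product(core, u[:, :k].T, mode)
    return core, factors


def hosvd_reconstruct(core, factors):
    """Approximate the original tensor from the core and the factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activation = rng.standard_normal((8, 16, 6, 6))  # stand-in activation map
    core, factors = hosvd_compress(activation, eps=0.9)
    stored = core.size + sum(f.size for f in factors)
    print("original elements:", activation.size)
    print("stored elements:  ", stored)
```

In this sketch only the core and the per-mode factors would need to be kept for the backward pass, so the memory saving depends on how quickly the singular values of each unfolding decay. Random data, as in the usage example, compresses poorly; real activation maps are expected to be far more structured.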

Paper Structure

This paper contains 24 sections, 25 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: We compress the activations that will be later employed for backpropagation.
  • Figure 2: For a single convolutional layer with minibatch size $B$, (a) and (b) show the predicted compression rate $R_C$ and speedup ratio $R_S$, respectively, of HOSVD relative to vanilla training as functions of $K_j$. (c) shows how the SNR evolves with the retained variance $\varepsilon$.
  • Figure 3: Explained variance $\varepsilon$ for the first two dimensions of the activation map in the fourth-to-last layer when fine-tuning the last four layers of MCUNet using HOSVD on CIFAR-10, following setup A.
  • Figure 4: Top-1 validation accuracy and peak memory when applying HOSVD with different explained variance thresholds $\varepsilon$ while fine-tuning the last four convolutional layers of an MCUNet model on CIFAR-10 with setup A.
  • Figure 5: Performance curves of an MCUNet pre-trained on ImageNet and fine-tuned on CIFAR-10 with different activation compression strategies.
  • ...and 4 more figures