Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation

Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder

TL;DR

This study introduces TinyPropv2, an algorithm optimized for on-device learning in deep neural networks on low-power microcontroller units, and demonstrates its capacity to manage computational resources efficiently while maintaining high accuracy.

Abstract

This study introduces TinyPropv2, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets (CIFAR10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR, and DCASE2020) reveals that TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases. For instance, relative to full training, the accuracy drop is only 0.82 percent on CIFAR10 and 1.07 percent on CIFAR100. In terms of computational effort, TinyPropv2 shows a marked reduction, requiring as little as 10 percent of the computational effort needed for full training in some scenarios, and consistently outperforms other sparse training methodologies. These findings underscore TinyPropv2's capacity to efficiently manage computational resources while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.

Paper Structure (45 sections, 12 equations, 2 figures, 1 table, 1 algorithm)

This paper contains 45 sections, 12 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Operational Workflow of TinyPropv2: The process begins with (1) performing the forward pass to compute the output. This is followed by (2) calculating the loss function and accumulating the local errors. Subsequently, (3) a decision is made on whether to train the datapoint based on the computed error. Next, (4) the optimal number of gradients to update, denoted as 'local k,' is determined from the aggregated error. (5) The algorithm then identifies the top 'k' gradients that will be updated. Finally, (6) these selected gradients undergo the sparse backpropagation process, completing the training step.
  • Figure 2: Comparative analysis of computational effort required for different training methods across various datasets.
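
The six stages listed in the Figure 1 caption can be sketched for a single linear layer as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the half squared-error loss, the `skip_threshold` rule for step (3), and the linear rule that maps the aggregated error to `k` in step (4) are all assumptions made for the example, since the exact formulas are not given in this excerpt.

```python
import math

def forward(weights, bias, x):
    """(1) Forward pass of one linear layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sparse_train_step(weights, bias, x, target, lr=0.1,
                      skip_threshold=0.05, k_min=1):
    """One training step following the six stages of Figure 1.

    Returns the number of error entries that were backpropagated
    (0 means the datapoint was skipped). Updates weights and bias in place.
    """
    # (1) Forward pass.
    y = forward(weights, bias, x)

    # (2) Half squared-error loss; its gradient w.r.t. y is the local error.
    local_error = [yi - ti for yi, ti in zip(y, target)]
    total_error = sum(abs(e) for e in local_error)

    # (3) Skip decision. ASSUMPTION: skip when the total error is small.
    if total_error < skip_threshold:
        return 0

    # (4) Choose the local k from the aggregated error. ASSUMPTION: k grows
    #     linearly with the mean absolute error, clamped to [k_min, n].
    n = len(local_error)
    mean_error = total_error / n
    k = max(k_min, min(n, math.ceil(min(1.0, mean_error) * n)))

    # (5) Indices of the k largest-magnitude error entries.
    top = sorted(range(n), key=lambda i: abs(local_error[i]), reverse=True)[:k]

    # (6) Sparse backpropagation: update only the rows selected in step (5).
    for i in top:
        for j, xj in enumerate(x):
            weights[i][j] -= lr * local_error[i] * xj
        bias[i] -= lr * local_error[i]
    return k
```

In this sketch a datapoint that the layer already fits well costs only a forward pass, and a poorly fitted one updates more rows. Both `skip_threshold` and the mapping from error to `k` are placeholders that would need to be replaced by the rules defined in the paper.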