AdaBet: Gradient-free Layer Selection for Efficient Training of Deep Neural Networks

Irene Tenison, Soumyajit Chatterjee, Fahim Kawsar, Mohammad Malekzadeh

TL;DR

AdaBet addresses the challenge of on-device retraining by eliminating backpropagation and server-side meta-training, using topological analysis of activations to select layers. It uses the first Betti Number $b_1$ of layer activations, normalized as $\hat{b}_1^i=b_1^i/|a^i|$, to estimate learning capacity and selects a fraction $\rho$ of layers for retraining, avoiding labels and backpropagation. Evaluations across 16 dataset–model pairs show AdaBet achieves around +5% average accuracy gain over gradient-based baselines while reducing peak memory by about 40%, with substantial per-epoch speedups. The approach offers a privacy-preserving, hardware-friendly route for efficient on-device adaptation.
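The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-layer Betti Numbers $b_1^i$ have already been computed from forward-pass activations, that $|a^i|$ is the number of activation elements of layer $i$, that layers with a higher $\hat{b}_1^i$ are preferred, and that the number of selected layers is $\lceil \rho L \rceil$. The function name `select_layers` and the example values are hypothetical.

```python
import math


def select_layers(betti_1, activation_sizes, rho):
    """Rank layers by normalized first Betti Number and keep a fraction rho.

    betti_1[i]          -- b_1 of layer i's activations (assumed precomputed)
    activation_sizes[i] -- |a^i|, assumed here to be the activation element count
    rho                 -- fraction of layers to retrain, in (0, 1]
    """
    if not 0 < rho <= 1:
        raise ValueError("rho must be in (0, 1]")
    if len(betti_1) != len(activation_sizes):
        raise ValueError("one Betti Number and one size are needed per layer")

    # Normalization from the text: b_hat_1^i = b_1^i / |a^i|
    scores = [b / n for b, n in zip(betti_1, activation_sizes)]

    # Assumption: round the layer budget up and always keep at least one layer.
    k = max(1, math.ceil(rho * len(scores)))

    # Assumption: a higher normalized Betti Number means higher learning capacity.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Made-up values for a four-layer model, selecting half of the layers.
selected, scores = select_layers([12, 40, 9, 30], [4096, 2048, 1024, 512], rho=0.5)
print(selected)  # indices of the layers to retrain; the rest stay frozen
```

Only forward-pass quantities enter this ranking, which is why no labels or gradients are needed at selection time.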

Abstract

To utilize pre-trained neural networks on edge and mobile devices, we often require efficient adaptation to user-specific runtime data distributions while operating under limited compute and memory resources. On-device retraining with a target dataset can facilitate such adaptations; however, it remains impractical due to the increasing depth of modern neural nets, as well as the computational overhead associated with gradient-based optimization across all layers. Current approaches reduce training cost by selecting a subset of layers for retraining; however, they rely on labeled data, at least one full-model backpropagation, or server-side meta-training, which limits their suitability for constrained devices. We introduce AdaBet, a gradient-free layer selection approach that ranks important layers by analyzing topological features of their activation spaces through Betti Numbers, using forward passes alone. AdaBet allows selecting layers with high learning capacity, which are important for retraining and adaptation, without requiring labels or gradients. Evaluating AdaBet on sixteen pairs of benchmark models and datasets shows that it achieves an average gain of 5% in classification accuracy over gradient-based baselines while reducing average peak memory consumption by 40%.

Paper Structure

This paper contains 30 sections, 5 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Peak memory efficiency. AdaBet reduces peak memory consumption of retraining by up to 76% (on average by 40% for different pre-trained DNNs retrained on Oxford-IIIT Pets), compared to peak memory of inference-only and full-layer training as the baseline.
  • Figure 2: Fisher Information vs. Betti Numbers. Pre-trained VGG16 adapted to the Stanford Dogs dataset. Green/dark rectangles show selected layers. (a) FI ranking depends on the number of backpropagations over the entire model. (b) FI ranking depends on the batch of data selected (due to changes in the random seed), with considerable changes in accuracy in all cases. (c) AdaBet's gradient-free Betti Number ranking is more consistent across the different batches of data used to select the layers, with negligible changes to the overall accuracy.
  • Figure 3: Layer-wise Betti Numbers computed on activations of a pre-trained VGG-16 model for Oxford-IIIT Pets, along with the number of clusters obtained via DBSCAN on UMAP reduced layer embeddings; both averaged across 5 independent runs. DBSCAN of the embeddings shows a similar ranking pattern, but Betti Numbers offer a more granular ranking of the layers.
  • Figure 4: An overview of AdaBet. First, we estimate each layer's learning capacity via Betti Numbers. Second, we select the most important layers based on the available compute budget. Third, we retrain the selected layers in $\mathcal{M}$ using the local dataset $\mathbb{D}$.
  • Figure 5: Detailed pipeline of AdaBet during its selection phase. For the $i$th layer, $a^i$ denotes the activations, $b_{1}^i$ the Betti Number $b_1$ computed from the activations, $\hat{b}_1^i$ the normalized Betti Number, and $p^i$ the parameters (weights and biases).
  • ...and 7 more figures