Skip2-LoRA: A Lightweight On-device DNN Fine-tuning Method for Low-cost Edge Devices

Hiroki Matsutani, Masaaki Kondo, Kazuki Sunaga, Radu Marculescu

TL;DR

The results show that Skip2-LoRA reduces the fine-tuning time by 90.0% on average compared to the counterpart with the same number of trainable parameters, preserving accuracy and taking only a few seconds on the microcontroller board.

Abstract

This paper proposes Skip2-LoRA, a lightweight fine-tuning method for deep neural networks that addresses the gap between pre-trained and deployed models. In our approach, trainable LoRA (low-rank adaptation) adapters are inserted between the last layer and every other layer to enhance the network's expressive power while keeping the backward computation cost low. This architecture is well suited to caching intermediate results of the forward pass, so the forward computation of already-seen samples can be skipped as training epochs progress. We implemented the combination of the proposed architecture and the cache, denoted as Skip2-LoRA, and tested it on a $15 single-board computer. Our results show that Skip2-LoRA reduces the fine-tuning time by 90.0% on average compared to the counterpart with the same number of trainable parameters, preserving accuracy and taking only a few seconds on the microcontroller board.
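
The caching idea in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. It assumes that each adapter connects the input of a layer to the output of the last layer, that hidden layers use ReLU, and that adapters follow the common LoRA convention of a random down-projection `A` and a zero-initialized up-projection `B`. The class name `SkipLoRASketch`, its methods, and the layer sizes are hypothetical, and the backward pass is omitted. Because the pre-trained weights are frozen, the activations of a sample do not change between epochs, so they are computed once and reused.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class SkipLoRASketch:
    """Hypothetical illustration: frozen FC layers plus skip adapters and a cache."""

    def __init__(self, dims, rank=1, seed=0):
        rng = random.Random(seed)
        self.n = len(dims) - 1  # number of FC layers
        # Frozen pre-trained parameters (random stand-ins here).
        self.W = [[[rng.uniform(-0.5, 0.5) for _ in range(dims[k])]
                   for _ in range(dims[k + 1])] for k in range(self.n)]
        self.b = [[0.0] * dims[k + 1] for k in range(self.n)]
        # Trainable adapters: A[k] is rank x dims[k], B[k] is dims[-1] x rank.
        self.A = [[[rng.uniform(-0.5, 0.5) for _ in range(dims[k])]
                   for _ in range(rank)] for k in range(self.n)]
        self.B = [[[0.0] * rank for _ in range(dims[-1])] for _ in range(self.n)]
        self.cache = {}          # sample index -> list of frozen activations
        self.frozen_passes = 0   # counts full passes through the frozen layers

    def frozen_forward(self, x):
        """Run the frozen layers and return every layer input plus the output."""
        self.frozen_passes += 1
        acts = [x]
        for k in range(self.n):
            z = [a + c for a, c in zip(matvec(self.W[k], acts[-1]), self.b[k])]
            acts.append(z if k == self.n - 1 else relu(z))
        return acts  # acts[k] is the input of layer k+1; acts[n] is the output

    def forward(self, idx, x):
        """Reuse cached frozen activations; only the adapters are recomputed."""
        if idx not in self.cache:
            self.cache[idx] = self.frozen_forward(x)
        acts = self.cache[idx]
        out = list(acts[-1])
        for k in range(self.n):
            delta = matvec(self.B[k], matvec(self.A[k], acts[k]))
            out = [o + d for o, d in zip(out, delta)]
        return out


model = SkipLoRASketch([4, 8, 8, 3], rank=1)
samples = [[float(i + j) for j in range(4)] for i in range(5)]
for epoch in range(3):
    for idx, x in enumerate(samples):
        y = model.forward(idx, x)
print(model.frozen_passes)  # 5: one frozen pass per sample, not per epoch
```

In this sketch, three epochs over five samples trigger only five passes through the frozen layers; later epochs touch only the small adapter matrices. A real implementation would also need to bound the cache size on a memory-constrained device, which is not shown here.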

Paper Structure

This paper contains 14 sections, 7 equations, 4 figures, 7 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Fine-tuning methods for DNNs consisting of $n$ FC layers, where $n=3$. $\bm{W^k}$ and $\bm{b^k}$ denote the weights and biases of the $k$-th layer. In LoRA-All and LoRA-Last, $\bm{W^{k-1,k}}$ denotes the weights of the $k$-th LoRA adapter, where the rank is $R=1$. Parameters to be updated are colored in red.
  • Figure 2: Evaluation environment consisting of a Raspberry Pi Zero 2 W.
  • Figure 3: Training curves of Skip2-LoRA on three datasets. The required epochs are 100, 60, and 200 for the Damage1, Damage2, and HAR datasets, respectively.
  • Figure 4: Power consumption and temperature of Skip2-LoRA with the HAR dataset. Fine-tuning starts at 9 s.