Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge

Senkang Hu, Yanan Ma, Yihang Tao, Zhengru Fang, Zihan Fang, Yiqin Deng, Sam Kwong, Yuguang Fang

TL;DR

This paper addresses the challenge of adapting large pre-trained models on edge devices under tight compute and memory constraints. It introduces TaskEdge, a task-aware parameter-efficient fine-tuning framework that ranks parameter importance by combining weight magnitude and input activations, and allocates trainable weights in a model-agnostic, neuron-level fashion to distribute updates across the network. TaskEdge achieves comparable or better performance than full fine-tuning methods while updating less than 0.1% of parameters, and it integrates smoothly with LoRA and structured sparsity for additional acceleration. The approach demonstrates strong results on VTAB-1k across diverse tasks, underscoring its practicality for privacy-preserving, real-time edge adaptation of large models.

Abstract

Large language models (LLMs) have achieved remarkable success in various tasks, such as decision-making, reasoning, and question answering. They have been widely used in edge devices. However, fine-tuning LLMs to specific tasks at the edge is challenging due to the high computational cost and the limited storage and energy resources at the edge. To address this issue, we propose TaskEdge, a task-aware parameter-efficient fine-tuning framework at the edge, which allocates the most effective parameters to the target task and only updates the task-specific parameters. Specifically, we first design a parameter importance calculation criterion that incorporates both weights and input activations into the computation of weight importance. Then, we propose a model-agnostic task-specific parameter allocation algorithm to ensure that task-specific parameters are distributed evenly across the model, rather than being concentrated in specific regions. In doing so, TaskEdge can significantly reduce the computational cost and memory usage while maintaining performance on the target downstream tasks by updating less than 0.1% of the parameters. In addition, TaskEdge can be easily integrated with structured sparsity to enable acceleration by NVIDIA's specialized sparse tensor cores, and it can be seamlessly integrated with LoRA to enable efficient sparse low-rank adaptation. Extensive experiments on various tasks demonstrate the effectiveness of TaskEdge.
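The two steps named in the abstract (an importance score built from weights and input activations, followed by an allocation that spreads trainable weights across neurons) can be illustrated with a small sketch. The abstract does not state the exact formula, so the score used here, the weight magnitude times the L2 norm of the corresponding input feature, is an assumption, as are the function names `importance_scores` and `per_neuron_mask` and the per-neuron budget `k`. It is meant as a rough illustration of the idea, not the authors' implementation.

```python
import numpy as np

def importance_scores(weight, activations):
    """Score each weight by |W_ij| * ||X_j||_2 (assumed form, not from the paper).

    weight:      (out_features, in_features) matrix of a linear layer
    activations: (n_samples, in_features) inputs collected on the target task
    """
    act_norm = np.linalg.norm(activations, axis=0)  # one norm per input feature
    return np.abs(weight) * act_norm[None, :]

def per_neuron_mask(scores, k):
    """Mark the k highest-scoring weights in every row (output neuron) as trainable.

    Selecting per row, rather than globally, keeps the trainable weights
    spread across all neurons instead of concentrated in a few.
    """
    top_idx = np.argpartition(scores, -k, axis=1)[:, -k:]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=1)
    return mask

# Toy layer and toy task data (random values, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(32, 16))

scores = importance_scores(W, X)
mask = per_neuron_mask(scores, k=2)

# During fine-tuning, only the selected weights would receive updates.
grad = rng.normal(size=W.shape)
masked_grad = grad * mask
```

With a mask like this, the optimizer would only need to keep state for the selected entries, which is where the memory saving would come from. The same mask could in principle be applied to the LoRA update rather than to the raw gradient, although this section does not spell out how TaskEdge combines the two.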

Paper Structure

This paper contains 17 sections, 6 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Epochs vs Accuracy
  • Figure 2: Trainable Parameters vs Accuracy