Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm

Zhonghao Lyu; Yuchen Li; Guangxu Zhu; Jie Xu; H. Vincent Poor; Shuguang Cui

TL;DR

The paper addresses resource management for a two-stage edge learning pipeline that first performs centralized pre-training on general data at an edge server and then performs task-specific fine-tuning via federated edge learning (FEEL). It derives a convergence bound on the average squared gradient norm that depends on the numbers of learning rounds and the batch sizes in both stages, as well as on the data distribution shift between the stages, measured by the Wasserstein distance. To optimize performance under energy and delay constraints, the authors formulate a non-convex problem (P1) and solve it in two steps: (i) for fixed numbers of learning rounds, successive convex approximation (SCA) optimizes the continuous resource variables; and (ii) a two-dimensional search over the integer numbers of pre-training rounds $M$ and fine-tuning rounds $N$ finds a near-optimal configuration. Numerical results on MNIST (both stages) and on CINIC-10 (pre-training) with CIFAR-10 (fine-tuning) show that the proposed joint design converges faster and achieves higher accuracy under given energy and delay budgets, effectively exploiting the trade-off between pre-training and fine-tuning when data distributions differ between the stages.
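The two-step procedure summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the SCA step is abstracted as a caller-supplied callback `inner_solve(M, N)` (hypothetical) that returns the optimized bound for fixed round numbers, or `None` when the energy and delay budgets cannot be met. The search ranges and the toy solver `toy_inner_solve`, with its made-up per-round delays, budget, and bound, are illustrative only.

```python
from typing import Callable, Optional, Tuple

# Hypothetical inner solver: for fixed integer rounds (M, N) it returns the
# optimized value of the convergence bound over the continuous variables
# (e.g. batch sizes, clock frequencies, transmit powers), or None if no
# feasible allocation exists. The paper uses SCA here; this is a callback.
InnerSolver = Callable[[int, int], Optional[float]]


def two_dimensional_search(
    inner_solve: InnerSolver, m_max: int, n_max: int
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Try every (M, N) in [1, m_max] x [1, n_max] and keep the feasible
    pair with the smallest bound value."""
    best_pair: Optional[Tuple[int, int]] = None
    best_value = float("inf")
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            value = inner_solve(m, n)
            if value is not None and value < best_value:
                best_pair, best_value = (m, n), value
    return best_pair, best_value


def toy_inner_solve(m: int, n: int) -> Optional[float]:
    """Made-up stand-in for the SCA step (not the paper's bound or model)."""
    total_delay = 2.0 * m + 5.0 * n  # hypothetical per-round delays (s)
    if total_delay > 100.0:          # hypothetical delay budget (s)
        return None
    return 50.0 / m + 80.0 / n       # hypothetical bound, decreasing in M and N


if __name__ == "__main__":
    pair, value = two_dimensional_search(toy_inner_solve, m_max=50, n_max=20)
    print(pair, round(value, 4))  # (15, 14) 9.0476
```

The exhaustive search calls the inner solver `m_max * n_max` times, which stays tractable as long as the candidate ranges for $M$ and $N$ are moderate.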

Abstract

In some applications, edge learning is shifting its focus from conventional learning from scratch to a new two-stage paradigm that unifies pre-training and task-specific fine-tuning. This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system. In this system, model pre-training is first conducted at an edge server via centralized learning on locally pre-stored general data, and task-specific fine-tuning is then performed at edge devices on the pre-trained model via federated edge learning. For this two-stage learning model, we first analyze the convergence behavior in terms of a bound on the average squared gradient norm, which characterizes how system parameters such as the numbers of learning rounds and the batch sizes in the two stages affect the convergence rate. Based on these analytical results, we then propose a joint communication and computation resource management design that minimizes the average squared gradient norm bound, subject to constraints on transmit power, overall system energy consumption, and training delay. The decision variables include the numbers of learning rounds, batch sizes, clock frequencies, and transmit power control for both the pre-training and fine-tuning stages. Finally, numerical results are provided to evaluate the effectiveness of the proposed design. They show that joint resource management over the pre-training and fine-tuning stages effectively balances the trade-off among training accuracy, delay, and energy consumption. The proposed design also effectively leverages the inherent trade-off between pre-training and fine-tuning, which arises from the differences in data distribution between the pre-stored general data and the real-time task-specific data, thus efficiently optimizing overall system performance.
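To make the two-stage pipeline concrete, the following toy Python sketch runs centralized mini-batch SGD on "general" data and then federated fine-tuning across several devices on shifted "task" data. Everything here is an illustrative assumption rather than the paper's setup: a scalar linear model, a distribution shift modeled as a change of slope (2.0 vs. 2.5), FedAvg-style aggregation with one local SGD step per round, four devices, and all batch sizes, learning rates, and round counts. The helper names (`make_data`, `pretrain`, `fine_tune`, `run_two_stage`) are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, label y)


def make_data(slope: float, n: int, rng: random.Random) -> List[Sample]:
    """Toy data y = slope * x + Gaussian noise; the slope models the task."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, 0.05)))
    return data


def grad(w: float, batch: List[Sample]) -> float:
    """Gradient of the batch mean of 0.5 * (w * x - y) ** 2 w.r.t. w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def pretrain(w: float, data: List[Sample], rounds: int, batch: int,
             lr: float, rng: random.Random) -> float:
    """Stage 1 (assumed): centralized mini-batch SGD at the edge server."""
    for _ in range(rounds):
        w -= lr * grad(w, rng.sample(data, batch))
    return w


def fine_tune(w: float, devices: List[List[Sample]], rounds: int, batch: int,
              lr: float, rng: random.Random) -> float:
    """Stage 2 (assumed): each device takes one local SGD step from the
    current global model, and the server averages the local models."""
    for _ in range(rounds):
        local_models = [w - lr * grad(w, rng.sample(d, batch)) for d in devices]
        w = sum(local_models) / len(local_models)
    return w


def run_two_stage(m_rounds: int, n_rounds: int,
                  seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    general = make_data(slope=2.0, n=500, rng=rng)  # pre-stored general data
    devices = [make_data(slope=2.5, n=200, rng=rng) for _ in range(4)]  # task data
    w_pre = pretrain(0.0, general, m_rounds, batch=32, lr=0.5, rng=rng)
    w_ft = fine_tune(w_pre, devices, n_rounds, batch=16, lr=0.5, rng=rng)
    return w_pre, w_ft


if __name__ == "__main__":
    w_pre, w_ft = run_two_stage(m_rounds=200, n_rounds=100)
    print(f"after pre-training: w = {w_pre:.3f}")  # near the general slope 2.0
    print(f"after fine-tuning:  w = {w_ft:.3f}")   # near the task slope 2.5
```

Setting `n_rounds=0` returns the pre-trained model unchanged, while additional fine-tuning rounds move the parameter from the general slope toward the task slope.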

Paper Structure (19 sections, 4 theorems, 45 equations, 6 figures, 1 algorithm)

Introduction
System Model
Pre-training via Centralized Learning
Task-specific Fine-tuning via FEEL
Communication Model
Training Delay and Energy Consumption Analysis
Training Delay Analysis
Energy Consumption Analysis
Convergence Analysis
Assumptions and Definitions on Learning Models
Convergence Analysis
Joint Communication and Computation Resource Management for Two-Stage Edge Learning
Problem Formulation
Proposed Solution to Problem (P1)
Numerical Results
...and 4 more sections

Key Result

Lemma 3.1

After $N$ rounds of task-specific fine-tuning on the initial model $\hat{\boldsymbol{w}}^{(0)}$, the total expected improvement of loss during model fine-tuning is

Figures (6)

Figure 1: Illustration of the considered two-stage edge learning system with model pre-training and task-specific fine-tuning.
Figure 2: Convergence behavior in terms of the training loss over the overall training time. (a) Pre-training and fine-tuning on MNIST. (b) Pre-training on CINIC-10 and fine-tuning on CIFAR-10.
Figure 3: Convergence behavior in terms of the training loss w.r.t. the accumulated overall system energy consumption over time. (a) Pre-training and fine-tuning on MNIST. (b) Pre-training on CINIC-10 and fine-tuning on CIFAR-10.
Figure 4: Classification accuracy w.r.t. different system delay thresholds. (a) Pre-training and fine-tuning on MNIST under the energy consumption threshold $\tilde{E}_0 =250~ {\rm J}$. (b) Pre-training on CINIC-10 and fine-tuning on CIFAR-10 under the energy consumption threshold $\tilde{E}_0 =1700~ {\rm J}$.
Figure 5: Classification accuracy w.r.t. different system energy consumption thresholds. (a) Pre-training and fine-tuning on MNIST under the training delay threshold $\tilde{\tau}_0 =1000~ {\rm s}$. (b) Pre-training on CINIC-10 and fine-tuning on CIFAR-10 under the training delay threshold $\tilde{\tau}_0 =3000~ {\rm s}$.
...and 1 more figure

Theorems & Definitions (6)

Remark 3.1
Lemma 3.1
Lemma 3.2
Theorem 3.1
Remark 3.2
Lemma 1