Towards Energy-Aware Federated Learning via MARL: A Dual-Selection Approach for Model and Client

Jun Xia; Yi Zhang; Yiyu Shi

TL;DR

This work addresses energy-aware optimization in Federated Learning (FL) for heterogeneous, battery-powered AIoT devices. It introduces DR-FL, which maintains a layer-wise global model in the cloud and assigns subsets of this model to devices through a MARL-based dual-selection framework, so that both device participation and model selection adapt to balance accuracy, runtime, and energy use. The authors use QMIX-based cooperative multi-agent reinforcement learning to jointly optimize layer selection and device participation. They validate DR-FL on four datasets (CIFAR-10, CIFAR-100, SVHN, and Fashion-MNIST) in both simulation and real wireless test-beds, reporting higher accuracy and better energy efficiency than state-of-the-art heterogeneous FL methods. The results indicate that DR-FL scales to large AIoT ecosystems and remains robust under non-IID data distributions, which gives it practical value for energy-constrained collaborative learning.
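The TL;DR mentions a layer-wise global model whose subsets are trained on different devices. One common way to merge such partial updates, assumed here rather than taken from the paper, is FedAvg-style averaging done per layer over only the clients whose sub-model contained that layer. The function name, the dict-based update format, and the weights below are hypothetical illustrations.

```python
import numpy as np

def layerwise_aggregate(global_layers, client_updates, client_weights):
    """FedAvg-style averaging done per layer, over only the clients that trained it.

    global_layers:  list of np.ndarray, one entry per layer of the cloud model
    client_updates: list of dicts {layer_index: np.ndarray}; each client returns only
                    the layers of the sub-model it was assigned (hypothetical format)
    client_weights: list of floats, e.g. local dataset sizes
    """
    new_layers = []
    for idx, layer in enumerate(global_layers):
        total = np.zeros_like(layer, dtype=float)
        weight_sum = 0.0
        for update, w in zip(client_updates, client_weights):
            if idx in update:
                total += w * update[idx]
                weight_sum += w
        # A layer that no selected client trained this round keeps its previous value.
        new_layers.append(total / weight_sum if weight_sum > 0 else layer.copy())
    return new_layers

# Usage: a 4-layer cloud model; a shallow client trained layers 0-1, a deeper one trained 0-2.
global_layers = [np.zeros(2), np.zeros(2), np.zeros(2), np.full(2, 7.0)]
client_updates = [
    {0: np.array([1.0, 1.0]), 1: np.array([2.0, 2.0])},
    {0: np.array([3.0, 3.0]), 1: np.array([4.0, 4.0]), 2: np.array([5.0, 5.0])},
]
new_layers = layerwise_aggregate(global_layers, client_updates, client_weights=[1.0, 3.0])
print([layer.tolist() for layer in new_layers])
# [[2.5, 2.5], [3.5, 3.5], [5.0, 5.0], [7.0, 7.0]]
```

Deep layers are averaged over fewer clients than shallow ones, which is why per-layer weight sums are tracked instead of a single global normalizer.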

Abstract

Although Federated Learning (FL) is promising for knowledge sharing among heterogeneous Artificial Intelligence of Things (AIoT) devices, the training performance and energy efficiency of these devices are severely restricted in practical battery-driven scenarios by the "wooden barrel effect", which arises from the mismatch between homogeneous model paradigms and heterogeneous device capabilities. Because devices differ in many ways, existing FL methods struggle to train effectively in energy-constrained scenarios, such as when devices are limited by their batteries. To tackle these issues, we propose an energy-aware FL framework named DR-FL, which accounts for the energy constraints of clients as well as heterogeneous deep learning models to enable energy-efficient FL. Unlike Vanilla FL, DR-FL adopts our proposed Multi-Agent Reinforcement Learning (MARL)-based dual-selection method, which allows participating devices to contribute to the global model effectively and adaptively according to their computing capabilities and energy capacities. Experiments on various widely recognized datasets demonstrate that DR-FL can optimize knowledge exchange among diverse models in large-scale AIoT systems while adhering to energy limitations. Additionally, it improves the performance of each individual heterogeneous device's model.
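To make the two sources of waste behind the "wooden barrel effect" concrete, the sketch below accounts for one synchronous round in which every device trains the same model. The linear energy model (power × time), the `Device` fields, and all numbers are hypothetical illustrations, not DR-FL's energy model or measured data.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    flops_per_s: float    # compute throughput (hypothetical)
    train_power_w: float  # power draw while training
    idle_power_w: float   # power draw while waiting for slower devices
    comm_energy_j: float  # energy needed to upload the model update
    battery_j: float      # remaining battery energy

def wasted_energy(devices, model_flops):
    """One synchronous round in which every device trains the same (homogeneous) model."""
    train_time = {d.name: model_flops / d.flops_per_s for d in devices}
    train_energy = {d.name: d.train_power_w * train_time[d.name] for d in devices}
    status = {}
    for d in devices:
        if d.battery_j < train_energy[d.name]:
            status[d.name] = "skip"       # cannot even finish local training
        elif d.battery_j < train_energy[d.name] + d.comm_energy_j:
            status[d.name] = "no_upload"  # trains, but the update never reaches the server
        else:
            status[d.name] = "ok"
    # The slowest device that starts training sets the round length (the "short stave").
    round_time = max((train_time[n] for n, s in status.items() if s != "skip"), default=0.0)
    waste = {}
    for d in devices:
        if status[d.name] == "ok":
            # Faster devices burn idle power while waiting for the slowest one.
            waste[d.name] = d.idle_power_w * (round_time - train_time[d.name])
        elif status[d.name] == "no_upload":
            # All training energy is wasted because the result is never communicated.
            waste[d.name] = train_energy[d.name]
        else:
            waste[d.name] = 0.0
    return round_time, status, waste

devices = [
    Device("fast", 4e9, 5.0, 1.0, 2.0, 100.0),
    Device("slow", 1e9, 2.0, 0.5, 3.0, 100.0),
    Device("low_battery", 2e9, 3.0, 0.5, 5.0, 15.0),
]
round_time, status, waste = wasted_energy(devices, model_flops=8e9)
print(round_time, status, waste)
# 8.0 {'fast': 'ok', 'slow': 'ok', 'low_battery': 'no_upload'} {'fast': 6.0, 'slow': 0.0, 'low_battery': 12.0}
```

The "fast" device illustrates the compute/model mismatch (energy spent waiting), and "low_battery" illustrates the power/model mismatch (energy spent on training that is never uploaded). Assigning each device a sub-model sized to its compute and battery is the kind of choice the dual-selection method is meant to make.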

Paper Structure (23 sections, 10 equations, 6 figures, 2 tables)

Introduction
Related Work
Preliminaries
Federated Learning
Multi-Agent Reinforcement Learning
Method
Problem Formulation
Workflow of DR-FL
Dual-Selection for Local Model and Client
MARL Training Process:
MARL Agent State Design:
Agent Action Design:
Reward Function Design:
Experimental Results
Experimental Settings
...and 8 more sections

Figures (6)

Figure 1: In Vanilla FL, the energy wasted by the "wooden barrel effect" usually has two causes: the mismatch between device computing power and the homogeneous model, and the mismatch between device battery power and the homogeneous model. The former spends device energy on waiting, while the latter spends it on useless training (the device has enough power to train but not to communicate its update).
Figure 2: Framework and workflow of our method.
Figure 3: Maximum Q Value Guided Dual-selection. There are two networks, i.e., the model selection network and the device evaluation network. The model selection network takes the observation $O$ that each agent receives from the environment and the previous round's action set $A_{t-1}$, and outputs the latest action together with its Q value. The device evaluation network obtains the Q values of all devices and then uses the hybrid network to combine them with the current state $S_{t}$, through a two-layer weight matrix, into an overall Q value $Q_{tot}$. The network is then trained with the discounted rewards given by the environment, so that each agent obtains its own reward from the environment. $h$ denotes the MLP that extracts deep representations of states or actions. $|\cdot|$ denotes the dot product.
Figure 4: Real test-bed platform for our experiment.
Figure 5: Comparison of total energy consumption and running time.
...and 1 more figures
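The Figure 3 caption describes per-agent Q-values computed from an observation $O$ and the previous action, greedy (maximum-Q) selection, and a mixing network that combines the chosen Q-values with the state $S_t$ into $Q_{tot}$. Below is a minimal NumPy forward-pass sketch of that QMIX-style structure under stated assumptions: one linear layer per agent instead of the paper's MLPs, randomly initialized weights, no exploration or TD training, and a hypothetical action encoding where 0 means "skip this round" and k ≥ 1 means "train sub-model k". Following the original QMIX formulation, hypernetwork outputs pass through an absolute value so that mixing weights are non-negative.

```python
import numpy as np

def elu(x):
    # ELU activation; np.minimum avoids overflow in expm1 for large positive x
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def agent_q_values(obs, prev_action, n_actions, W):
    """Per-agent Q-values from its observation and previous action (one linear layer)."""
    prev_onehot = np.eye(n_actions)[prev_action]
    return W @ np.concatenate([obs, prev_onehot])  # shape (n_actions,)

def greedy_dual_selection(q_per_agent):
    """Maximum-Q selection (hypothetical encoding: 0 = skip, k >= 1 = train sub-model k)."""
    return [int(np.argmax(q)) for q in q_per_agent]

def qmix_mix(chosen_qs, state, hyper, embed):
    """Combine the chosen per-agent Q-values into Q_tot with a monotonic mixing network."""
    n_agents = chosen_qs.shape[0]
    # Hypernetworks map the global state to mixing weights; abs() keeps them non-negative,
    # so Q_tot is non-decreasing in every agent's Q-value (the QMIX monotonicity constraint).
    w1 = np.abs(hyper["W1"] @ state).reshape(n_agents, embed)
    b1 = hyper["B1"] @ state
    w2 = np.abs(hyper["W2"] @ state)
    b2 = hyper["V"] @ state  # state-dependent bias (a single linear layer here)
    hidden = elu(chosen_qs @ w1 + b1)
    return float(hidden @ w2 + b2)

# Usage with random weights: 3 devices (agents), 4 actions each.
rng = np.random.default_rng(0)
n_agents, n_actions, obs_dim, state_dim, embed = 3, 4, 5, 6, 8
agent_W = [rng.normal(size=(n_actions, obs_dim + n_actions)) for _ in range(n_agents)]
hyper = {
    "W1": rng.normal(size=(n_agents * embed, state_dim)),
    "B1": rng.normal(size=(embed, state_dim)),
    "W2": rng.normal(size=(embed, state_dim)),
    "V": rng.normal(size=state_dim),
}
obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
prev_actions = [0, 0, 0]
qs = [agent_q_values(o, a, n_actions, W) for o, a, W in zip(obs, prev_actions, agent_W)]
actions = greedy_dual_selection(qs)
chosen = np.array([q[a] for q, a in zip(qs, actions)])
state = rng.normal(size=state_dim)
q_tot = qmix_mix(chosen, state, hyper, embed)
print(actions, q_tot)
```

Monotonic mixing is what lets each agent act greedily on its own Q-values while the team-level $Q_{tot}$ is still maximized by the same joint action.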
