RELIEF: Turning Missing Modalities into Training Acceleration for Federated Learning on Heterogeneous IoT Edge

Beining Wu, Zihao Ding, Jun Huang

Abstract

Federated learning (FL) over heterogeneous IoT edge devices faces coupled system-modality-data heterogeneity: lower-cost devices carry both fewer sensors and less computational power, so the slowest devices (stragglers) produce the most incomplete gradient signals. Naively averaging their updates dilutes rare-modality information and wastes computation on absent-sensor parameters; existing methods handle the three axes of heterogeneity (system, modality, data) in isolation, and none addresses their coupling. To resolve this, we propose RELIEF, a framework that partitions the fusion-layer Low-Rank Adaptation (LoRA) projection matrix into modality-aligned column blocks and uses this partition as a unified interface for aggregation, elastic training, and communication. Each block is aggregated only within the cohort of devices possessing that modality, eliminating cross-modal gradient interference; the server then allocates personalized training budgets by prioritizing the blocks with the highest cohort-internal divergence, so that resource-constrained devices train fewer but more impactful parameters. We prove that cohort-wise aggregation removes the interference term from the convergence bound and that divergence-guided allocation achieves sublinear regret. Experiments on two IoT sensor datasets (PAMAP2, MHEALTH), under both full-parameter (CNN) and parameter-efficient (LoRA) training, show that RELIEF achieves up to a 9.41x speedup and 37% energy reduction over FedAvg, with rare-modality F1 gains of up to 15.3 pp; real-device validation on a two-Jetson AGX Orin testbed confirms these results.
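The cohort-wise aggregation rule described in the abstract can be sketched in a few lines. The modality names, block widths, and function name below are illustrative assumptions for a minimal sketch, not the paper's implementation: each per-modality column block of the fusion-layer LoRA-A update is averaged only over the devices that actually carry that sensor.

```python
import numpy as np

# Hypothetical sketch of RELIEF-style cohort-wise aggregation (names assumed).
# The fusion-layer LoRA "A" matrix is split into per-modality column blocks;
# each block is averaged only over devices that possess that modality.

MODALITIES = ["acc", "gyro", "mag", "hr"]
BLOCK_COLS = {"acc": slice(0, 4), "gyro": slice(4, 8),
              "mag": slice(8, 12), "hr": slice(12, 16)}

def cohort_aggregate(updates, device_modalities):
    """updates: list of (rank x 16) LoRA-A deltas, one per device.
    device_modalities: list of sets naming each device's sensors."""
    agg = np.zeros_like(updates[0])
    for m in MODALITIES:
        cohort = [u for u, mods in zip(updates, device_modalities)
                  if m in mods]
        if cohort:  # average block m only within its cohort
            agg[:, BLOCK_COLS[m]] = np.mean(
                [u[:, BLOCK_COLS[m]] for u in cohort], axis=0)
    return agg
```

Because a device lacking modality m is simply excluded from block m's mean, its zero (or absent) updates never dilute that block, which is exactly the interference-removal property the paper attributes to cohort-scoped aggregation.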


Paper Structure

This paper contains 37 sections, 5 theorems, 21 equations, 8 figures, 6 tables, 1 algorithm.

Key Result

Lemma 1

Consider the fusion-layer block $A_m$ aggregated via FedAvg: $\hat{g}_m = \frac{1}{N}\sum_{n=1}^N \Delta A_{m,n}^r$. The expected squared error relative to the true cohort-mean gradient $\bar{g}_m = \frac{1}{|\mathcal{C}_m|}\sum_{n \in \mathcal{C}_m} \nabla_{A_m} F_n$ decomposes as

$$\mathbb{E}\big\|\hat{g}_m - \bar{g}_m\big\|^2 = \underbrace{\left(\tfrac{N-|\mathcal{C}_m|}{N}\right)^2 \big\|\hat{\epsilon}_m - \bar{g}_m\big\|^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big\|\hat{g}_m - \mathbb{E}\,\hat{g}_m\big\|^2}_{\text{variance}},$$

where $\hat{\epsilon}_m = \frac{1}{N-|\mathcal{C}_m|}\sum_{n \notin \mathcal{C}_m}\Delta A_{m,n}^r$. The bias$^2$ term grows with the fraction of devices lacking modality $m$, so rare-modality blocks incur the largest aggregation error under FedAvg.
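The bias in Lemma 1 can be checked numerically on a toy instance. The values below are illustrative assumptions: in the common case where absent-modality devices contribute zero updates ($\hat{\epsilon}_m = 0$), naive FedAvg shrinks the cohort-mean gradient by exactly the factor $|\mathcal{C}_m|/N$.

```python
import numpy as np

# Toy check of Lemma 1's bias term (illustrative values, not the paper's setup).
# FedAvg averages over all N devices, including those without modality m.
rng = np.random.default_rng(0)
N, d = 10, 8
cohort = list(range(6))                   # devices that have modality m
updates = rng.normal(size=(N, d))
updates[6:] = 0.0                         # absent-modality devices send zero blocks

g_bar = updates[cohort].mean(axis=0)      # true cohort-mean gradient
g_fedavg = updates.mean(axis=0)           # naive FedAvg over all N devices

# With zero non-cohort updates, FedAvg attenuates the signal by |C_m|/N = 0.6
assert np.allclose(g_fedavg, (len(cohort) / N) * g_bar)
```

Averaging only within the cohort (`updates[cohort].mean(axis=0)`) recovers $\bar{g}_m$ exactly, which is the motivation for Theorem 2's cohort-wise aggregation.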

Figures (8)

  • Figure 1: Problem illustration. (Q1) FedAvg dilutes rare-modality signals by mixing incompatible gradient updates. (Q2) Single-modal stragglers waste computation on absent-modality parameters. (Q3) Multimodal FL and elastic training define parameter groups along conflicting axes. RELIEF uses modality-aligned column blocks as a unified interface for all three.
  • Figure 2: Pairwise cosine similarity of fusion-layer LoRA updates across device pairs, grouped by modality column block. Interference extends beyond missing-modality blocks to shared blocks.
  • Figure 3: Update divergence of each modality column block across training phases. Rare-modality blocks (Mag, HR) exhibit amplifying divergence rather than convergence.
  • Figure 4: Overview of RELIEF. Top: Modality-aligned column-block decomposition of the fusion-layer LoRA matrix and cohort-scoped update aggregation across rounds. Bottom: (a) Multimodal FL with heterogeneous devices, (b) divergence-guided elastic allocation, and (c) cohort-wise server aggregation.
  • Figure 5: Convergence comparison (macro-F1 vs. communication round) for five representative methods across two datasets and two backbones.
  • ...and 3 more figures

Theorems & Definitions (10)

  • Lemma 1: FedAvg Aggregation Error Decomposition
  • Proof
  • Theorem 2: Cohort-Wise Aggregation Error
  • Proof
  • Theorem 3: Convergence of RELIEF
  • Proof
  • Proposition 4: Optimality of Divergence-Guided Allocation
  • Proof
  • Proposition 5: Regret of EMA-Based Divergence Tracking
  • Proof