CAFL-L: Constraint-Aware Federated Learning with Lagrangian Dual Optimization for On-Device Language Models

Dongqi Zheng, Wenjin Fu

TL;DR

CAFL-L addresses the challenge of training language models with federated learning on resource-constrained edge devices by introducing a Lagrangian dual optimization framework that enforces multi-resource budgets (energy, bandwidth, memory, and temperature). The method dynamically adapts training hyperparameters via a dual-variable policy and uses a token-budget mechanism to keep training stable under those budgets. Empirical results on a Tiny Shakespeare character-level model show substantial gains in constraint satisfaction (roughly 70% energy reduction, 95% communication savings, and 23% memory reduction) with only a modest increase in validation loss, demonstrating practical viability for edge deployment. This work advances federated learning toward real-world, resource-aware on-device language modeling by jointly handling multiple device budgets without severely sacrificing accuracy.

Abstract

We introduce Constraint-Aware Federated Learning with Lagrangian Dual Optimization (CAFL-L), a principled extension of FedAvg that explicitly incorporates device-level resource constraints including energy, communication, memory, and thermal budgets. CAFL-L employs Lagrangian dual optimization to dynamically adapt training hyperparameters -- freezing depth, local steps, batch size, and communication compression -- while preserving training stability through token-budget preservation via gradient accumulation. Experiments on a character-level language model demonstrate that CAFL-L achieves superior constraint satisfaction compared to standard FedAvg (reducing memory usage by 20% and communication by 95%) while maintaining competitive validation performance, making it practical for deployment on resource-constrained edge devices.
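The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the normalized-violation update rule, the step size `lr`, the resource names, and the helper names `dual_update` and `accumulation_steps` are all assumptions chosen for illustration. The first function is a standard projected dual-ascent step (a multiplier grows when usage exceeds its budget and is clipped at zero otherwise); the second shows one way gradient accumulation could keep the number of tokens processed per round roughly fixed when the batch size shrinks.

```python
import math

def dual_update(lam, usage, budget, lr=0.5):
    """Projected dual ascent (assumed form): lambda <- max(0, lambda + lr * violation)."""
    new_lam = {}
    for name in lam:
        # Normalized violation: positive when usage is over budget.
        violation = usage[name] / budget[name] - 1.0
        new_lam[name] = max(0.0, lam[name] + lr * violation)
    return new_lam

def accumulation_steps(token_budget, batch_size, seq_len, local_steps):
    """Micro-batches to accumulate per optimizer step so that
    local_steps * accum * batch_size * seq_len >= token_budget."""
    tokens_per_micro_batch = batch_size * seq_len
    return max(1, math.ceil(token_budget / (local_steps * tokens_per_micro_batch)))

# Hypothetical per-round measurements and budgets (arbitrary units).
lam = {"energy": 0.0, "comm": 0.0, "memory": 0.0, "temp": 0.0}
usage = {"energy": 12.0, "comm": 5.0, "memory": 3.0, "temp": 40.0}
budget = {"energy": 10.0, "comm": 10.0, "memory": 4.0, "temp": 40.0}
lam = dual_update(lam, usage, budget)

# Shrinking the batch from 16 to 4 raises the accumulation count from 2 to 8,
# so the tokens seen per round stay the same.
accum_large = accumulation_steps(8192, batch_size=16, seq_len=64, local_steps=4)
accum_small = accumulation_steps(8192, batch_size=4, seq_len=64, local_steps=4)
```

In this toy round only the energy multiplier becomes positive, since energy is the only resource over budget; the other multipliers stay at zero.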

Paper Structure

This paper contains 14 sections, 6 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: CAFL-L framework overview. The server maintains global model and dual variables. Policy $\pi(\lambda)$ adapts training knobs $(k,s,b,q)$ based on constraint violations. Clients perform local training and report resource usage, which feeds back into dual updates.
  • Figure 2: Resource-constraint satisfaction. CAFL-L adaptively keeps memory and communication within budget, while FedAvg keeps violating them.
  • Figure 3: Energy and temperature control. CAFL-L prevents energy/thermal runaway by moderating computational intensity and staying near budget.
  • Figure 4: Convergence. CAFL-L reaches a validation loss competitive with FedAvg (2.10 vs. 1.93).
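
The feedback loop in the Figure 1 caption, where a policy $\pi(\lambda)$ turns dual variables into training knobs $(k,s,b,q)$, might look roughly like the sketch below. Everything here is hypothetical: it assumes the four knobs correspond in order to the freezing depth, local steps, batch size, and communication compression listed in the abstract, it treats `q` as bits per communicated parameter, and the scaling rules, defaults, and the names `Knobs` and `policy` are invented for illustration rather than taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Knobs:
    k: int  # number of frozen layers (freezing depth)
    s: int  # local steps per round
    b: int  # batch size
    q: int  # bits per communicated parameter (assumed meaning of q)

def policy(lam, n_layers=6, base=Knobs(k=0, s=8, b=32, q=32)):
    """Hypothetical pi(lambda): larger multipliers give more conservative knobs."""
    compute = lam["energy"] + lam["temp"]  # pressure on compute intensity
    # Freeze more layers under compute or memory pressure, keeping one trainable.
    k = min(n_layers - 1, base.k + round(compute + lam["memory"]))
    # Fewer local steps when energy or temperature is over budget.
    s = max(1, round(base.s / (1.0 + compute)))
    # Smaller batches when memory is over budget.
    b = max(1, round(base.b / (1.0 + lam["memory"])))
    # Coarser quantization when communication is over budget.
    q = max(2, round(base.q / (1.0 + lam["comm"])))
    return Knobs(k, s, b, q)

relaxed = policy({"energy": 0.0, "temp": 0.0, "memory": 0.0, "comm": 0.0})
pressed = policy({"energy": 1.0, "temp": 0.0, "memory": 1.0, "comm": 3.0})
```

With all multipliers at zero the policy returns the baseline knobs. Under the second set of multipliers it freezes two layers, halves the local steps and batch size, and cuts the communicated precision from 32 to 8 bits.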