Multi-Level Additive Modeling for Structured Non-IID Federated Learning

Shutong Chen; Tianyi Zhou; Guodong Long; Jie Ma; Jing Jiang; Chengqi Zhang

TL;DR

This work tackles the challenge of structured non-IID distributions in Federated Learning by proposing Multi-Level Additive Modeling (MAM) and the FeMAM training algorithm. FeMAM builds a hierarchical knowledge-sharing architecture with a global top level, cluster-based mid levels, and a personalized bottom level. Each client's prediction is the sum of the outputs of the models assigned to it across levels: $f_i(x) = \sum_{l=1}^{L} f(x; \Theta^{(l)}_{C_l(i)})$, where $C_l(i)$ indexes the level-$l$ model assigned to client $i$. FeMAM introduces level-wise optimization with progressive model addition, adaptive clustering via EM/K-means, and pruning to discover a client-specific sharing structure, and it provides convergence guarantees for both the clustering objective and the overall FeMAM objective. Empirically, FeMAM outperforms clustered FL and personalized FL baselines across cluster-wise, client-wise, and multi-level non-IID settings on Tiny ImageNet and CIFAR-100, while keeping communication and computation overhead reasonable. Overall, the approach provides a flexible, scalable framework for capturing fine-grained knowledge sharing in FL and adapting to diverse non-IID patterns in real-world deployments.
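
The additive prediction $f_i(x) = \sum_{l} f(x; \Theta^{(l)}_{C_l(i)})$ can be illustrated with a minimal Python sketch. It assumes each model is a plain callable returning class logits and stores the assignment $C_l(i)$ as one dictionary per level; a client with no model at some level (each client has at most one model per level, for example after pruning) simply skips that level. The names `additive_predict`, `constant`, `levels`, and `assignment` are hypothetical and are not taken from the FeMAM code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a "model" is any callable mapping an input vector to class logits.
Model = Callable[[List[float]], List[float]]


def additive_predict(
    x: List[float],
    client: int,
    levels: List[Dict[int, Model]],
    assignment: List[Dict[int, Optional[int]]],
) -> List[float]:
    """Return f_i(x): the sum of the outputs of the models assigned to `client`.

    levels[l] maps a model index to a level-l model, and assignment[l] maps a
    client id to C_l(i), the index of its level-l model, or None if it has none.
    """
    total: Optional[List[float]] = None
    for models, assign in zip(levels, assignment):
        k = assign.get(client)
        if k is None:  # no model at this level for this client (e.g. pruned)
            continue
        out = models[k](x)
        total = list(out) if total is None else [a + b for a, b in zip(total, out)]
    if total is None:
        raise ValueError(f"client {client} has no assigned models")
    return total


def constant(logits: List[float]) -> Model:
    """Toy model that ignores its input and returns fixed logits."""
    return lambda x: list(logits)


# Hypothetical 3-level structure for 4 clients and 2 classes.
levels = [
    {0: constant([1.0, 0.0])},                            # level 1: one global model
    {0: constant([0.5, 0.0]), 1: constant([0.0, 0.5])},   # level 2: two cluster models
    {i: constant([0.0, 0.25 * i]) for i in range(4)},     # level 3: one model per client
]
assignment = [
    {i: 0 for i in range(4)},      # every client uses the global model
    {0: 0, 1: 0, 2: 1, 3: 1},      # clients 0-1 in cluster 0, clients 2-3 in cluster 1
    {0: 0, 1: None, 2: 2, 3: 3},   # client 1 has no bottom-level model
]
print(additive_predict([0.0], 2, levels, assignment))  # → [1.0, 1.0]
```

Keeping the assignment separate from the models mirrors the idea that the sharing structure (which client uses which model at each level) is itself something FeMAM adapts, while the models are what gets trained.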

Abstract

The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important for improving knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some is client-specific. To capture and exploit this structure, we train models organized in a multi-level structure, called "Multi-level Additive Models (MAM)", for better knowledge sharing across heterogeneous clients and for their personalization. In federated MAM (FeMAM), each client is assigned to at most one model per level, and its personalized prediction sums the outputs of the models assigned to it across all levels. For the top level, FeMAM trains one global model shared by all clients, as in FedAvg. For every mid level, it learns multiple models, each assigned to a subgroup of clients, as in clustered FL. Every bottom-level model is trained for one client only. In the training objective, each model aims to minimize the residual of the additive predictions made by the other models assigned to each client. To approximate an arbitrary non-IID structure across clients, FeMAM introduces more flexibility and adaptivity to FL by incrementally adding new models to each client's prediction and reassigning a client to another model if necessary, automatically optimizing the knowledge-sharing structure. Extensive experiments show that FeMAM surpasses existing clustered FL and personalized FL methods in various non-IID settings. Our code is available at https://github.com/shutong043/FeMAM.
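
To see how "each model minimizes the residual of the additive predictions by the other models" yields a global, cluster, and client decomposition, here is a toy sketch under strong simplifying assumptions: scalar targets, squared loss, and constant "models", so that the optimal model for a group of clients is just the mean of their current residuals. Levels are fit one at a time with earlier levels frozen, loosely following FeMAM's level-wise optimization. The functions `fit_level` and `fit_multilevel` and the data are hypothetical; the actual method trains neural networks with federated local updates rather than closed-form means.

```python
from statistics import mean
from typing import Dict, List, Tuple

Data = Dict[int, List[float]]  # client id -> local scalar targets (or residuals)


def fit_level(residuals: Data, groups: Dict[int, int]) -> Dict[int, float]:
    """Fit one constant per group; under squared loss the optimum is the group mean."""
    pooled: Dict[int, List[float]] = {}
    for client, r in residuals.items():
        pooled.setdefault(groups[client], []).extend(r)
    return {g: mean(vals) for g, vals in pooled.items()}


def fit_multilevel(
    data: Data, level_groups: List[Dict[int, int]]
) -> Tuple[List[Dict[int, float]], Data]:
    """Fit levels sequentially; each new level sees only what earlier levels left unexplained."""
    residuals = {c: list(ys) for c, ys in data.items()}
    params: List[Dict[int, float]] = []
    for groups in level_groups:
        theta = fit_level(residuals, groups)  # earlier levels stay frozen
        params.append(theta)
        for c in residuals:
            residuals[c] = [r - theta[groups[c]] for r in residuals[c]]
    return params, residuals


# Hypothetical toy data: clients 0-1 form one cluster, clients 2-3 another.
data = {0: [1.0, 3.0], 1: [4.0, 4.0], 2: [8.0, 10.0], 3: [11.0, 11.0]}
level_groups = [
    {c: 0 for c in data},       # top level: one global model
    {0: 0, 1: 0, 2: 1, 3: 1},   # mid level: one model per cluster
    {c: c for c in data},       # bottom level: one model per client
]
params, residuals = fit_multilevel(data, level_groups)
for level, theta in enumerate(params, start=1):
    print(level, theta)
```

With this data the global level is 6.5, the cluster levels are -3.5 and 3.5, and the client levels are -1.0 or 1.0, so each client's additive prediction (for example 6.5 - 3.5 - 1.0 = 2.0 for client 0) equals its local mean.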

Paper Structure (31 sections, 4 theorems, 39 equations, 10 figures, 6 tables, 2 algorithms)

Introduction
Related Works
Methodology
Problem Formulation
Structure of Multi-Level Additive Modeling
Level-wise Optimization
Practical Implementations of FeMAM
Convergence analysis
Convergence of Clustering Objective
Convergence of FeMAM
Experiments
Experimental Setup
Numerical Results
Convergence Analysis
Analysis of Multi-level Structure
...and 16 more sections

Key Result

Theorem 1

(Convergence of clustering objective on cluster levels). Assume that, for each level $l$ at client $i$, the stochastic gradient is unbiased and its L2 norm is bounded by a constant $U$. Let $Q$ be the number of local training epochs. Then for an arbitrary communication round $t$, $\mathcal{F}$ converges …
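
The summary mentions adaptive clustering via EM/K-means on the mid levels, and Theorem 1 concerns the convergence of that clustering objective. As a hedged illustration only, the sketch below runs Lloyd's K-means over clients, assuming (hypothetically) that each client is represented by a flattened vector of its local level-$l$ model parameters and that initial centroids are given. It alternates an assignment step and a centroid-update step; the within-cluster sum of squares never increases across these iterations, which is the standard property of Lloyd's algorithm and not a restatement of Theorem 1. All names and data here are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], init: List[Vector], iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Lloyd's K-means: returns (cluster index per client, centroids)."""
    centroids = [list(c) for c in init]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each client joins its nearest centroid.
        new_assign = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_assign == assign:  # assignments stopped changing: converged
            break
        assign = new_assign
        # Update step: each centroid moves to the mean of its members (kept if empty).
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def within_cluster_ss(points: List[Vector], assign: List[int], centroids: List[Vector]) -> float:
    """The K-means objective: total squared distance of clients to their centroids."""
    return sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))


# Hypothetical 2-D "parameter vectors" for six clients drawn from two groups.
clients = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
assign, centroids = kmeans(clients, init=[clients[0], clients[3]])
print(assign)  # → [0, 0, 0, 1, 1, 1]
```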

Figures (10)

Figure 1: Four types of Non-IID. (a) Slight Non-IID, which can be solved by FedAvg; (b) Cluster-wise Non-IID, which can be solved by Clustered FL: clients are clustered and share knowledge within each cluster via a cluster-wise model; (c) Client-wise Non-IID, which can be solved by Personalized FL: all clients share knowledge via one global model while each client creates its personalized model; and (d) Multi-level Non-IID, a fine-grained knowledge-sharing structure in an FL system, which can approximate arbitrary Non-IID structures, including (a), (b), (c), and others.
Figure 2: The overall framework of Federated Learning with Multi-Level Additive Modeling (FeMAM). On the server side, the server maintains a multi-level structure of shareable models. Levels of models are added progressively, from level $1$ to level $L$. At each stage $l$, only the latest level of models is trainable and transmitted between the server and clients, while all previous $l-1$ levels are fixed. Each level is optimized until convergence before the next level is added. On the client side, each client keeps its previous $l-1$ models and receives one model from the latest level $l$. An additive modeling scheme sums the outputs of the $l$ levels to form the final prediction, and the latest local model is updated to minimize the client’s local loss.
Figure 3: Convergence on three non-IID and IID partitions of the CIFAR-100 dataset. FeMAM progressively adds a new level of models whenever the current additive model's accuracy saturates, producing staircase-shaped convergence curves. Each blue vertical line marks the round at which a new level is added to FeMAM during training.
Figure 4: FeMAM's multi-level model structures, ground-truth (pre-defined) relation structures, and convergence curves on the Tiny ImageNet dataset, Cluster-wise, $(50,5)$ (Non-IID). Each FeMAM structure is composed of 5 levels (5 columns) and 50 clients. Cluster boundaries are marked where adjacent clients belong to different clusters. The ground-truth relation structure on the left matches the FeMAM structure on the right, indicating that FeMAM discovers the inherent data distribution through additive modeling and the pruning of useless models.
Figure 5: Analysis of runtime and communication efficiency on the CIFAR-100 dataset, Client-wise, 0.1 (Non-IID).
...and 5 more figures
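
The progressive schedule described in the Figure 2 and Figure 3 captions (train only the newest level, keep earlier levels frozen, and add a level once accuracy saturates) can be sketched as a simple control loop. The plateau rule used here, with hypothetical `patience` and `min_delta` parameters, is an assumption: the summary does not state FeMAM's exact saturation criterion. `train_round` stands in for one federated communication round and, like `make_fake_trainer`, is hypothetical.

```python
from typing import Callable, Dict, List


def progressive_schedule(
    train_round: Callable[[int], float],
    max_levels: int,
    patience: int = 3,
    min_delta: float = 1e-3,
    max_rounds: int = 1000,
) -> List[int]:
    """Return the communication rounds at which a new level was added.

    train_round(level) is assumed to run one round in which only the models of
    `level` are trained (all earlier levels frozen) and to return validation accuracy.
    """
    level, best, stale = 1, float("-inf"), 0
    added_at: List[int] = []
    for t in range(max_rounds):
        acc = train_round(level)
        if acc > best + min_delta:
            best, stale = acc, 0
        else:
            stale += 1
        if stale >= patience:  # the current additive model has saturated
            if level == max_levels:
                break
            level, stale = level + 1, 0
            added_at.append(t + 1)  # the next round trains the new level
    return added_at


def make_fake_trainer() -> Callable[[int], float]:
    """Hypothetical stand-in: each level adds up to 0.1 accuracy over its first 5 rounds."""
    rounds_in_level: Dict[int, int] = {}

    def train_round(level: int) -> float:
        n = rounds_in_level.get(level, 0) + 1
        rounds_in_level[level] = n
        return 0.5 + 0.1 * (level - 1) + 0.02 * min(n, 5)

    return train_round


print(progressive_schedule(make_fake_trainer(), max_levels=3))  # → [8, 16]
```

The returned rounds play the role of the blue vertical lines in Figure 3, and the flat stretch of accuracy before each one is what produces a staircase-shaped curve.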

Theorems & Definitions (4)

Theorem 1
Theorem 2
Theorem D.1
Theorem D.2
