FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation against Heterogeneous Annotation Noise

Nannan Wu; Zhaobin Sun; Zengqiang Yan; Li Yu

TL;DR

This work addresses annotation noise in federated medical image segmentation (FMIS) by introducing a two-level Non-IID noise model: pixel-wise contour bias, modeled with a Contour Evolution Model (CEM), and client-wise heterogeneity across data sources. It proposes FedA$^3$I, a quality-aware aggregation framework that estimates client annotation quality via region-level losses, clusters clients with a Gaussian mixture model (GMM), and blends quality- and quantity-based weights in a layer-wise manner to protect deep-layer semantic learning from noisy labels. Under heterogeneous noise, the method achieves higher Dice scores on ISIC skin lesion and breast ultrasound datasets than a broad range of state-of-the-art methods, without added training overhead. By integrating annotation quality into federated aggregation, the work enables robust learning from decentralized, imperfect annotations and offers a practical path toward privacy-preserving medical image analysis.
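The layer-wise blending of quality- and quantity-based weights can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `layerwise_aggregate`, the linear depth schedule `alpha`, and the representation of each client model as an ordered dict of NumPy arrays are assumptions made for illustration. The only idea taken from the summary is that deeper layers should rely more on annotation quality than on sample count.

```python
import numpy as np

def layerwise_aggregate(client_states, num_samples, quality_weights):
    """Aggregate client models layer by layer (illustrative sketch only).

    client_states:   list of dicts {layer_name: np.ndarray}, ordered shallow -> deep
    num_samples:     number of training samples per client (quantity signal)
    quality_weights: non-negative per-client quality scores (quality signal)
    """
    quantity = np.asarray(num_samples, dtype=float)
    quantity /= quantity.sum()
    quality = np.asarray(quality_weights, dtype=float)
    quality /= quality.sum()

    layer_names = list(client_states[0].keys())
    num_layers = len(layer_names)
    aggregated = {}
    for depth, name in enumerate(layer_names):
        # Assumed schedule: alpha grows linearly from 0 (first layer) to 1 (last layer),
        # so shallow layers follow quantity and deep layers follow quality.
        alpha = depth / (num_layers - 1) if num_layers > 1 else 1.0
        weights = (1.0 - alpha) * quantity + alpha * quality
        aggregated[name] = sum(w * state[name] for w, state in zip(weights, client_states))
    return aggregated

# Hypothetical two-client example with one shallow and one deep layer.
client_a = {"encoder": np.ones(2), "decoder": np.ones(2)}
client_b = {"encoder": np.zeros(2), "decoder": np.zeros(2)}
global_state = layerwise_aggregate([client_a, client_b], num_samples=[1, 3],
                                   quality_weights=[0.9, 0.1])
```

Because both weight vectors sum to one, each per-layer blend is still a convex combination of the client parameters.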

Abstract

Federated learning (FL) has emerged as a promising paradigm for training segmentation models on decentralized medical data, owing to its privacy-preserving property. However, existing research overlooks the prevalent annotation noise encountered in real-world medical datasets, which limits the performance ceilings of FL. In this paper, we identify and tackle this problem for the first time. For problem formulation, we propose a contour evolution model (CEM) for non-independent and identically distributed (Non-IID) noise across pixels within each client and then extend it to the case of multi-source data to form a heterogeneous noise model (i.e., Non-IID annotation noise across clients). For robust learning from annotations with such two-level Non-IID noise, we emphasize the importance of data quality in model aggregation, allowing high-quality clients to have a greater impact on FL. To achieve this, we propose Federated learning with Annotation quAlity-aware AggregatIon, named FedA$^3$I, by introducing a quality factor based on client-wise noise estimation. Specifically, noise estimation at each client is accomplished through a Gaussian mixture model and then incorporated into model aggregation in a layer-wise manner to up-weight high-quality clients. Extensive experiments on two real-world medical image segmentation datasets demonstrate the superior performance of FedA$^3$I against state-of-the-art approaches in dealing with cross-client annotation noise. The code is available at https://github.com/wnn2000/FedAAAI.
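The Gaussian-mixture step can be pictured as splitting clients into two groups based on per-client noise indicators. The sketch below is a hedged illustration, not the released code: it fits a two-component, diagonal-covariance mixture with a small hand-written EM loop in NumPy, and it assumes that the component with the smaller mean indicator corresponds to the cleaner group. The function names, the deterministic initialisation, and the example indicator values are all hypothetical.

```python
import numpy as np

def fit_two_component_gmm(x, n_iter=100, eps=1e-6):
    """Fit a 2-component diagonal-covariance GMM with EM.

    Returns (means, responsibilities). Initialisation is deterministic:
    the points with the smallest and largest norm seed the two components.
    """
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1)
    means = np.stack([x[norms.argmin()], x[norms.argmax()]])
    variances = np.tile(x.var(axis=0) + eps, (2, 1))
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities from per-component log densities.
        diff = x[:, None, :] - means[None, :, :]
        log_prob = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
        log_resp = np.log(weights) + log_prob
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps
    return means, resp

def split_clients(indicators):
    """Return a boolean mask that is True for clients in the lower-indicator component.

    Treating the lower-indicator component as the cleaner group is an assumption.
    """
    means, resp = fit_two_component_gmm(indicators)
    labels = resp.argmax(axis=1)
    low_component = means.sum(axis=1).argmin()
    return labels == low_component

# Hypothetical indicators for six clients, one row per client.
indicators = np.array([[0.10, 0.12], [0.11, 0.10], [0.09, 0.11],
                       [0.80, 0.90], [0.85, 0.88], [0.78, 0.92]])
cleaner_mask = split_clients(indicators)
```

In practice a library mixture model could replace the hand-written EM loop; the NumPy version is used here only to keep the sketch dependency-free.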

Paper Structure (29 sections, 1 theorem, 14 equations, 7 figures, 3 tables)

Introduction
Related Work
Federated Medical Image Segmentation
Learning with Noisy Labels/Annotations
Methodology
Preliminaries
Noise Model
Contour Evolution Model
Heterogeneous Noise Model with CEMs
Annotation Quality-Aware Aggregation
Motivation and Overview
Noise Estimation
Aggregation Weight Calculation
Experiments
Datasets and Evaluation Metric
...and 14 more sections

Key Result

Theorem 1

Assume there are $l_{sub}$ pairs $(u_1, v_1), (u_2, v_2), \dots, (u_{l_{sub}}, v_{l_{sub}})$, where $u_1, u_2, \dots, u_{l_{sub}}$ are distinct indices selected from $\{1, 2, \dots, l\}$ ($l_{sub} < l$) with a determined strategy, and $v_1, v_2, \dots, v_{l_{sub}} \stackrel{\text{i.i.d.}}{\sim} \bm{N}(\dots)$ …

Figures (7)

Figure 1: Annotation noise in multi-source datasets. The blue and red curves represent the contours of clean (i.e., ideal) and noisy annotations, respectively. Given a certain sample, annotations between two curves are noisy, indicating that noise is distributed near contours instead of being IID among pixels. Given samples from different clients, annotation noise is heterogeneous where noise on client $i$ causes under-segmentation and noise on client $j$ causes over-segmentation.
Figure 2: An example of using CEM (i.e., $C(-8, 2)$) to generate annotation noise. In (a), the arrow indicates the positive annotation direction starting from the origin (i.e., 0) of the contour (i.e., the blue curve). Then, the bias of each pixel is calculated based on the polynomial function fitted by sampled pixels as illustrated in (c), and used to control the movements of pixels of the contour as indicated by the green arrow, resulting in the noisy contour (i.e., the red curve) in (b).
Figure 3: Overview of FedA$^3$I. In each client $i$, we compute the learning difficulty of the inner and outer regions (the white and gray regions around contours respectively), denoted as $q_{i,1}$ and $q_{i,2}$, and upload them to the server. These indicators are used to fit a Gaussian mixture model (GMM), which divides all clients into two subsets. Based on this, we compute the quality-based weights using two components, namely IntraGW and InterGW. Finally, both quality-based and quantity-based weights are utilized for model aggregation in a layer-wise manner.
Figure 4: Ablation studies on the balance coefficient $r$ in InterGW. The solid line and transparent areas represent the mean and standard deviation, respectively. "Second best" denotes the performance of the second-best method in Tab. \ref{tab:sota}.
Figure 5: Ablation studies on the warm-up round $T_1$. The solid line and transparent areas represent the mean and standard deviation, respectively. "Second best" denotes the performance of the second-best method in Tab. \ref{tab:sota}.
...and 2 more figures
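The contour-bias generation described in the Figure 2 caption (sample biases at a few contour pixels, fit a polynomial, evaluate it at every contour pixel) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's procedure: the two arguments of $C(-8, 2)$ are read here as the mean and standard deviation of the sampled biases, the anchors are evenly spaced along the contour, and the polynomial degree, anchor count, and function name `sample_contour_bias` are hypothetical.

```python
import numpy as np

def sample_contour_bias(contour_len, mu, sigma, num_anchors=8, degree=3, seed=0):
    """Return one smooth bias value per contour pixel (illustrative sketch only).

    contour_len: number of pixels along the contour (l)
    mu, sigma:   assumed mean and standard deviation of the anchor biases
    num_anchors: number of sampled contour positions (l_sub < l)
    degree:      assumed degree of the fitted polynomial
    """
    rng = np.random.default_rng(seed)
    # Deterministic anchor selection: evenly spaced positions along the contour.
    anchor_idx = np.linspace(0, contour_len - 1, num_anchors).round().astype(int)
    # Independent Gaussian biases at the anchor positions.
    anchor_bias = rng.normal(mu, sigma, size=num_anchors)
    # Fit on positions normalised to [0, 1] to keep the least-squares fit well conditioned.
    t_anchor = anchor_idx / (contour_len - 1)
    coeffs = np.polyfit(t_anchor, anchor_bias, degree)
    t_all = np.arange(contour_len) / (contour_len - 1)
    return np.polyval(coeffs, t_all)

# Hypothetical usage mirroring the C(-8, 2) example from the caption.
bias = sample_contour_bias(contour_len=200, mu=-8.0, sigma=2.0)
```

Each returned value would then control how far the corresponding contour pixel moves; turning the displaced contour back into a mask is left out of this sketch.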

Theorems & Definitions (3)

Definition 1: PDN
Theorem 1
Proof
