Federated Unsupervised Semantic Segmentation

Evangelos Charalampakis, Vasileios Mygdalis, Ioannis Pitas

TL;DR

The authors propose FUSS (Federated Unsupervised image Semantic Segmentation), to their knowledge the first framework to enable fully decentralized, label-free semantic segmentation training. FUSS introduces novel federation strategies that promote global consistency in feature and prototype space.

Abstract

This work explores the application of Federated Learning (FL) to Unsupervised Semantic image Segmentation (USS). Recent USS methods extract pixel-level features using frozen visual foundation models and refine them through self-supervised objectives that encourage semantic grouping. These features are then grouped into semantic clusters to produce segmentation masks. Extending these ideas to federated settings requires aligning feature representations and cluster centroids across distributed clients, an inherently difficult task under heterogeneous data distributions and in the absence of supervision. To address this, we propose FUSS (Federated Unsupervised image Semantic Segmentation), which is, to our knowledge, the first framework to enable fully decentralized, label-free semantic segmentation training. FUSS introduces novel federation strategies that promote global consistency in feature and prototype space, jointly optimizing local segmentation heads and shared semantic centroids. Experiments on both benchmark and real-world datasets, including binary and multi-class segmentation tasks, show that FUSS consistently outperforms both local-only client training and extensions of classical FL algorithms under varying client data distributions. To support reproducibility, the source code, data partitioning scripts, and implementation details are publicly available at: https://github.com/evanchar/FUSS
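To make the baseline concrete, the following is a minimal sketch of what a classical FL extension (plain FedAvg-style weighted averaging) could look like when applied to segmentation heads and semantic centroids. It is not the FUSS aggregation strategy; the function name `fedavg`, the parameter keys `head` and `centroids`, and the toy shapes and client sizes are all assumptions made for illustration.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average per-client parameter dicts, weighted by local dataset size.

    Hypothetical helper: a FedAvg-style baseline, not the FUSS strategy.
    """
    weights = np.asarray(client_sizes, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    keys = client_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, client_params))
        for k in keys
    }

# Toy federation (assumed sizes): 3 clients, 4 classes, 8-dim features.
rng = np.random.default_rng(0)
clients = [
    {
        "head": rng.normal(size=(8, 8)),       # stand-in for segmentation head weights
        "centroids": rng.normal(size=(4, 8)),  # one prototype per semantic cluster
    }
    for _ in range(3)
]
sizes = [100, 300, 600]

global_params = fedavg(clients, sizes)
```

Because clusters are learned without labels, row i of one client's centroid matrix need not correspond to the same semantic class as row i of another client's, so naive averaging can blur prototypes together. This is the alignment problem the abstract identifies as the core difficulty.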

Paper Structure

This paper contains 30 sections, 23 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of the proposed FUSS framework. Each client performs local unsupervised training using self-supervised objectives and periodically transmits its learned segmentation head and semantic prototypes to a central server. The server aggregates these parameters and broadcasts the updated global model back to all clients. While traditional federated strategies may result in disorganized or overlapping class prototypes due to local heterogeneity, our proposed aggregation methods explicitly promote semantic alignment and prototype separability across the federation.
  • Figure 2: CocoStuff split visualization for 18 clients, based on dominant class frequency. (a) i.i.d. partition where dominant classes are distributed uniformly. (b) non-i.i.d. partition with Dirichlet concentration parameter $\alpha = 0.5$, where samples are concentrated on specific clients. Note the varying frequency scales (right) indicating the degree of sample concentration at each client.
  • Figure 3: Left: Discriminability analysis between resulting centroids for CocoStuff non-i.i.d. Right: t-SNE projections of centroids from final federated aggregation.
  • Figure 4: Pairwise distance structure for FedAvg (left) and FedAvg+FedCC (right). FedAvg+FedCC achieves greater inter-class distances across all 27 classes of CocoStuff.
  • Figure 5: Visual overview of the IPS dataset. Top: Sample validation images. Middle: Corresponding ground-truth binary segmentation masks. Bottom: Predicted masks produced by FUSS with FedCC (k-means) aggregation.
  • ...and 3 more figures
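
The non-i.i.d. split described in the Figure 2 caption can be illustrated with a common Dirichlet partitioning recipe. This is a sketch under stated assumptions, not the authors' partitioning script (which is available in their repository): the function name `dirichlet_partition`, the random toy labels, and the sample count are hypothetical; only the 18 clients, the 27 classes, and $\alpha = 0.5$ come from the captions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class.

    For each class, client shares are drawn from Dirichlet(alpha, ..., alpha).
    Small alpha concentrates a class on few clients; large alpha approaches
    a uniform (i.i.d.-like) split. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative shares into split points within this class.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())
    return client_indices

# Toy stand-in for per-image dominant classes (assumed: 5000 images, 27 classes).
rng = np.random.default_rng(1)
dominant_class = rng.integers(0, 27, size=5000)
parts = dirichlet_partition(dominant_class, num_clients=18, alpha=0.5)
```

Every sample is assigned to exactly one client, while the per-client class histograms become increasingly skewed as `alpha` decreases.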