FRIEREN: Federated Learning with Vision-Language Regularization for Segmentation

Ding-Ruei Shen

TL;DR

FFREEDG is a source-free federated setting for semantic segmentation under strict privacy constraints, and FRIEREN addresses it with vision–language priors from CLIP. FRIEREN first pretrains centrally on labeled source data, then adapts in a federated phase in which clients contribute only unlabeled data, using weak-to-strong consistency, dense CLIP distillation, and a language-guided decoder. The method is competitive with domain generalization (DG) and domain adaptation (DA) baselines on Cityscapes→ACDC and GTA5→Cityscapes, with FedSWA providing stability during unsupervised federation. The work advances practical privacy-preserving segmentation by showing how foundation-model priors and unified semi-/unsupervised learning can generalize to unseen domains without re-accessing the source data or using target labels. It lays a foundation for integrating larger vision-language models and alternative decoders to further close the gap to state-of-the-art DG/DA methods.

Abstract

Federated Learning (FL) offers a privacy-preserving way to adapt Semantic Segmentation (SS) models to new domains, but it faces significant challenges from domain shift, particularly when client data is unlabeled. Most existing FL methods unrealistically assume access to labeled data on remote clients or fail to leverage the power of modern Vision Foundation Models (VFMs). Here, we propose a novel and challenging task, FFREEDG, in which a model is pretrained on a server's labeled source dataset and subsequently trained across clients using only their unlabeled data, without ever re-accessing the source. To solve FFREEDG, we propose FRIEREN, a framework that leverages the knowledge of a VFM by integrating vision and language modalities. Our approach employs a Vision-Language decoder guided by CLIP-based text embeddings to improve semantic disambiguation and uses a weak-to-strong consistency learning strategy for robust local training on pseudo-labels. Our experiments on synthetic-to-real and clear-to-adverse-weather benchmarks demonstrate that our framework effectively tackles this new task, achieving competitive performance against established domain generalization and adaptation methods and setting a strong baseline for future research.
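
The weak-to-strong consistency idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a generic FixMatch-style loss written in NumPy, assuming per-pixel logits of shape (C, H, W) for a weakly and a strongly augmented view of the same unlabeled image, with the two views already spatially aligned. The function names and the 0.95 confidence threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def weak_to_strong_loss(weak_logits, strong_logits, threshold=0.95):
    """Masked cross-entropy from weak-view pseudo-labels to strong-view predictions.

    Both inputs have shape (C, H, W). The threshold is an assumed value,
    not one taken from the paper. Returns (loss, fraction of pixels kept).
    """
    # Pseudo-labels and confidences come from the weak view only.
    weak_probs = softmax(weak_logits, axis=0)
    pseudo_labels = weak_probs.argmax(axis=0)   # (H, W)
    confidence = weak_probs.max(axis=0)         # (H, W)
    mask = confidence >= threshold              # keep confident pixels only

    # Log-probability the strong view assigns to each pixel's pseudo-label.
    strong_log_probs = np.log(softmax(strong_logits, axis=0) + 1e-12)
    picked = np.take_along_axis(strong_log_probs, pseudo_labels[None], axis=0)[0]

    # Average over all pixels; low-confidence pixels contribute zero.
    loss = float((-picked * mask).sum() / mask.size)
    return loss, float(mask.mean())


# Toy example: 3 classes on a 2x2 image, with one confident pixel.
weak = np.zeros((3, 2, 2))
weak[0, 0, 0] = 10.0             # pixel (0, 0) is confidently class 0
strong = np.zeros((3, 2, 2))     # strong view is uniform, so it disagrees
loss, kept = weak_to_strong_loss(weak, strong)
```

In this toy case only one of the four pixels passes the threshold, so the loss comes entirely from that pixel. In a federated setting, a loss of this kind would be computed locally on each client's unlabeled images before model updates are sent to the server.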
