DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model

Pasquale De Marinis, Pieter M. Blok, Uzay Kaymak, Rogier Brussee, Gennaro Vessio, Giovanna Castellano

TL;DR

This work tackles Cross-Domain Few-Shot Semantic Segmentation by eliminating test-time dependence on support images through a teacher–student distillation approach. A DCAMA-based teacher learns from the support set, while a lightweight ConvDist student mimics the teacher’s attention to enable support-free inference, with an additional target-domain TransferFSS fine-tuning step. A new multi-domain CD-FSS benchmark demonstrates DistillFSS’s competitive accuracy, especially in multi-class and multi-shot scenarios, while offering substantial efficiency gains and scalability. The approach enables rapid adaptation to unseen domains and supports deployment in resource-constrained settings, marking a practical advance for cross-domain segmentation under limited annotations.

Abstract

Cross-Domain Few-Shot Semantic Segmentation (CD-FSS) seeks to segment unknown classes in unseen domains using only a few annotated examples. This setting is inherently challenging: source and target domains exhibit substantial distribution shifts, label spaces are disjoint, and support images are scarce, which makes standard episodic methods unreliable and computationally demanding at test time. To address these constraints, we propose DistillFSS, a framework that embeds support-set knowledge directly into a model's parameters through a teacher–student distillation process. By internalizing few-shot reasoning into a dedicated layer within the student network, DistillFSS eliminates the need for support images at test time, enabling fast, lightweight inference, while allowing efficient extension to novel classes in unseen domains through rapid teacher-driven specialization. Combined with fine-tuning, the approach scales efficiently to large support sets and significantly reduces computational overhead. To evaluate the framework under realistic conditions, we introduce a new CD-FSS benchmark spanning medical imaging, industrial inspection, and remote sensing, with disjoint label spaces and variable support sizes. Experiments show that DistillFSS matches or surpasses state-of-the-art baselines, particularly in multi-class and multi-shot scenarios, while offering substantial efficiency gains. The code is available at https://github.com/pasqualedem/DistillFSS.

Paper Structure

This paper contains 25 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed DistillFSS framework consists of two main components: a student network and a teacher network. The student network is trained on the output of the DCAMA attention block (used as the base model), while the teacher network is designed to focus on the support set. Although the student network has no direct access to the support set, it implicitly encodes its information through the distillation process. The ice icon denotes frozen layers, while the fire icon indicates trainable layers.
  • Figure 2: Example of a DCAMA prediction on the WeedMap dataset, showing two attention maps for the crop class taken from different depths in the network. Level A comes from an earlier stage (low-level features), while level B corresponds to a later stage (high-level features). Although level A is more accurate, the model assigns higher importance to level B, leading to an incorrect final prediction.
  • Figure 3: Overview of the distillation process in DistillFSS for a single scale. A DCAMA attention block is distilled into a lightweight ConvDist module, which replicates its behavior without requiring support images at inference. Knowledge transfer is enforced through the distillation loss $L_{dist}$.
  • Figure 4: Qualitative results across various selected datasets, listed from top to bottom: Lung Nodule, ISIC, KVASIR-Seg, Nucleus, WeedMap, Pothole-mix, and Industrial-$5^{i}$.
  • Figure 5: Peak memory consumption (MiB) during the forward pass across models in the 1-way setting.
  • ...and 1 more figure
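
The distillation idea in the Figure 3 caption can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it is hypothetical. It assumes that the teacher is a plain softmax cross-attention that aggregates support mask values by query-key similarity (in the spirit of a DCAMA attention block), that the student is a per-pixel linear map with a sigmoid standing in for ConvDist, and that $L_{dist}$ is a mean squared error. The actual architecture and loss may differ.

```python
import math
import random

def teacher_attention(query, support_keys, support_mask):
    """Toy teacher: weight support mask values by query-key similarity."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in support_keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return sum(e / total * m for e, m in zip(exps, support_mask))

def student_forward(query, weights, bias):
    """Toy student: sees only the query feature, never the support set."""
    z = sum(w * q for w, q in zip(weights, query)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def distill_loss(queries, targets, weights, bias):
    """Assumed distillation loss: MSE between student and teacher outputs."""
    errs = [(student_forward(q, weights, bias) - t) ** 2 for q, t in zip(queries, targets)]
    return sum(errs) / len(errs)

def distill(queries, targets, dim, steps=200, lr=0.5):
    """Fit the student to the teacher outputs with plain gradient descent."""
    weights, bias = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for q, t in zip(queries, targets):
            p = student_forward(q, weights, bias)
            # Derivative of (p - t)^2 through the sigmoid, averaged over queries.
            g = 2.0 * (p - t) * p * (1.0 - p) / len(queries)
            grad_w = [gw + g * qi for gw, qi in zip(grad_w, q)]
            grad_b += g
        weights = [w - lr * gw for w, gw in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

rng = random.Random(0)
DIM = 4
# Synthetic support set: foreground keys cluster near +1, background near -1.
support_keys = [[rng.gauss(1.0, 0.3) for _ in range(DIM)] for _ in range(5)]
support_keys += [[rng.gauss(-1.0, 0.3) for _ in range(DIM)] for _ in range(5)]
support_mask = [1.0] * 5 + [0.0] * 5
queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(64)]

# The teacher needs the support set; its outputs become the student's targets.
targets = [teacher_attention(q, support_keys, support_mask) for q in queries]
loss_before = distill_loss(queries, targets, [0.0] * DIM, 0.0)
weights, bias = distill(queries, targets, DIM)
loss_after = distill_loss(queries, targets, weights, bias)
print(f"distillation loss: {loss_before:.4f} -> {loss_after:.4f}")
```

After fitting, `student_forward` takes only the query feature and the learned parameters, which mirrors the support-free inference described in the abstract: the support set is needed while distilling, not at test time.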