FreqU-FNet: Frequency-Aware U-Net for Imbalanced Medical Image Segmentation

Ruiqi Xing

TL;DR

FreqU-FNet introduces a frequency-domain U-Net variant for imbalanced medical image segmentation by integrating a Frequency Domain Encoder that uses Daubechies wavelet downsampling and Fourier low-pass filtering, a Spatial Learnable Decoder with adaptive multi-branch upsampling, and a Frequency-Aware Loss to emphasize high-frequency structures in minority classes. The architecture fuses spectral and spatial cues through native-space and space-channel sampling pathways, guided by learnable fusion weights and a spatial auxiliary learning module. Empirical results on MSD Prostate, Pancreas, and Lung datasets demonstrate superior performance over CNN and Transformer baselines, particularly reducing Dice gaps between majority and minority classes. The study highlights the practical impact of reducing frequency aliasing and enhancing boundary precision for clinically important but underrepresented anatomical structures.
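To make the Fourier low-pass filtering step concrete, the following is a minimal sketch and not the paper's Low-Pass Frequency Convolution. It assumes a 1D signal instead of a 2D or 3D image, an ideal hard cutoff, and a naive DFT written with the standard library only; the names `dft`, `idft`, `low_pass`, and `cutoff` are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(), including the 1/n normalization."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass(signal, cutoff):
    """Ideal low-pass filter: zero every bin whose frequency exceeds `cutoff`.

    Assumption: a hard cutoff; the paper may use a different filter shape.
    """
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n-k carry the same physical frequency
        if freq > cutoff:
            spectrum[k] = 0
    # The input is real and the mask is symmetric, so the result is real
    # up to rounding error; keep only the real part.
    return [value.real for value in idft(spectrum)]


n = 32
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]         # 2 cycles
fast = [0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]  # 12 cycles
mixed = [s + f for s, f in zip(slow, fast)]

filtered = low_pass(mixed, cutoff=4)
max_error = max(abs(a - b) for a, b in zip(filtered, slow))
```

In this toy case the 12-cycle component lies above the cutoff and is removed, so `filtered` matches the 2-cycle component up to floating-point error. Content above the cutoff is also what would alias if the signal were subsampled without filtering.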

Abstract

Medical image segmentation faces persistent challenges due to severe class imbalance and the frequency-specific distribution of anatomical structures. Most conventional CNN-based methods operate in the spatial domain and struggle to capture minority class signals, often affected by frequency aliasing and limited spectral selectivity. Transformer-based models, while powerful in modeling global dependencies, tend to overlook critical local details necessary for fine-grained segmentation. To overcome these limitations, we propose FreqU-FNet, a novel U-shaped segmentation architecture operating in the frequency domain. Our framework incorporates a Frequency Encoder that leverages Low-Pass Frequency Convolution and Daubechies wavelet-based downsampling to extract multi-scale spectral features. To reconstruct fine spatial details, we introduce a Spatial Learnable Decoder (SLD) equipped with an adaptive multi-branch upsampling strategy. Furthermore, we design a frequency-aware loss (FAL) function to enhance minority class learning. Extensive experiments on multiple medical segmentation benchmarks demonstrate that FreqU-FNet consistently outperforms both CNN and Transformer baselines, particularly in handling under-represented classes, by effectively exploiting discriminative frequency bands.
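As a rough illustration of Daubechies wavelet-based downsampling, the sketch below performs one level of a 1D discrete wavelet transform with the four-tap Daubechies filter (db2). The abstract does not state the wavelet order, the boundary handling, or how the sub-bands are used, so the choice of db2, the periodic boundary extension, and the name `db2_downsample` are assumptions made for this example only.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies db2 (four-tap) low-pass analysis filter.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass filter via the quadrature-mirror relation g[k] = (-1)^k * h[3 - k].
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def db2_downsample(signal):
    """One DWT level: filter, then keep every second sample.

    Returns (approx, detail), each half the input length.
    Assumption: periodic (wrap-around) boundary extension.
    """
    n = len(signal)
    if n % 2 != 0:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for i in range(0, n, 2):  # stride 2 is the downsampling step
        window = [signal[(i + k) % n] for k in range(4)]
        approx.append(sum(h * x for h, x in zip(LOW, window)))
        detail.append(sum(g * x for g, x in zip(HIGH, window)))
    return approx, detail


signal = [1.0, 4.0, -2.0, 3.0, 0.5, 7.0, 2.0, -1.0]
approx, detail = db2_downsample(signal)

# The transform is orthogonal, so signal energy is split between the
# two half-length sub-bands rather than discarded.
energy_in = sum(x * x for x in signal)
energy_out = sum(a * a for a in approx) + sum(d * d for d in detail)
```

Unlike strided convolution or pooling, this kind of downsampling halves the resolution while keeping the high-frequency content in a separate `detail` band, which is the property a frequency-domain encoder can exploit for small, boundary-heavy structures.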

Paper Structure

This paper contains 22 sections, 15 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Structure of the frequency-domain Encoder and the Spatial Learnable Decoder. The encoder panel shows the convolution and downsampling processes; the decoder panel mainly shows the upsampling component.