ActiveFreq: Integrating Active Learning and Frequency Domain Analysis for Interactive Segmentation

Lijun Guo; Qian Zhou; Zidi Shi; Hua Zou; Gang Ke

Abstract

Interactive segmentation is widely used in medical image analysis to obtain precise pixel-level labels, typically through iterative user input that corrects mislabeled regions. However, existing approaches often neither fully exploit the user knowledge carried by interactive inputs nor extract features comprehensively. Specifically, these methods tend to treat all mislabeled regions equally, selecting one at random for refinement without evaluating each region's potential impact on segmentation quality. In addition, most models rely solely on spatial-domain features and overlook frequency-domain information that could enrich feature extraction and improve performance. To address these limitations, we propose ActiveFreq, a novel interactive segmentation framework that integrates active learning and frequency domain analysis to minimize human intervention while achieving high-quality labeling. ActiveFreq introduces AcSelect, an autonomous module that prioritizes the most informative mislabeled regions, aiming for the maximum performance gain from each click. We also develop FreqFormer, a segmentation backbone whose Fourier transform module maps features from the spatial domain to the frequency domain, enabling richer feature extraction. Evaluations on the ISIC-2017 and OAI-ZIB datasets show that ActiveFreq reaches high performance with less user interaction: it achieves 3.74 NoC@90 on ISIC-2017 and 9.27 NoC@90 on OAI-ZIB, improvements of 23.5% and 12.8% over the previous best results, respectively. With as few as two clicks, ActiveFreq reaches mIoU scores of 85.29% and 75.76% on ISIC-2017 and OAI-ZIB, respectively, highlighting its efficiency and accuracy in interactive medical segmentation.
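The NoC@90 and two-click mIoU figures quoted in the abstract are click-based metrics. The sketch below shows how such metrics are commonly computed from per-image IoU traces; it is an illustration, not the authors' evaluation code. It assumes that the IoU is recorded after every click, that an image which never reaches the threshold is charged the full click budget, and that the budget is 20 clicks (a common convention that the abstract does not confirm). The function names are hypothetical.

```python
from typing import Sequence


def clicks_to_reach(ious: Sequence[float], threshold: float, max_clicks: int) -> int:
    """1-based index of the first click whose IoU meets the threshold.

    Assumption: if the threshold is never met within max_clicks,
    the image is charged the full budget, max_clicks.
    """
    for click, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return click
    return max_clicks


def noc(traces: Sequence[Sequence[float]], threshold: float = 0.90,
        max_clicks: int = 20) -> float:
    """Number of Clicks: mean clicks needed to reach the IoU threshold."""
    counts = [clicks_to_reach(t, threshold, max_clicks) for t in traces]
    return sum(counts) / len(counts)


def miou_at(traces: Sequence[Sequence[float]], k: int) -> float:
    """Mean IoU over all images after exactly k clicks (k is 1-based)."""
    return sum(t[k - 1] for t in traces) / len(traces)


# Toy traces: IoU after click 1, 2, 3 for two images (made-up values).
toy_traces = [
    [0.80, 0.91, 0.95],  # reaches 0.90 at click 2
    [0.70, 0.85, 0.88],  # never reaches 0.90 within the budget of 3
]
toy_noc = noc(toy_traces, threshold=0.90, max_clicks=3)
toy_miou2 = miou_at(toy_traces, k=2)
```

Lower NoC is better, since it means fewer clicks are needed, whereas higher mIoU at a fixed click count is better.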

Paper Structure (27 sections, 15 equations, 8 figures, 7 tables)

Introduction
Related Work
Mask Refinement in Interactive Segmentation
Active Learning for Semantic Segmentation
Frequency Domain Analysis
Method
AcSelect using Active Learning
Maximum Pixel Entropy
Average Pixel Entropy
Regional Group Uncertainty
Mislabeled Region Selection
Network Architecture of FreqFormer
SegFormer Encoder
FreqNet Decoder
Frequency Analysis Module
...and 12 more sections

Figures (8)

Figure 1: Comparison between our proposed method and other methods. While other methods randomly select a mislabeled region to click, our approach uses an active learning selection module (AcSelect) to evaluate all mislabeled areas and select the most valuable one for annotation.
Figure 2: The overall pipeline of ActiveFreq. (a) Coarse segmentation is generated using the proposed FreqFormer. (b) Mislabeled regions within the coarse mask. (c) The AcSelect module selects the most informative region for refinement. This process evaluates each mislabeled region using three metrics: Maximum Pixel Entropy (MPE), Average Pixel Entropy (APE), and Regional Group Uncertainty (RGU). A Region Score (RS) is calculated for each region, and the region with the highest score is chosen for further refinement to produce the refined mask.
Figure 3: The diagram of FreqFormer. (a) SegFormer encoder extracts multi-scale features from the input image and user clicks. (b) FreqNet decoder, composed of four Freq layers, refines these features to produce a coarse mask. (c) Transformer Block architecture, where FFN means feed-forward network. (d) MLP Layer aligns feature maps for final concatenation. (e) Freq Block architecture in the Freq layer, in which FreqM represents the proposed frequency analysis module.
Figure 4: The architecture of the frequency analysis module (FreqModule). The input feature map is split into four sub-feature maps: one is processed with depth-wise convolution (DW Conv) for spatial information, while the other three pass through a 2D Discrete Fourier Transform (DFT) and a 2D Inverse DFT (IDFT) for frequency information. The outputs are then concatenated to produce the final result.
Figure 5: Comparison of mIoU@2 performance between our proposed AcSelect and other state-of-the-art methods on two datasets: (a) Transformer-based approaches on ISIC-2017 [berseth2017isic] and (b) CNN-based methods on OAI-ZIB [ambellan2019oaizib].
...and 3 more figures
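The region selection described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a binary foreground probability map, reads MPE and APE literally as the maximum and the mean of the per-pixel entropy inside a region, and combines them with equal weights. The exact definitions, the Regional Group Uncertainty (RGU) term, and the way the Region Score is formed are not given in this overview, so RGU is omitted and the combination rule is an assumption. All function names are hypothetical.

```python
import numpy as np


def pixel_entropy(prob: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Binary entropy of a foreground probability map, per pixel."""
    p = np.clip(prob, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def region_scores(prob: np.ndarray, region_masks: list) -> list:
    """Score each (non-empty) boolean region mask by its uncertainty."""
    entropy = pixel_entropy(prob)
    scores = []
    for mask in region_masks:
        values = entropy[mask]
        mpe = float(values.max())   # maximum pixel entropy in the region
        ape = float(values.mean())  # average pixel entropy in the region
        # Assumption: equal-weight sum; the RGU term is omitted because
        # its definition is not available in this overview.
        scores.append(mpe + ape)
    return scores


def select_region(prob: np.ndarray, region_masks: list) -> int:
    """Index of the region with the highest score."""
    return int(np.argmax(region_scores(prob, region_masks)))


# Toy example: the left half is uncertain (p near 0.5), the right half confident.
prob = np.array([[0.50, 0.60, 0.95, 0.99],
                 [0.45, 0.55, 0.90, 0.97]])
mask_a = np.zeros_like(prob, dtype=bool)
mask_a[:, :2] = True
mask_b = ~mask_a
chosen = select_region(prob, [mask_a, mask_b])  # picks region 0, the uncertain one
```

In a full pipeline the region masks would come from the connected components of the disagreement between the coarse mask and the ground truth, and the next simulated click would be placed inside the chosen region.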
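The data flow in the Figure 4 caption (split into four, one depth-wise convolution branch, three DFT/IDFT branches, concatenation) can be sketched in NumPy as follows. The caption does not say along which axis the split happens or what operation is applied between the DFT and the IDFT. This sketch assumes a channel-wise split and a hypothetical element-wise weight on the spectrum, so it should be read as an illustration of the data flow rather than the authors' module.

```python
import numpy as np


def depthwise_conv3x3(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Depth-wise 3x3 cross-correlation with zero padding and stride 1.

    x: (C, H, W) feature map; kernels: (C, 3, 3), one kernel per channel.
    """
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    # windows has shape (C, H, W, 3, 3): one 3x3 patch per output pixel.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3), axis=(1, 2))
    return np.einsum("chwij,cij->chw", windows, kernels)


def frequency_branch(x: np.ndarray, freq_weight: np.ndarray) -> np.ndarray:
    """2D DFT, element-wise spectral weighting (assumed), then 2D IDFT."""
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    filtered = spectrum * freq_weight  # hypothetical learnable filter
    return np.fft.ifft2(filtered, axes=(-2, -1)).real


def freq_module(x: np.ndarray, dw_kernels: np.ndarray, freq_weights: list) -> np.ndarray:
    """Split channels into four groups, process them, and concatenate."""
    parts = np.split(x, 4, axis=0)  # requires C divisible by 4
    spatial = depthwise_conv3x3(parts[0], dw_kernels)
    freq = [frequency_branch(p, w) for p, w in zip(parts[1:], freq_weights)]
    return np.concatenate([spatial, *freq], axis=0)


# Sanity check with identity parameters: the module should return its input.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 6))   # C=8, H=5, W=6
identity_kernels = np.zeros((2, 3, 3))
identity_kernels[:, 1, 1] = 1.0      # centre tap only
unit_weights = [np.ones((2, 5, 6)) for _ in range(3)]
out = freq_module(x, identity_kernels, unit_weights)
```

With identity kernels and unit spectral weights every branch is a pass-through, which makes the round trip through the DFT and IDFT easy to verify before any learned parameters are introduced.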
