Frequency-aware Feature Fusion for Dense Image Prediction

Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, Gao Huang

TL;DR

Dense image prediction demands both strong category information and precise spatial boundaries. FreqFusion introduces Adaptive Low-Pass Filters, an offset generator, and Adaptive High-Pass Filters to achieve feature consistency and sharp boundaries during fusion, guided by a quantitative feature-similarity framework. Across semantic segmentation, object detection, instance segmentation, and panoptic segmentation, the method yields consistent improvements on benchmarks such as Cityscapes, ADE20K, and COCO, with ablations validating the contribution of each component. The approach provides a practical, architecture-agnostic enhancement to upsampling and fusion with broad applicability to dense prediction tasks.

Abstract

Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, resulting in intra-category inconsistency due to disturbed high-frequency features. Additionally, blurred boundaries in fused features lack accurate high-frequency information, leading to boundary displacement. Building upon these observations, we propose Frequency-Aware Feature Fusion (FreqFusion), integrating an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator. The ALPF generator predicts spatially-variant low-pass filters to attenuate high-frequency components within objects, reducing intra-class inconsistency during upsampling. The offset generator refines large inconsistent features and thin boundaries by replacing inconsistent features with more consistent ones through resampling, while the AHPF generator enhances high-frequency detailed boundary information lost during downsampling. Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries. Extensive experiments across various dense prediction tasks confirm its effectiveness. The code is publicly available at https://github.com/Linwei-Chen/FreqFusion.
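
To make the low-pass and high-pass roles described in the abstract more concrete, here is a minimal 1-D sketch in plain Python. It is not the authors' implementation. It assumes (our assumption, not stated in the abstract) that each per-position kernel is obtained by a softmax over predicted logits, which makes the weights non-negative and sum to one, and that the high-pass response is the input minus its low-pass response. The names `adaptive_low_pass`, `adaptive_high_pass`, and `total_variation` are hypothetical helpers introduced only for this illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_low_pass(signal, logits_per_pos):
    """Apply a different 3-tap low-pass kernel at each position (edge-padded)."""
    n = len(signal)
    out = []
    for i in range(n):
        w = softmax(logits_per_pos[i])       # assumed kernel normalization
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(w[0] * left + w[1] * signal[i] + w[2] * right)
    return out

def adaptive_high_pass(signal, logits_per_pos):
    """High-pass response taken as input minus its low-pass response (assumption)."""
    low = adaptive_low_pass(signal, logits_per_pos)
    return [s - l for s, l in zip(signal, low)]

def total_variation(signal):
    """Sum of absolute neighbor differences; a crude measure of rapid variation."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

# A roughly constant "object interior" with small rapid fluctuations.
noisy = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
uniform_logits = [[0.0, 0.0, 0.0] for _ in noisy]   # equal weights everywhere

smoothed = adaptive_low_pass(noisy, uniform_logits)
residual = adaptive_high_pass(noisy, uniform_logits)

print(total_variation(noisy), total_variation(smoothed))
```

In this toy setting the smoothed signal varies less than the input, which mirrors the stated goal of attenuating high-frequency components inside objects, while the residual carries the detail that a high-pass branch would emphasize. In a learned model the logits would differ per position, so smoothing could be strong inside objects and weak near boundaries.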

Paper Structure

This paper contains 28 sections, 13 equations, 15 figures, 15 tables.

Figures (15)

  • Figure 1: Feature intra-category similarity (IntraSim) and prediction visualization. A brighter color indicates a higher IntraSim for the bus (left two columns) and truck (right two columns). Standard feature fusion shows low IntraSim within objects and at their boundaries. We observe rapid variations in feature values within objects, i.e., disturbed high-frequency features, which lead to relatively low intra-category similarity [2022frequencysimilarity] and result in intra-category inconsistency. Furthermore, the blurred boundary lacks accurate high-frequency information, leading to boundary displacement. The proposed FreqFusion shows more consistent features and clearer boundaries, contributing to more consistent predictions with finer boundaries.
  • Figure 2: Illustration of intra-category similarity, inter-category similarity, and similarity margin. Different colors indicate different categories.
  • Figure 3: Illustration of FreqFusion. Pixel unshuffle halves the spatial dimensions of the feature and expands the channels by a factor of $4\times$, dividing them into 4 groups, e.g., from $C\times 2H \times 2W$ to $4\times C\times H\times W$. Pixel shuffle [2016pixelshuffle] is the reverse operation, transitioning from $4\times C\times H\times W$ to $C\times 2H \times 2W$. The Adaptive Low-Pass Filter (ALPF) generator and Adaptive High-Pass Filter (AHPF) generator in the initial fusion share the same parameters as those in the final fusion.
  • Figure 4: Illustration of the generators in FreqFusion. $\otimes$ represents element-wise multiplication, and $\ominus$ represents subtraction.
  • Figure 5: Frequency analysis of the learned convolutional kernels in the ALPF generator. (a) displays the nine learned kernels for generating $3\times 3$ spatially-variant low-pass filters. A brighter color indicates a higher learned weight. (b) illustrates their corresponding Fourier-transformed kernels. To further analyze their characteristics, we average their frequency amplitudes and present the frequency spectrum in (c), which shows higher power for high-frequency components, indicating that filter prediction relies on high-frequency components in the feature.
  • ...and 10 more figures
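
The pixel unshuffle and pixel shuffle operations described in the Figure 3 caption can be sketched on nested lists as follows. This is an illustrative sketch, not code from the paper. The channel ordering (output channel `c * 4 + i * 2 + j` holds the sub-grid at row offset `i` and column offset `j`) follows a common convention and is an assumption here; the caption only fixes the shapes. The helper `shape` is introduced for this example only.

```python
def pixel_unshuffle(x, r=2):
    """Rearrange [C][H*r][W*r] into [C*r*r][H][W] without losing any values."""
    C, Hr, Wr = len(x), len(x[0]), len(x[0][0])
    assert Hr % r == 0 and Wr % r == 0, "spatial size must be divisible by r"
    H, W = Hr // r, Wr // r
    out = []
    for c in range(C):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                out.append([[x[c][h * r + i][w * r + j] for w in range(W)]
                            for h in range(H)])
    return out

def pixel_shuffle(y, r=2):
    """Inverse of pixel_unshuffle: [C*r*r][H][W] back to [C][H*r][W*r]."""
    Cr2, H, W = len(y), len(y[0]), len(y[0][0])
    assert Cr2 % (r * r) == 0, "channel count must be divisible by r*r"
    C = Cr2 // (r * r)
    out = [[[None] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c in range(C):
        for i in range(r):
            for j in range(r):
                plane = y[c * r * r + i * r + j]
                for h in range(H):
                    for w in range(W):
                        out[c][h * r + i][w * r + j] = plane[h][w]
    return out

def shape(t):
    """Shape of a 3-level nested list as (channels, rows, cols)."""
    return (len(t), len(t[0]), len(t[0][0]))

# Toy feature of shape C x 2H x 2W with C=3, H=2, W=2; values encode position.
C, H, W = 3, 2, 2
x = [[[c * 100 + row * 10 + col for col in range(2 * W)]
      for row in range(2 * H)] for c in range(C)]

y = pixel_unshuffle(x)        # 4C x H x W
x_back = pixel_shuffle(y)     # back to C x 2H x 2W

print(shape(x), shape(y), x_back == x)
```

Because both operations only rearrange values, the round trip is lossless, which is why they can trade spatial resolution for channels (and back) without discarding information.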