FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures

Hao Lu, Wenze Liu, Hongtao Fu, Zhiguo Cao

TL;DR

FADE addresses the problem of building a truly task-agnostic upsampling operator for dense prediction by fusing encoder and decoder features to generate content-aware upsampling kernels. Its semi-shift convolution unifies interpolation, channel compression, and kernel generation, while a decoder-dependent gate selectively passes high-resolution encoder details to refine edges. Across semantic segmentation, image matting, object detection, instance segmentation, and depth estimation, FADE improves consistently over fixed and prior dynamic upsampling operators, and a lightweight variant (FADE-Lite) retains strong performance at reduced cost. The method shifts the design focus toward high-quality upsampling as a general-purpose component rather than task-specific tailoring, which may influence future encoder-decoder architectures and vision foundation models. Overall, FADE improves both region coherence and boundary delineation at little extra cost.

Abstract

The goal of this work is to develop a task-agnostic feature upsampling operator for dense prediction, where the operator is required to facilitate not only region-sensitive tasks like semantic segmentation but also detail-sensitive tasks such as image matting. Prior upsampling operators often work well on one of the two types of tasks, but not both. We argue that task-agnostic upsampling should dynamically trade off between semantic preservation and detail delineation, instead of being biased toward one of the two properties. In this paper, we present FADE, a novel, plug-and-play, lightweight, and task-agnostic upsampling operator that fuses the assets of decoder and encoder features at three levels: i) considering both the encoder and decoder features in upsampling kernel generation; ii) controlling the per-point contribution of the encoder/decoder features in upsampling kernels with an efficient semi-shift convolutional operator; and iii) enabling the selective pass of encoder features with a decoder-dependent gating mechanism to compensate for details. To improve the practicality of FADE, we additionally study parameter- and memory-efficient implementations of semi-shift convolution. We analyze the upsampling behavior of FADE on toy data and show through large-scale experiments that FADE is task-agnostic, with consistent performance improvements on a number of dense prediction tasks at little extra cost. For the first time, we demonstrate robust feature upsampling on both region- and detail-sensitive tasks. Code is available at: https://github.com/poppinace/fade
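
The decoder-dependent gating in level iii) can be illustrated with a minimal NumPy sketch. The text above does not give the exact formulation, so everything here is an assumption made for illustration: a single-channel gate computed from the decoder feature by a hypothetical 1x1 projection `w_gate`, a sigmoid, nearest-neighbor upsampling of the gate by a factor of 2, and a convex blend of the encoder feature and the upsampled feature. The function name `gated_refinement` is likewise hypothetical and is not taken from the FADE code base.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_refinement(enc, up, dec, w_gate):
    """Blend encoder details into the upsampled feature with a decoder-driven gate.

    enc, up : (C, 2H, 2W) high-res encoder feature and upsampled decoder feature
    dec     : (C, H, W)   low-res decoder feature
    w_gate  : (C,)        hypothetical 1x1 projection weights (assumption)
    """
    # 1x1 projection of the decoder feature to one gate logit per location: (H, W)
    logits = np.tensordot(w_gate, dec, axes=([0], [0]))
    gate = sigmoid(logits)
    # Nearest-neighbor upsampling of the gate to the encoder resolution: (2H, 2W)
    gate = gate.repeat(2, axis=0).repeat(2, axis=1)
    # gate -> 1 passes encoder details; gate -> 0 keeps the upsampled feature
    return gate[None] * enc + (1.0 - gate[None]) * up
```

With this form, a gate near 1 lets high-resolution encoder details through, while a gate near 0 keeps the semantically smoother upsampled feature, which is one way to realize the "selective pass" described in the abstract.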

Paper Structure (39 sections, 4 equations, 12 figures, 12 tables)

Figures (12)

  • Figure 1: Inferred segmentation masks and alpha mattes with different upsampling operators. The compared operators include IndexNet lu2019indices, A2U dai2021learning, CARAFE jiaqi2019carafe, and our proposed FADE. Among the compared operators, only FADE produces both a high-quality mask and a high-quality alpha matte.
  • Figure 2: Main difference between dynamic upsampling operators on the use of encoder and/or decoder features. (a) CARAFE jiaqi2019carafe generates upsampling kernels conditioned on decoder features, while (b) IndexNet lu2022index and A2U dai2021learning generate kernels using encoder features only. By contrast, (c) FADE considers both encoder and decoder features in upsampling kernel generation.
  • Figure 3: Naive implementation for generating upsampling kernels using encoder and decoder features. Predicting kernels from high-res encoder and low-res decoder features requires first matching their resolutions through explicit feature interpolation, then concatenating the features, followed by channel compression and convolution.
  • Figure 4: Technical pipeline of FADE. (b) gives the overview: FADE upsamples the low-res decoder feature with the help of the high-res encoder feature, and the two features are fed into two key modules. In (a), dynamic feature upsampling, the features are used to generate upsampling kernels with a semi-shift convolutional operator (Fig. \ref{fig:semi-shift_conv}). The kernels are then applied to the decoder feature to produce the upsampled feature. In (c), gated feature refinement, the encoder and upsampled features are modulated by a decoder-dependent gating mechanism to enhance detail delineation before the final refined feature is output.
  • Figure 5: Visualizations of inferred masks and reconstructed results on SUN RGBD and Fashion-MNIST. The decoder-only model generates semantically consistent mask predictions but poor reconstructions, while the encoder-only model does the opposite. When both encoder and decoder features are considered, the model generates reasonable masks like the decoder-only model and clear reconstructions like the encoder-only one (cf. the table lamp and the stripes on clothes).
  • ...and 7 more figures
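
The naive pipeline in the Figure 3 caption (interpolate, concatenate, compress channels, predict kernels) and the kernel application step in the Figure 4 caption can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the authors' implementation: nearest-neighbor interpolation, 1x1 projections for both channel compression and kernel prediction, softmax normalization of each kernel, a K x K neighborhood with edge padding, and an upsampling ratio of 2 are all choices made here for illustration. The names `naive_kernels`, `apply_kernels`, `w_compress`, and `w_predict` are hypothetical.

```python
import numpy as np

def nn_upsample(x, s=2):
    # Nearest-neighbor interpolation of a (C, H, W) feature by factor s
    return x.repeat(s, axis=1).repeat(s, axis=2)

def conv1x1(x, w):
    # x: (Cin, H, W), w: (Cout, Cin) -> (Cout, H, W)
    return np.einsum("oc,chw->ohw", w, x)

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def naive_kernels(enc, dec, w_compress, w_predict):
    """Predict one K*K kernel per high-res location from both features."""
    dec_up = nn_upsample(dec)                      # match resolution: (C, 2H, 2W)
    fused = np.concatenate([enc, dec_up], axis=0)  # concatenation:    (2C, 2H, 2W)
    compressed = conv1x1(fused, w_compress)        # channel compression: (d, 2H, 2W)
    logits = conv1x1(compressed, w_predict)        # kernel prediction: (K*K, 2H, 2W)
    return softmax(logits, axis=0)                 # each kernel sums to 1 (assumption)

def apply_kernels(dec, kernels, k=3, s=2):
    """Reassemble the low-res decoder feature with per-location kernels."""
    C, H, W = dec.shape
    pad = k // 2
    padded = np.pad(dec, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((C, s * H, s * W))
    for i in range(s * H):
        for j in range(s * W):
            # K x K decoder neighborhood centered at the source location (i//s, j//s)
            patch = padded[:, i // s:i // s + k, j // s:j // s + k]
            weights = kernels[:, i, j].reshape(k, k)
            out[:, i, j] = (patch * weights).sum(axis=(1, 2))
    return out
```

The explicit interpolation and concatenation steps operate on high-resolution tensors, which is the cost that, according to the TL;DR, the semi-shift convolution avoids by unifying interpolation, channel compression, and kernel generation in one operator.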