Lighten CARAFE: Dynamic Lightweight Upsampling with Guided Reassemble Kernels

Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yinghui Gao, Biao Li, Ping Zhong

TL;DR

This paper proposes a lightweight upsampling operation, termed Dynamic Lightweight Upsampling (DLU), which first constructs a small-scale source kernel space and then samples the large-scale kernels from that space using learnable guidance offsets, thereby avoiding a large collection of trainable parameters in upsampling.

Abstract

As a fundamental operation in modern machine vision models, feature upsampling has been widely used and investigated in the literature. An ideal upsampling operation should be lightweight, with low computational complexity. That is, it should improve overall performance without adding to model complexity. Content-aware Reassembly of Features (CARAFE) is a well-designed learnable operation for feature upsampling. Although it achieves encouraging performance, this method requires generating large-scale kernels, which introduces a large number of redundant parameters and inherently limits scalability. To this end, we propose a lightweight upsampling operation, termed Dynamic Lightweight Upsampling (DLU). In particular, it first constructs a small-scale source kernel space and then samples the large-scale kernels from that space using learnable guidance offsets, thereby avoiding a large collection of trainable parameters in upsampling. Experiments on several mainstream vision tasks show that DLU achieves performance comparable to, or even better than, the original CARAFE with much lower complexity. For example, in the case of 16x upsampling, DLU requires 91% fewer parameters and at least 63% fewer FLOPs (Floating Point Operations) than CARAFE, yet outperforms CARAFE by 0.3% mAP in object detection. Code is available at https://github.com/Fu0511/Dynamic-Lightweight-Upsampling.
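
To make the parameter argument concrete, the following is a rough counting sketch rather than the authors' accounting. It assumes a CARAFE-style predictor (a 1x1 channel compressor followed by a k_enc x k_enc convolution that emits one k_up x k_up kernel for each of the scale^2 output positions) and assumes that DLU replaces that convolution with one branch emitting a single source kernel per input position plus one branch emitting a 2D offset per output position. The function names and the hyperparameter values (256 input channels, 64 compressed channels, k_enc = 3, k_up = 5) are illustrative assumptions, and biases are ignored.

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k


def carafe_params(c, c_mid, k_enc, k_up, scale):
    """Assumed CARAFE-style predictor: compressor + full kernel encoder."""
    compressor = conv_params(c, c_mid, 1)
    # One k_up x k_up kernel for every one of the scale^2 output positions.
    encoder = conv_params(c_mid, scale * scale * k_up * k_up, k_enc)
    return compressor + encoder


def dlu_params(c, c_mid, k_enc, k_up, scale):
    """Assumed DLU-style predictor: compressor + source kernels + offsets."""
    compressor = conv_params(c, c_mid, 1)
    # One source kernel per input position (the small-scale kernel space).
    source = conv_params(c_mid, k_up * k_up, k_enc)
    # One (dy, dx) guidance offset for every one of the scale^2 output positions.
    offsets = conv_params(c_mid, 2 * scale * scale, k_enc)
    return compressor + source + offsets


def reduction(scale, c=256, c_mid=64, k_enc=3, k_up=5):
    """Fraction of weights saved by the assumed DLU layout."""
    full = carafe_params(c, c_mid, k_enc, k_up, scale)
    light = dlu_params(c, c_mid, k_enc, k_up, scale)
    return 1.0 - light / full


if __name__ == "__main__":
    for s in (2, 4, 8, 16):
        print(f"scale {s:2d}: {reduction(s):.1%} fewer weights")
```

Under these assumptions the saving grows with the upsampling ratio, because the offset branch costs 2 channels per output position where the full encoder costs k_up^2. At scale 16 the sketch gives a reduction of roughly 91 percent, which is consistent with the figure quoted in the abstract, although the paper's exact accounting may differ.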

Paper Structure

This paper contains 20 sections, 4 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Intuitive comparison of CARAFE [wang2019carafe] and DLU. Whereas CARAFE needs to generate independent kernels (yellow ones) for all positions of the upsampling output, our DLU generates only a portion (yellow ones) and samples the others (red ones) with learnable offsets through interpolation.
  • Figure 2: The overall framework of our DLU.
  • Figure 3: Illustration of the expansion of the source kernel space. In this figure, each kernel in the source space (orange) generates 4 newborn kernels (pink, blue, purple, and yellow) in the case of 2$\times$ upsampling, and the newborn kernels are sampled channel-wise from the source kernel space using bilinear interpolation.
  • Figure 4: FPN architecture with the upsampling operator as DLU.
  • Figure 5: Comparison of detection results of FPN [lin2017feature] with different upsampling methods on the validation set of COCO 2017. True positives, false positives, and false negatives are indicated by green, blue, and red rectangles, respectively. "BI" stands for Bilinear Interpolation.
  • ...and 3 more figures
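
The kernel expansion described in the Figure 3 caption can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the names `bilinear_sample` and `expand_kernels` are hypothetical, the offsets are passed in as plain numbers rather than predicted by a network, sampling points are clamped to the grid border (an assumption), and any kernel normalization is omitted.

```python
import math


def bilinear_sample(grid, y, x):
    """Interpolate a kernel at fractional (y, x), channel by channel.

    grid[i][j] is a flat list of kernel weights at source position (i, j).
    """
    h, w = len(grid), len(grid[0])
    # Clamp the sampling point to the grid (border handling is an assumption).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return [
        (1 - wy) * (1 - wx) * grid[y0][x0][c]
        + (1 - wy) * wx * grid[y0][x1][c]
        + wy * (1 - wx) * grid[y1][x0][c]
        + wy * wx * grid[y1][x1][c]
        for c in range(len(grid[0][0]))
    ]


def expand_kernels(source, offsets, scale):
    """Sample a (scale*H) x (scale*W) kernel grid from an H x W source grid.

    offsets[i][j] is a (dy, dx) pair, in source-grid units, for output (i, j).
    """
    h, w = len(source), len(source[0])
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            # Centre of the output cell expressed in source-grid coordinates.
            y = (i + 0.5) / scale - 0.5
            x = (j + 0.5) / scale - 0.5
            dy, dx = offsets[i][j]
            row.append(bilinear_sample(source, y + dy, x + dx))
        out.append(row)
    return out


if __name__ == "__main__":
    # A 2 x 2 source space with one-weight "kernels", expanded 2x.
    source = [[[0.0], [1.0]], [[2.0], [3.0]]]
    zero = [[(0.0, 0.0)] * 4 for _ in range(4)]
    for row in expand_kernels(source, zero, 2):
        print([round(k[0], 3) for k in row])
```

With zero offsets this reduces to ordinary bilinear upsampling of the kernel grid. Nonzero offsets move each sampling point, which is how a newborn kernel can differ from a fixed interpolation of its neighbors.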