SCALES: Boost Binary Neural Network for Image Super-Resolution with Efficient Scalings

Renjie Wei, Zechun Liu, Yuchen Fan, Runsheng Wang, Ru Huang, Meng Li

TL;DR

This work addresses the performance gap between binary and full-precision image super-resolution by revealing that activation distributions in SR networks exhibit strong pixel-, channel-, layer-, and image-wise variations. It introduces SCALES, a binarization framework combining a layer-wise scaling factor, spatial re-scaling, and channel-wise re-scaling to capture these variations in an input-dependent manner. Across CNN and Transformer SR architectures, SCALES yields notable PSNR gains (e.g., >1 dB on binary Transformer SR) and substantial reductions in parameters and operations, enabling efficient mobile deployment. The approach demonstrates that preserving rich activation statistics during binarization can close much of the gap to FP models while delivering practical speedups and memory savings.
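To put the quoted PSNR gains in perspective, the sketch below computes PSNR from its standard definition, 10·log10(peak² / MSE), and converts a gain in dB into the corresponding reduction in mean squared error. The helper names (`psnr`, `mse_ratio_for_gain`) and the sample pixel values are illustrative assumptions, not taken from the paper.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

# Hypothetical 8-bit pixel values, for illustration only.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
estimate = [54, 55, 60, 62, 77, 61, 75, 64]
print(round(psnr(reference, estimate), 2))
print(round(mse_ratio_for_gain(1.0), 3))
```

Because PSNR is logarithmic in MSE, a 1 dB gain corresponds to dividing the MSE by about 1.26, that is, roughly 20% lower squared error at the same peak value.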

Abstract

Deep neural networks for image super-resolution (SR) have demonstrated superior performance. However, their large memory and computation costs hinder deployment on resource-constrained devices. Binary neural networks (BNNs), which quantize floating-point weights and activations to 1 bit, can significantly reduce this cost. Although BNNs for image classification have made great progress, existing BNNs for SR still suffer from a large performance gap relative to full-precision (FP) SR networks. To understand this gap, we examine the activation distributions in SR networks and find much larger pixel-to-pixel, channel-to-channel, layer-to-layer, and image-to-image variation than in image classification networks. Existing BNNs for SR fail to capture these variations, which contain rich information for image reconstruction, leading to inferior performance. To address this problem, we propose SCALES, a binarization method for SR networks that consists of a layer-wise scaling factor, a spatial re-scaling method, and a channel-wise re-scaling method, capturing the layer-wise, pixel-wise, and channel-wise variations efficiently in an input-dependent manner. We evaluate our method across different network architectures and datasets. For CNN-based SR networks, SCALES outperforms the prior art by 0.2 dB with fewer parameters and operations. With SCALES, we achieve the first accurate binary Transformer-based SR network, improving PSNR by more than 1 dB compared to the baseline method.
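The three components named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the operators, so everything concrete here is an assumption: `alpha` stands in for a learned per-layer scalar, the spatial and channel factors are computed as a sigmoid of simple means of the input, the convolution is omitted, and the function name `scales_sketch` is hypothetical. It is meant only to show how sign-binarized activations can keep layer-, pixel-, and channel-level magnitude information in an input-dependent way, not to reproduce the authors' implementation.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sign(v):
    # Map to {-1, +1}; zero is sent to +1 (assumed convention).
    return 1.0 if v >= 0 else -1.0

def scales_sketch(x, alpha):
    """x: feature map as nested lists [C][H][W]; alpha: layer-wise scale."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel-wise factor (assumed form): one value per channel,
    # derived from that channel's spatial mean.
    channel = [sigmoid(sum(map(sum, x[c])) / (H * W)) for c in range(C)]
    # Spatial factor (assumed form): one value per pixel,
    # derived from that pixel's mean over channels.
    spatial = [[sigmoid(sum(x[c][h][w] for c in range(C)) / C)
                for w in range(W)] for h in range(H)]
    # 1-bit activations, scaled per layer, per pixel, and per channel.
    return [[[alpha * sign(x[c][h][w]) * spatial[h][w] * channel[c]
              for w in range(W)] for h in range(H)] for c in range(C)]

# Hypothetical 2-channel, 2x2 feature map.
x = [
    [[0.5, -1.0], [2.0, 0.1]],
    [[-3.0, 0.2], [0.0, -0.4]],
]
y = scales_sketch(x, alpha=0.8)
print(y[0][0][0], y[1][0][0])
```

Both factors are recomputed from each input, so two images passing through the same layer receive different scalings, while the sign tensor itself stays 1-bit.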

Paper Structure

This paper contains 18 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The binary feature maps with our method SCALES and the prior art method E2FIF.
  • Figure 2: The typical architecture of CNN-based and Transformer-based SR networks and the detailed structure of basic blocks.
  • Figure 3: Activation distribution in EDSR (Lim et al., 2017).
  • Figure 4: Activation distribution in CNN-based and Transformer-based classification networks, ResNet18 (He et al., 2016) and SwinViT (Liu et al., 2021). The distributions across pixels, channels, layers, and images are similar, so only the distributions across pixels are shown here.
  • Figure 5: Activation distribution in SwinIR (Liang et al., 2021).
  • ...and 4 more figures