The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention

Xingyu Ding, Lianlei Shan, Guiqin Zhao, Meiqi Wu, Wenzhang Zhou, Wei Li

TL;DR

This work tackles the challenge of deploying binary neural networks for dense prediction by addressing the upsampling and attention bottlenecks that typically degrade accuracy. It introduces a multi-branch parallel upsampling module and a binary attention mechanism to preserve discriminative power while maintaining the acceleration benefits of binarization. The approach binarizes weights and activations with learned scaling and uses a two-part attention framework to enable efficient, effective attention in a binary setting, achieving substantial speedups (up to about 137.9×) and strong accuracy on Cityscapes, KITTI Road, and ECSSD. Overall, the method demonstrates that carefully designed binary upsampling and attention can enable practical, high-performance dense prediction with significantly reduced storage and compute requirements.
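
The binarization with scaling mentioned above can be illustrated with a minimal sketch. It assumes a plain sign function and a per-tensor scale set to the mean absolute value, which is a common stand-in; the paper's scaling is learned, and the function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical, not taken from the authors' code. The sketch also shows why binarization accelerates computation: a dot product of two ±1 vectors reduces to an XNOR followed by a bit count.

```python
def binarize(values):
    """Map full-precision values to {-1, +1} plus a single scale factor.

    Assumption: the scale is the mean absolute value; the paper learns it.
    """
    signs = [1 if v >= 0 else -1 for v in values]
    alpha = sum(abs(v) for v in values) / len(values)
    return signs, alpha


def pack_bits(signs):
    """Encode a +/-1 vector as an integer: bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def binary_dot(a_signs, b_signs):
    """Dot product of two +/-1 vectors using XNOR and a bit count."""
    n = len(a_signs)
    mask = (1 << n) - 1
    # XNOR marks the positions where both vectors carry the same sign.
    xnor = ~(pack_bits(a_signs) ^ pack_bits(b_signs)) & mask
    matches = bin(xnor).count("1")
    # Each match contributes +1 and each mismatch contributes -1.
    return 2 * matches - n


weights = [0.5, -1.5, 0.25, -0.75]
activations = [1.0, 2.0, -3.0, -4.0]

w_signs, w_alpha = binarize(weights)
x_signs, x_alpha = binarize(activations)

# Scaled binary approximation of the full-precision dot product.
approx = w_alpha * x_alpha * binary_dot(w_signs, x_signs)
exact = sum(w * x for w, x in zip(weights, activations))
```

Comparing `approx` with `exact` on a small input like this makes the approximation error visible, which is the accuracy gap the upsampling and attention designs are meant to narrow.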

Abstract

Deep learning-based information processing is time-consuming and requires huge computing resources, especially for dense prediction tasks that require an output for each pixel, such as semantic segmentation and salient object detection. Quantizing dense prediction networks poses two main challenges. First, directly applying the upsampling operation that dense prediction tasks require is extremely crude and causes an unacceptable loss of accuracy. Second, the complex structure of dense prediction networks makes it difficult to maintain both high speed and high accuracy under quantization. In this paper, we propose an effective upsampling method and an efficient attention computation strategy to transfer the success of binary neural networks (BNNs) from single prediction tasks to dense prediction tasks. First, we design a simple and robust multi-branch parallel upsampling structure to achieve high accuracy. Then we further optimize the attention method, which plays an important role in segmentation but has high computational complexity. Our attention method reduces the computational complexity by a factor of one hundred while retaining the original effect. Experiments on Cityscapes, KITTI Road, and ECSSD demonstrate the effectiveness of our work.

Paper Structure

This paper contains 25 sections, 13 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The procedure of binarization. On the left is the feature map or the convolution kernel, where each value is a 16-bit full-precision number. After binarization, as shown on the right, each value is a 1-bit binary number.
  • Figure 2: Overview of the entire network structure. The overall structure is consistent with U-Net. The encoding part is ResNet, and the decoding part adopts the structure of FPN. The decoding part contains four large modules, each composed of four convolution modules, one attention module, and one upsampling module. All operations are binarized.
  • Figure 3: Overview of the proposed upsampling module.
  • Figure 4: (a) is the feature map requiring upsampling, (b) shows the feature maps providing auxiliary information in our method, (c) is the result of full-precision upsampling, (d) is the result of binary upsampling, and (e) is the result of our binary upsampling. The results of our method are more similar to those of the full-precision method.
  • Figure 5: Overview of the proposed attention module.
  • ...and 1 more figures