Efficient Multitask Dense Predictor via Binarization

Yuzhang Shang, Dan Xu, Gaowen Liu, Ramana Rao Kompella, Yan Yan

TL;DR

The paper tackles the computational burden of dense multitask prediction by binarizing the Multitask Dense Predictor, creating Bi-MTDP. It identifies information-flow degradation as the core challenge of binarization in dense models and counters it with a variational information bottleneck and feature-based knowledge distillation. The authors propose two variants, Bi-MTDP-C and Bi-MTDP-F, including a binarized cross-talk pathway and information-flow calibration, achieving state-of-the-art or competitive results on NYUD-v2 and PASCAL-Context with significant efficiency gains. This work demonstrates that carefully designed information-flow remedies can unlock the practical use of binary networks for complex vision tasks and provides publicly available code.

Abstract

Multi-task learning for dense prediction has emerged as a pivotal area in computer vision, enabling simultaneous processing of diverse yet interrelated pixel-wise prediction tasks. However, the substantial computational demands of state-of-the-art (SoTA) models often limit their widespread deployment. This paper addresses this challenge by introducing network binarization to compress resource-intensive multi-task dense predictors. Specifically, our goal is to significantly accelerate multi-task dense prediction models via Binary Neural Networks (BNNs) while maintaining, and even improving, model performance. To reach this goal, we propose a Binary Multi-task Dense Predictor, Bi-MTDP, and several variants of Bi-MTDP, in which a multi-task dense predictor is constructed from specified binarized modules. Our systematic analysis of this predictor reveals that the performance drop from binarization is primarily caused by severe information degradation. To address this issue, we introduce a deep information bottleneck layer that constrains the representations for downstream tasks to follow a Gaussian distribution in forward propagation. Moreover, we introduce a knowledge distillation mechanism to correct the direction of information flow in backward propagation. Intriguingly, one variant of Bi-MTDP outperforms the full-precision (FP) multi-task dense prediction SoTAs, ATRC (CNN-based) and InvPT (ViT-based). This result indicates that Bi-MTDP is not merely a naive trade-off between performance and efficiency, but rather benefits from the redundant information flow that the multi-task architecture provides. Code is available at https://github.com/42Shawn/BiMTDP.
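
To make concrete why binarized modules are cheap, the sketch below shows the standard identity behind BNN inference: for two vectors with entries in {-1, +1}, the dot product equals twice the number of agreeing positions minus the vector length, so it can be computed with XNOR and a bit count. This is a generic illustration, not code from the Bi-MTDP repository. The function names (`binarize`, `pack_bits`, `xnor_popcount_dot`) and the convention that 0 maps to +1 are assumptions made here for the example.

```python
def binarize(values):
    """Map real values to {-1, +1} with the sign function (assumption: 0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an integer: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree. Each agreement contributes
    +1 and each disagreement -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1                      # keep only the n valid bit positions
    agreements = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * agreements - n


# Hypothetical full-precision weights and activations, for illustration only.
weights = [0.7, -1.2, 0.1, -0.4, 2.0]
activations = [-0.3, -0.8, 0.5, 0.9, 1.1]

w_sign, a_sign = binarize(weights), binarize(activations)
reference = sum(w * a for w, a in zip(w_sign, a_sign))       # ordinary dot product
fast = xnor_popcount_dot(pack_bits(w_sign), pack_bits(a_sign), len(weights))
```

Here `reference` and `fast` agree, which is the property that lets a fully binarized layer replace multiply-accumulate operations with bitwise ones. Real BNN layers typically also apply scaling factors and batch normalization, which this sketch omits.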

Paper Structure

This paper contains 12 sections, 6 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (Left) A conceptual illustration of binary dense prediction in a multi-task manner. In contrast to approaching a series of related tasks individually, the multitask model benefits from information exchanged among tasks via cross-talk structures, but the cumbersome cross-talk modules also add computational burden. (Right) Performance summary on NYUD-v2. The x-axis and y-axis show performance on depth estimation (lower is better) and segmentation (higher is better), respectively. Dot size denotes FLOPs. ATRC (Bruggemann et al., 2021) and InvPT (Ye and Xu, 2022) are the previous CNN-based and ViT-based SoTAs, respectively.
  • Figure 2: A general illustration of the forward propagation of the $k$-th layer in the BNN.
  • Figure 3: (Left) Illustration of the baseline multitask framework. (Middle) The MMD modules designed for binary representations. Importantly, the MMD module can pass information among different predictions, acting as a cross-talk mechanism. (Right) Because all fundamental modules in the Bi-MTDP baseline are binarized, inference can be performed entirely with Boolean operations, which are computationally very cheap.
  • Figure 4: The pipeline of Bi-MTDP. We introduce a VIB layer after the backbone network to filter out nuisance factors that may cause overfitting in forward propagation. In addition, we deploy a feature-based knowledge distillation mechanism to guide the optimization direction in backward propagation.
  • Figure 5: (a) Grad-CAM visualization of feature maps of different multitask dense prediction methods. (b) t-SNE visualization of learned features of all 20 classes on PASCAL-Context. (c) Centered Kernel Alignment (CKA) analysis of the information flow within different networks.
  • ...and 2 more figures
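
The Centered Kernel Alignment analysis mentioned for Figure 5(c) compares the features of two layers or networks on the same inputs. The sketch below computes the commonly used linear-kernel variant of CKA in plain Python. Whether the paper uses the linear or another kernel is not stated in this summary, so the kernel choice is an assumption, and the function names and toy feature matrices are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean over the samples."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def cross_gram_sq_norm(x, y):
    """Squared Frobenius norm of x^T y; x and y are row-major with equal row counts."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            total += sum(a * b for a, b in zip(col_x, col_y)) ** 2
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    x, y = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(x, y)
    denominator = math.sqrt(cross_gram_sq_norm(x, x) * cross_gram_sq_norm(y, y))
    return numerator / denominator


# Toy features: 4 samples (rows) by 2 feature dimensions (columns).
features_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
features_b = [[2.5 * v for v in row] for row in features_a]   # rescaled copy of A

similarity = linear_cka(features_a, features_b)
```

Because linear CKA is invariant to isotropic scaling, `similarity` is 1 up to floating-point error for these two matrices. Values near 0 indicate that the two representations share little linear structure.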