Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation

Guoyu Yang, Yuan Wang, Daming Shi

TL;DR

The paper tackles real-time semantic segmentation by proposing a Reparameterizable Dual-Resolution Network (RDRNet) that trains with multi-path blocks to boost accuracy and reparameterizes them into single-path blocks for fast inference. It introduces the Reparameterizable Block (RB) and the Reparameterizable Pyramid Pooling Module (RPPM) to enhance feature learning without increasing inference cost, enabling efficient bilateral fusion between semantic and detail branches. Through experiments on Cityscapes, CamVid, and Pascal VOC 2012, RDRNet achieves superior or competitive mIoU while maintaining high FPS, outperforming several state-of-the-art real-time models. The work demonstrates that careful architectural reparameterization and parallel pooling can deliver accurate, real-time semantic segmentation and suggests avenues for even more powerful reparameterizable designs.

Abstract

Semantic segmentation plays a key role in applications such as autonomous driving and medical imaging. Although existing real-time semantic segmentation models achieve a commendable balance between accuracy and speed, their multi-path blocks still affect overall speed. To address this issue, this study proposes a Reparameterizable Dual-Resolution Network (RDRNet) dedicated to real-time semantic segmentation. Specifically, RDRNet employs a two-branch architecture, utilizing multi-path blocks during training and reparameterizing them into single-path blocks during inference, thereby enhancing both accuracy and inference speed simultaneously. Furthermore, we propose the Reparameterizable Pyramid Pooling Module (RPPM) to enhance the feature representation of the pyramid pooling module without increasing its inference time. Experimental results on the Cityscapes, CamVid, and Pascal VOC 2012 datasets demonstrate that RDRNet outperforms existing state-of-the-art models in terms of both performance and speed. The code is available at https://github.com/gyyang23/RDRNet.

Paper Structure (17 sections, 6 equations, 8 figures, 6 tables)


Figures (8)

  • Figure 1: The trade-off between inference speed and accuracy for real-time semantic segmentation models on the Cityscapes test set. Orange rectangles refer to our models, while green diamonds represent others.
  • Figure 2: Overall architecture of RDRNet. Following the feature shunting, the upper branch belongs to the semantic branch, while the lower branch belongs to the detail branch. RPPM refers to the proposed pyramid pooling module.
  • Figure 3: The details of bilateral fusion in RDRNet. The bilateral fusion in the diagram corresponds to that after stage 4, with a similar fusion following stage 5. The key difference lies in the varying input/output channel numbers and upsampling rates.
  • Figure 4: Training and inference structure of the reparameterizable block. During non-downsampling, the training structure comprises three paths, while during downsampling, it reduces to two paths. In contrast to the training structure, the inference structure consists of only one path, where the convolutions are obtained through reparameterization from convolutions on other paths.
  • Figure 5: The reparameterization process of RB from two different perspectives, namely, transitioning from a multi-path structure to a single-path structure.
  • ...and 3 more figures
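
The captions for Figures 4 and 5 describe folding a multi-path training block into a single-path inference block. The sketch below illustrates that general idea with NumPy. It is not the authors' RB: the choice of paths (a 3x3 convolution, a 1x1 convolution, and an identity shortcut, in the style of RepVGG), the omission of batch normalization and bias, and the helper names `conv2d` and `merge_branches` are all assumptions made for illustration.

```python
import numpy as np


def conv2d(x, w, pad):
    """Naive stride-1 cross-correlation with zero padding.

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k and pad = k // 2.
    """
    c_out, c_in, k, _ = w.shape
    h, wd = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels with the (C_out, C_in) weights of tap (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def merge_branches(w3, w1):
    """Fold 3x3, 1x1 and identity paths into one 3x3 kernel (hypothetical helper)."""
    c_out, c_in = w3.shape[:2]
    assert c_out == c_in, "the identity path needs matching channel counts"
    merged = w3.copy()
    # A 1x1 kernel is a 3x3 kernel whose only non-zero tap is the centre.
    merged[:, :, 1, 1] += w1[:, :, 0, 0]
    # The identity map is a centred delta that links channel i to channel i.
    idx = np.arange(c_out)
    merged[idx, idx, 1, 1] += 1.0
    return merged


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((4, 4, 3, 3))
w1 = rng.standard_normal((4, 4, 1, 1))

# Training-time view: three parallel paths, summed.
multi_path = conv2d(x, w3, pad=1) + conv2d(x, w1, pad=0) + x
# Inference-time view: one 3x3 convolution with the merged kernel.
single_path = conv2d(x, merge_branches(w3, w1), pad=1)

max_abs_diff = float(np.abs(multi_path - single_path).max())
print("max abs difference:", max_abs_diff)
```

The merge works because convolution is linear in its kernel, so a sum of parallel convolutions over the same input equals one convolution with the summed (zero-padded) kernels. In a downsampling block the identity path is unavailable because input and output shapes differ, which is consistent with the two-path training structure mentioned in the Figure 4 caption.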