Golden Cudgel Network for Real-Time Semantic Segmentation

Guoyu Yang, Yuan Wang, Daming Shi, Yanzhong Wang

TL;DR

GCNet tackles the real-time semantic segmentation bottleneck by using training-time vertical multi-convolutions and horizontal multi-paths that are reparameterized into a single $3 \times 3$ convolution for inference. It introduces the Golden Cudgel Block (GCBlock), which enables self-enlargement during training and self-contraction during inference, effectively acting as its own teacher without external models. Empirical results on Cityscapes, CamVid, and Pascal VOC 2012 show GCNet achieving a favorable balance of high mIoU and fast FPS, with GCNet-S reaching 77.3% mIoU at 193.3 FPS and GCNet-L achieving 79.6% mIoU, while maintaining strong zero-shot and training efficiency relative to comparable real-time methods. The work demonstrates that reparameterizable, multi-path training can realize the benefits of complex blocks for learning without sacrificing inference speed, offering practical impact for deployment in latency-critical applications.

Abstract

Recent real-time semantic segmentation models, whether single-branch or multi-branch, achieve good performance and speed. However, their speed is limited by multi-path blocks, and some depend on high-performance teacher models for training. To overcome these issues, we propose Golden Cudgel Network (GCNet). Specifically, GCNet uses vertical multi-convolutions and horizontal multi-paths for training, which are reparameterized into a single convolution for inference, optimizing both performance and speed. This design allows GCNet to self-enlarge during training and self-contract during inference, effectively becoming a "teacher model" without needing external ones. Experimental results show that GCNet outperforms existing state-of-the-art models in terms of performance and speed on the Cityscapes, CamVid, and Pascal VOC 2012 datasets. The code is available at https://github.com/gyyang23/GCNet.
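
The reparameterization idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes two training-time paths, each a $3 \times 3$ convolution followed by a $1 \times 1$ convolution with no nonlinearity in between, and it omits batch normalization and any other paths the real GCBlock may contain. The helper names `conv2d`, `merge_vertical`, and `merge_horizontal` are hypothetical. The sketch shows that a purely linear stack (vertical) and a sum of parallel paths (horizontal) can be folded into one $3 \times 3$ kernel and bias.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive stride-1 'same' convolution (cross-correlation).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]            # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out + b[:, None, None]

def merge_vertical(w3, b3, w1, b1):
    """Fold a 3x3 conv followed by a 1x1 conv into one 3x3 conv."""
    m = w1[:, :, 0, 0]                                 # (C_out, C_mid)
    w = np.einsum("om,mikl->oikl", m, w3)              # (C_out, C_in, 3, 3)
    b = m @ b3 + b1
    return w, b

def merge_horizontal(paths):
    """Sum the folded kernels and biases of parallel paths."""
    ws, bs = zip(*(merge_vertical(*p) for p in paths))
    return sum(ws), sum(bs)

# Assumed toy sizes; the real channel counts are not taken from the paper.
rng = np.random.default_rng(0)
c_in, c_mid, c_out = 3, 5, 4
x = rng.standard_normal((c_in, 6, 6))
paths = [
    (rng.standard_normal((c_mid, c_in, 3, 3)), rng.standard_normal(c_mid),
     rng.standard_normal((c_out, c_mid, 1, 1)), rng.standard_normal(c_out))
    for _ in range(2)
]

# Training-time form: run every path, then add the results.
y_train = sum(conv2d(conv2d(x, w3, b3), w1, b1) for w3, b3, w1, b1 in paths)

# Inference-time form: a single 3x3 convolution.
w_merged, b_merged = merge_horizontal(paths)
y_infer = conv2d(x, w_merged, b_merged)

print(np.max(np.abs(y_train - y_infer)))  # difference at rounding-error scale
```

The two outputs agree up to floating-point rounding, so in this simplified setting the inference-time block costs the same as a plain $3 \times 3$ convolution regardless of how many paths were used during training.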

Paper Structure

This paper contains 22 sections, 6 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The trade-off between inference speed and performance for real-time semantic segmentation models on the Cityscapes validation set.
  • Figure 2: A comparison of the proposed GCBlock with multi-path blocks: (a) Residual Block he2016deep, used by the models pan2022deep, xu2023pidnet, and xu2024sctnet. (b) Conv-Former Block xu2024sctnet, used by the model xu2024sctnet. (c) GCBlock, a block that is scalable in both the vertical and horizontal directions.
  • Figure 3: The overall architecture of GCNet. After the features split into two branches, the upper branch is the semantic branch and the lower branch is the detail branch. An orange box indicates that the first GCBlock in the group has a stride of 2, while the remaining blocks have a stride of 1. A green box indicates that all GCBlocks in the group have a stride of 1. PPM refers to the Deep Aggregation Pyramid Pooling Module pan2022deep.
  • Figure 4: Ablation study on Path$_{3\times3\_1\times1}$ for GCNet-S. "N" indicates the number of Path$_{3\times3\_1\times1}$, while "Iter-20000" signifies that 20000 iterations were completed, and so on.
  • Figure 5: Visualization of different segmentation models on the Cityscapes validation set.
  • ...and 3 more figures