RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization

Chengpeng Chen, Zichao Guo, Haien Zeng, Pengfei Xiong, Jian Dong

TL;DR

This paper addresses the hardware inefficiency of feature reuse in lightweight CNNs that rely on concatenation. It introduces RepGhost, a module that realizes feature reuse implicitly through structural re-parameterization: the multiple branches used during training are fused in weight space into a single, simpler operator for inference. Built on RepGhost, RepGhostNet shows better accuracy-latency trade-offs than GhostNet, MobileNetV3, and related architectures on ImageNet and COCO when measured on mobile devices. The approach challenges the assumption that concatenation is cost-free because it adds no parameters or FLOPs, and offers a practical path toward more hardware-efficient vision models for mobile deployment.

Abstract

Feature reuse has been a key technique in the design of lightweight convolutional neural network (CNN) architectures. Current methods usually use a concatenation operator to keep channel counts large (and thus network capacity large) cheaply, by reusing feature maps from other layers. Although concatenation is parameter- and FLOPs-free, its computational cost on hardware devices is non-negligible. To address this, this paper provides a new perspective: realizing feature reuse implicitly and more efficiently via structural re-parameterization instead of a concatenation operator. A novel hardware-efficient RepGhost module is proposed for this purpose. Based on the RepGhost module, we develop our efficient RepGhost bottleneck and RepGhostNet. Experiments on ImageNet and COCO benchmarks demonstrate that our RepGhostNet is much more effective and efficient than GhostNet and MobileNetV3 on mobile devices. Specifically, our RepGhostNet surpasses GhostNet 0.5x by 2.5% Top-1 accuracy on the ImageNet dataset with fewer parameters and comparable latency on an ARM-based mobile device. Code and model weights are available at https://github.com/ChengpengChen/RepGhost.

Paper Structure (25 sections, 2 equations, 12 figures, 12 tables)

This paper contains 25 sections, 2 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: Top-1 accuracy on the ImageNet dataset vs. latency on an ARM-based mobile device. See Section \ref{sec:exp} for details and Appendix \ref{sec:more-latency-eval} for more devices.
  • Figure 1: Runtime of concatenation and add operators with different batch sizes.
  • Figure 2: Runtime percentage of each operator in the entire network. Diff: the percent difference between concatenation and add. Ours: our method takes the add operator as an intermediate state, and it can be fused for fast inference.
  • Figure 3: Evolution from Ghost module to RepGhost module. We omit the input 1$\times$1 convolution for simplicity; see Figure \ref{fig:repghost-bottleneck} for more structural details. dconv: depthwise convolutional layer. cat: concatenation layer. a) Ghost module \cite{han2020ghostnet} with ReLU; b) replacing concatenation with add; c) moving the ReLU backward so that the module satisfies the rule of structural re-parameterization; d) RepGhost module during training; e) RepGhost module during inference.
  • Figure 3: Effects of re-parameterization on two light-weight CNNs.
  • ...and 7 more figures
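
The evolution described in the Figure 3 caption (replace concatenation with add, move the ReLU after the add, then fuse the branches for inference) can be illustrated with a minimal sketch. It assumes, without confirmation from the text above, that the training-time module is a depthwise 3x3 convolution followed by batch normalization (BN), added to a BN-only identity branch. All function names, shapes, and random values below are hypothetical and chosen for illustration only; this is not the authors' implementation. Because both branches are linear once the ReLU has been moved after the add, they collapse into a single depthwise convolution with a fused kernel and bias.

```python
import math
import random

EPS = 1e-5  # BN epsilon (assumed value)

def depthwise_conv3x3(x, kernels, biases):
    """Depthwise 3x3 conv, stride 1, zero padding 1. x is [C][H][W] nested lists."""
    out = []
    for xc, k, b in zip(x, kernels, biases):
        h, w = len(xc), len(xc[0])
        yc = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = b
                for di in range(3):
                    for dj in range(3):
                        ii, jj = i + di - 1, j + dj - 1
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                            acc += k[di][dj] * xc[ii][jj]
                yc[i][j] = acc
        out.append(yc)
    return out

def batchnorm(x, bn):
    """Inference-mode BN with fixed per-channel (gamma, beta, mean, var)."""
    out = []
    for xc, (gamma, beta, mean, var) in zip(x, bn):
        s = gamma / math.sqrt(var + EPS)
        out.append([[s * (v - mean) + beta for v in row] for row in xc])
    return out

def two_branch_forward(x, kernels, bn_conv, bn_id):
    """Assumed training-time structure: BN(dconv(x)) + BN(x), joined by add."""
    a = batchnorm(depthwise_conv3x3(x, kernels, [0.0] * len(x)), bn_conv)
    b = batchnorm(x, bn_id)
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def fuse(kernels, bn_conv, bn_id):
    """Fold both branches into one depthwise 3x3 kernel and bias per channel."""
    fused_k, fused_b = [], []
    for k, (g1, b1, m1, v1), (g2, b2, m2, v2) in zip(kernels, bn_conv, bn_id):
        s1 = g1 / math.sqrt(v1 + EPS)
        s2 = g2 / math.sqrt(v2 + EPS)
        kk = [[s1 * k[i][j] for j in range(3)] for i in range(3)]
        kk[1][1] += s2  # identity branch acts like a 3x3 kernel with 1 at the centre
        fused_k.append(kk)
        fused_b.append((b1 - s1 * m1) + (b2 - s2 * m2))
    return fused_k, fused_b

def rand_bn(channels):
    """Random BN parameters with positive variance (illustrative values only)."""
    return [(random.uniform(0.5, 1.5), random.uniform(-1, 1),
             random.uniform(-1, 1), random.uniform(0.5, 2.0))
            for _ in range(channels)]

random.seed(0)
C, H, W = 2, 4, 5
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(C)]
bn_conv, bn_id = rand_bn(C), rand_bn(C)

y_two_branch = two_branch_forward(x, kernels, bn_conv, bn_id)
fused_k, fused_b = fuse(kernels, bn_conv, bn_id)
y_fused = depthwise_conv3x3(x, fused_k, fused_b)

# Largest element-wise gap between the two-branch and fused outputs.
max_diff = max(abs(p - q)
               for ca, cb in zip(y_two_branch, y_fused)
               for ra, rb in zip(ca, cb)
               for p, q in zip(ra, rb))
```

In this sketch `max_diff` stays at floating-point rounding level, which is the sense in which the add can be "fused for fast inference". The same folding would not be possible if a ReLU sat inside one branch before the add, which is why step c) in the caption moves it.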