RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
TL;DR
RepVGG tackles the speed-accuracy dilemma by decoupling the training-time architecture from the inference-time architecture through structural re-parameterization. Each training-time multi-branch block (including identity and 1×1 branches) is converted into a single 3×3 conv for deployment, so the inference-time model is a simple, fast, and memory-efficient plain stack of 3×3 conv and ReLU. On ImageNet, RepVGG outperforms ResNets in both accuracy and speed and is competitive with state-of-the-art models; it also serves as a stronger backbone for semantic segmentation on Cityscapes. Ablations validate the necessity of the re-parameterization and of the BN placement. The work emphasizes hardware-friendly design and offers a practical path to high-performance plain ConvNets without heavy architecture search.
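The branch merging summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it is written in pure Python for self-containment, it omits batch normalization (which the paper folds into each branch before merging) and the trailing ReLU, and the helper names `conv2d` and `merge_branches` are hypothetical. It relies only on the linearity of convolution: a 1×1 kernel is a 3×3 kernel that is zero everywhere except its center, and the identity mapping is a 3×3 kernel with a 1 at the center of each channel's own filter.

```python
import random


def conv2d(x, k):
    """Stride-1, zero-padded ("same") convolution.

    x: [C_in][H][W], k: [C_out][C_in][kh][kw] with odd kh and kw.
    """
    h, w = len(x[0]), len(x[0][0])
    kh, kw = len(k[0][0]), len(k[0][0][0])
    out = []
    for k_o in k:  # one output channel at a time
        plane = [[0.0] * w for _ in range(h)]
        for x_i, k_oi in zip(x, k_o):  # accumulate over input channels
            for r in range(h):
                for c in range(w):
                    for dr in range(kh):
                        for dc in range(kw):
                            rr, cc = r + dr - kh // 2, c + dc - kw // 2
                            if 0 <= rr < h and 0 <= cc < w:
                                plane[r][c] += x_i[rr][cc] * k_oi[dr][dc]
        out.append(plane)
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 branch and an identity branch into the 3x3 kernel."""
    assert len(k3) == len(k3[0]), "identity branch needs C_out == C_in"
    fused = [[[row[:] for row in k_oi] for k_oi in k_o] for k_o in k3]
    for o in range(len(k3)):
        for i in range(len(k3[0])):
            fused[o][i][1][1] += k1[o][i][0][0]  # 1x1 weight goes to the center
            if o == i:
                fused[o][i][1][1] += 1.0  # identity: 1 at the center, same channel
    return fused


def rnd():
    return random.uniform(-1.0, 1.0)


random.seed(0)
C, H, W = 2, 4, 4  # toy sizes, chosen only for illustration
x = [[[rnd() for _ in range(W)] for _ in range(H)] for _ in range(C)]
k3 = [[[[rnd() for _ in range(3)] for _ in range(3)] for _ in range(C)]
      for _ in range(C)]
k1 = [[[[rnd()]] for _ in range(C)] for _ in range(C)]

# Training-time form: three parallel branches summed element-wise.
y3, y1 = conv2d(x, k3), conv2d(x, k1)
multi_branch = [[[y3[o][r][c] + y1[o][r][c] + x[o][r][c] for c in range(W)]
                 for r in range(H)] for o in range(C)]

# Inference-time form: one 3x3 convolution with the merged kernel.
single_conv = conv2d(x, merge_branches(k3, k1))

max_diff = max(abs(multi_branch[o][r][c] - single_conv[o][r][c])
               for o in range(C) for r in range(H) for c in range(W))
print(max_diff < 1e-9)
```

The two outputs should agree up to floating-point rounding, which is why the multi-branch structure can be discarded after training without changing what the block computes.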
Abstract
We present a simple but powerful convolutional neural network architecture, which has a VGG-like inference-time body composed of nothing but a stack of 3×3 convolution and ReLU, while the training-time model has a multi-branch topology. This decoupling of the training-time and inference-time architectures is realized by a structural re-parameterization technique, hence the name RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is, to the best of our knowledge, the first time for a plain model. On an NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy, and show a favorable accuracy-speed trade-off compared to state-of-the-art models like EfficientNet and RegNet. The code and trained models are available at https://github.com/megvii-model/RepVGG.
