GhostNetV3: Exploring the Training Strategies for Compact Models

Zhenhua Liu; Zhiwei Hao; Kai Han; Yehui Tang; Yunhe Wang

TL;DR

This work addresses the mismatch between training strategies and the limited capacity of compact vision networks. It systematically studies re-parameterization, knowledge distillation, learning schedules, and data augmentation to craft a specialized training recipe. When applied to GhostNetV3 and other efficient architectures, the recipe delivers substantial gains in top-1 accuracy with low FLOPs and mobile latency, and extends to object detection on COCO. The findings underscore the importance of capacity-aware training for edge-efficient vision models and provide actionable, architecture-agnostic guidance and code for practitioners.
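As an illustration of the idea behind re-parameterization, the toy sketch below merges two parallel linear branches into a single kernel after training. It is a minimal 1-D example under stated assumptions, not the branch design used in GhostNetV3: the helper names `conv1d_same` and `merge_branches` are hypothetical, and batch normalization (which would normally be folded into each branch first) is omitted.

```python
def conv1d_same(x, kernel):
    """Cross-correlation with zero padding so the output length equals len(x)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def merge_branches(k3, k1):
    """Fold a 1-tap branch into the centre tap of a parallel 3-tap branch."""
    merged = list(k3)
    merged[len(k3) // 2] += k1[0]
    return merged


x = [1.0, 2.0, -1.0, 0.5]
k3 = [0.2, -0.5, 0.1]  # 3-tap branch (illustrative weights)
k1 = [0.7]             # parallel 1-tap branch (illustrative weights)

# Training-time view: two branches evaluated separately, then summed.
two_branch = [a + b for a, b in zip(conv1d_same(x, k3), conv1d_same(x, k1))]

# Inference-time view: one merged kernel, one pass.
single = conv1d_same(x, merge_branches(k3, k1))

max_gap = max(abs(a - b) for a, b in zip(two_branch, single))
```

Because both branches are linear, the merged kernel reproduces the two-branch output up to floating-point error, so the extra branches add capacity during training without adding inference cost.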

Abstract

Compact neural networks are designed for edge devices, offering fast inference at the cost of modest accuracy. At present, however, the training strategies for compact models are borrowed from those for conventional models, which ignores the difference in model capacity and may therefore impede the performance of compact models. In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models. We find that appropriate designs of re-parameterization and knowledge distillation are crucial for training high-performance compact models, while some data augmentations commonly used for training conventional models, such as Mixup and CutMix, lead to worse performance. Our experiments on the ImageNet-1K dataset demonstrate that our specialized training strategy for compact models is applicable to various architectures, including GhostNetV2, MobileNetV2 and ShuffleNetV2. Specifically, equipped with our strategy, GhostNetV3 1.3$\times$ achieves a top-1 accuracy of 79.1% with only 269M FLOPs and a latency of 14.46 ms on mobile devices, surpassing its ordinarily trained counterpart by a large margin. Moreover, our observations also extend to object detection scenarios. PyTorch code and checkpoints can be found at https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnetv3_pytorch.
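For readers unfamiliar with knowledge distillation, the sketch below shows the classic temperature-scaled formulation: the KL divergence between the teacher's and the student's softened class distributions. This is an assumption made for illustration; the abstract does not specify the exact loss, teacher, or temperature used in the paper, and the function names `log_softmax` and `kd_loss` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_z for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Illustrative logits for a 3-class problem (not taken from the paper).
teacher = [1.0, 2.0, 3.0]
student = [3.0, 2.0, 1.0]
loss = kd_loss(student, teacher, temperature=2.0)
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge. In practice it is usually combined with a standard cross-entropy term on the ground-truth labels.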

Paper Structure (22 sections, 4 equations, 5 figures, 10 tables)

Introduction
Related works
Compact models
Bag of tricks for training CNNs
Preliminary
Training strategies
Re-parameterization
Knowledge distillation
Learning schedule
Data augmentation
Experimental results
Re-parameterization
Knowledge distillation
Learning schedule
Learning rate schedule.
...and 7 more sections

Figures (5)

Figure 1: The top-1 validation accuracy and the latency on CPU of various compact models on ImageNet dataset.
Figure 2: The architectures of GhostNetV2 and GhostNetV3.
Figure 3: The top-1 validation accuracy for various learning rates of GhostNetV3.
Figure 4: The top-1 accuracy achieved with various decay values of EMA.
Figure 5: The FLOPs and the latency of the compact models on mobile phone.
