Progressive Multi-task Anti-Noise Learning and Distilling Frameworks for Fine-grained Vehicle Recognition
Dichao Liu
TL;DR
This work tackles fine-grained vehicle recognition (FGVR) under image noise by introducing two frameworks: Progressive Multi-task Anti-noise Learning (PMAL), which adds image denoising as an auxiliary task via a Denoising-recognition Head (DRH), and Progressive Multi-task Distilling (PMD), which transfers the robustness learned under PMAL to a standard backbone. PMAL trains multiple DRHs attached at shallow-to-deep layers to learn noise-invariant features, while PMD uses a teacher-student distillation paradigm with progressive guidance from intermediate features to final predictions, aided by Sharpness-Aware Minimization (SAM). Empirical results on Stanford Cars, CompCars, BIT-Vehicle, VTID2, and VIDMMR show sizable accuracy gains over state-of-the-art FGVR methods without extra inference cost, including 100% on VIDMMR in some PMD configurations. The approach is of practical use for robust FGVR in noisy real-world intelligent transportation system (ITS) and surveillance scenarios, since it delivers high accuracy and noise-resilient representations while keeping the original backbone architecture at inference time.
Abstract
Fine-grained vehicle recognition (FGVR) is a fundamental technology for intelligent transportation systems, but it is very difficult because of its inherent intra-class variation. Most previous FGVR studies focus only on the intra-class variation caused by different shooting angles, positions, etc., while the intra-class variation caused by image noise has received little attention. This paper proposes a progressive multi-task anti-noise learning (PMAL) framework and a progressive multi-task distilling (PMD) framework to address the intra-class variation in FGVR that is caused by image noise. The PMAL framework achieves high recognition accuracy by treating image denoising as an additional task in image recognition and progressively forcing a model to learn noise invariance. The PMD framework transfers the knowledge of the PMAL-trained model into the original backbone network, which produces a model with about the same recognition accuracy as the PMAL-trained model but without any additional overhead over the original backbone network. Combining the two frameworks, we obtain models that significantly exceed previous state-of-the-art methods in recognition accuracy on two widely used standard FGVR datasets, namely Stanford Cars and CompCars, as well as on three additional surveillance image-based vehicle-type classification datasets, namely Beijing Institute of Technology (BIT)-Vehicle, Vehicle Type Image Data 2 (VTID2), and Vehicle Images Dataset for Make Model Recognition (VIDMMR), without any additional overhead over the original backbone networks. The source code is available at https://github.com/Dichao-Liu/Anti-noise_FGVR
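To make the two training signals described above more concrete, the following is a minimal, framework-free sketch of what the per-head PMAL objective and the PMD guidance terms could look like. It is an illustration under stated assumptions, not the released implementation: the choice of cross-entropy for recognition, mean squared error for denoising and feature matching, KL divergence for prediction matching, the `denoise_weight` and `temperature` parameters, and all function names are hypothetical. The progressive scheduling across heads and the SAM optimizer are not modelled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Recognition loss for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def mse(a, b):
    """Mean squared error between two equally long flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pmal_head_loss(head_logits, label, denoised, clean, denoise_weight=1.0):
    """Assumed multi-task loss for one denoising-recognition head:
    recognition term plus a weighted denoising (reconstruction) term."""
    return cross_entropy(head_logits, label) + denoise_weight * mse(denoised, clean)


def pmd_feature_loss(student_feature, teacher_feature):
    """Assumed intermediate guidance: match student features to teacher features."""
    return mse(student_feature, teacher_feature)


def pmd_prediction_loss(student_logits, teacher_logits, temperature=1.0):
    """Assumed final guidance: KL(teacher || student) on softened predictions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy values only; flat lists stand in for images and feature maps.
    print(pmal_head_loss([2.0, 0.5, -1.0], 0, [0.1, 0.4], [0.0, 0.5]))
    print(pmd_feature_loss([0.2, 0.8], [0.25, 0.75]))
    print(pmd_prediction_loss([1.0, 0.0, -1.0], [2.0, 0.5, -1.0]))
```

In this sketch the denoising term vanishes when a head reconstructs the clean image exactly, leaving only the recognition loss, and both PMD terms vanish when the student reproduces the teacher. Because these terms are used only during training, the student keeps the cost of the original backbone at inference time.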
