Infrared Object Detection with Ultra Small ConvNets: Is ImageNet Pretraining Still Useful?
Srikanth Muralidharan, Heitor R. Medeiros, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli
TL;DR
This work investigates whether ImageNet pretraining remains beneficial for ultra-small ConvNets aimed at infrared object detection on embedded devices. The authors downscale EfficientNet-B0 and MobileNetV3 into ultra-small variants (B-1..B-7 and S-0..S-6) and evaluate each under three initializations: ImageNet (IN), ImageNet followed by COCO (IN→COCO), and training from scratch. This lets them quantify how model capacity affects cross-domain and cross-modality generalization. Pretraining benefits persist at moderate capacities but diminish as model size shrinks. IN→COCO often outperforms the other initializations on detection tasks, especially for easier shifts and larger backbones, while for the smallest models the gains can disappear or reverse. The results yield practical guidance: use pretraining when possible, avoid the ultra-small regime when deploying under domain shift, and consider task-aligned pretraining (IN→COCO) for better out-of-domain robustness. The study also provides a recipe for scaling backbones down and a comprehensive benchmark across detection and classification to inform embedded-system design and deployment choices.
Abstract
Many real-world applications require recognition models that are robust to different operational conditions and modalities, yet must run on small embedded devices with limited hardware. While pretraining is known to be very beneficial for the accuracy and robustness of normal-size models, its effect on the small models that can be deployed on embedded and edge devices is not clear. In this work, we investigate the effect of ImageNet pretraining on increasingly small backbone architectures (ultra-small models, with fewer than 1M parameters) with respect to robustness in downstream object detection tasks in the infrared visual modality. Using scaling laws derived from standard object recognition architectures, we construct two ultra-small backbone families and systematically study their performance. Our experiments on three different datasets reveal that while ImageNet pretraining is still useful, once model capacity falls below a certain threshold it offers diminishing returns in terms of out-of-distribution detection robustness. Therefore, we advise practitioners to still use pretraining and, when possible, to avoid overly small models: while they might work well for in-domain problems, they are brittle when operating conditions differ.
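To make the idea of scaling a backbone downward concrete, here is a minimal sketch that extrapolates EfficientNet-style compound scaling to negative exponents. The base coefficients (1.2 for depth, 1.1 for width, 1.15 for resolution) are those published for the original EfficientNet family. Everything else is an assumption made for illustration: that a variant named B-k corresponds to the exponent phi = -k, that the same coefficients are reused below B0, and the helper names `compound_multipliers` and `approx_relative_params`. It is not the authors' exact recipe and does not cover the MobileNetV3-based family.

```python
from typing import Tuple

# Compound-scaling bases from the original EfficientNet family:
# depth (ALPHA), width (BETA), input resolution (GAMMA).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_multipliers(phi: float) -> Tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers relative to B0 (phi = 0).

    Assumption: negative phi extrapolates the same rule below B0.
    """
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def approx_relative_params(phi: float) -> float:
    """Rough parameter count relative to B0.

    Convolution parameters grow about linearly with depth and
    quadratically with width; resolution does not change them.
    """
    depth, width, _ = compound_multipliers(phi)
    return depth * width ** 2


if __name__ == "__main__":
    # Hypothetical mapping: phi = -k stands in for a variant named B-k.
    for phi in range(0, -8, -1):
        depth, width, res = compound_multipliers(phi)
        rel = approx_relative_params(phi)
        print(
            f"phi={phi:+d}  depth x{depth:.3f}  width x{width:.3f}  "
            f"resolution x{res:.3f}  params ~x{rel:.3f}"
        )
```

Under these assumptions each step down shrinks depth, width, and resolution together, so the estimated parameter count falls geometrically. In practice layer counts and channel widths must be rounded to integers, so real parameter counts will deviate from this estimate.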
