Analysis of Hyperparameter Optimization Effects on Lightweight Deep Models for Real-Time Image Classification
Vineet Kumar Rakesh, Soumya Mazumdar, Tapas Samanta, Hemendra Kumar Pandey, Amitabha Das
TL;DR
This work investigates how hyperparameter optimization affects accuracy and deployment feasibility for seven lightweight image classifiers trained on a class-balanced ImageNet-1K subset. By systematically varying learning rate, augmentation, optimizer, and batch size under standardized training, the study quantifies convergence behavior and real-time performance across CNN, hybrid, and transformer-based architectures. Careful hyperparameter choices yield absolute Top-1 accuracy gains of 1.5 to 3.5 percentage points. TinyViT-21M achieves the highest accuracy, while MobileNetV3-L and RepVGG-A2 deliver sub-5 ms latency and high throughput (FPS) at large batch sizes. The results provide reproducible benchmarks and practical guidance for balancing speed and accuracy in edge AI applications on constrained hardware.
Abstract
Lightweight convolutional and transformer-based networks are increasingly preferred for real-time image classification, especially on resource-constrained devices. This study evaluates the impact of hyperparameter optimization on the accuracy and deployment feasibility of seven modern lightweight architectures: ConvNeXt-T, EfficientNetV2-S, MobileNetV3-L, MobileViT v2 (S/XS), RepVGG-A2, and TinyViT-21M, trained on a class-balanced subset of 90,000 images from ImageNet-1K. Under standardized training settings, we investigate the influence of learning rate schedules, augmentation, optimizers, and initialization on model performance. Inference benchmarks are run on an NVIDIA L40S GPU with batch sizes ranging from 1 to 512, capturing latency and throughput under real-time conditions. We show that controlled hyperparameter variation significantly alters convergence dynamics in lightweight CNN and transformer backbones, providing insight into stability regions and deployment feasibility in edge artificial intelligence. Our results reveal that tuning alone improves top-1 accuracy by 1.5 to 3.5 percentage points over baselines, and that select models (e.g., RepVGG-A2, MobileNetV3-L) deliver latency under 5 milliseconds and throughput above 9,800 frames per second, making them well suited to edge deployment. This work provides reproducible, subset-based insights into lightweight hyperparameter tuning and its role in balancing speed and accuracy. The code and logs are available at: https://vineetkumarrakesh.github.io/lcnn-opt
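The latency and throughput measurements described above can be illustrated with a minimal timing sketch. This is not the authors' benchmarking code: the function names `benchmark` and `dummy_model`, the warm-up and iteration counts, and the use of the median are all assumptions made for illustration, and a CPU-bound stand-in replaces the real forward pass. On a GPU, the device would additionally need to be synchronized before each clock read so that asynchronous kernels are fully counted.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def benchmark(run_batch: Callable[[int], None],
              batch_sizes: Iterable[int],
              warmup: int = 3,
              iters: int = 10) -> List[Dict[str, float]]:
    """Time run_batch(batch_size) and report per-batch latency and FPS.

    Hypothetical helper: warmup/iters defaults are illustrative only.
    """
    results = []
    for bs in batch_sizes:
        # Warm-up runs are discarded so one-time setup cost is not measured.
        for _ in range(warmup):
            run_batch(bs)
        timings = []
        for _ in range(iters):
            start = time.perf_counter()
            run_batch(bs)
            timings.append(time.perf_counter() - start)
        median_s = statistics.median(timings)
        results.append({
            "batch_size": bs,
            "latency_ms": median_s * 1e3,  # time for one batch, in ms
            "fps": bs / median_s,          # images processed per second
        })
    return results


def dummy_model(batch_size: int) -> None:
    # Hypothetical stand-in for a forward pass; work scales with batch size.
    sum(i * i for i in range(1000 * batch_size))


if __name__ == "__main__":
    for row in benchmark(dummy_model, [1, 8, 64]):
        print(row)
```

Reporting both quantities matters because they can move in opposite directions: larger batches usually raise frames per second while also raising the time a single batch takes to complete.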
