Optimally Deep Networks -- Adapting Model Depth to Datasets for Superior Efficiency
Shaharyar Ahmed Khan Tareen, Filza Khan Tareen
TL;DR
The paper addresses the inefficiency of training networks at full depth regardless of task complexity. It proposes Optimally Deep Networks (ODNs), a depth-adaptive approach built on progressive depth expansion: training starts shallow, the network is deepened incrementally, and an optimally deep subnet is extracted once a target accuracy is reached, reducing memory, FLOPs, and inference costs. The method relies on depth partitioning with per-depth outputs, a warm-up phase, and a final fine-tuning step, and it avoids the large search spaces of traditional NAS. Empirical results across five datasets and multiple ResNet architectures show memory reductions of up to 98% with competitive accuracy, which supports efficient edge deployment and scalable use across devices. Code is released to support reproducibility.
Abstract
Deep neural networks (DNNs) have delivered strong performance across a wide range of tasks. However, this success often comes at the cost of unnecessarily large model sizes, high computational demands, and substantial memory footprints. Powerful architectures are typically trained at full depth, but not all datasets or tasks require such high model capacity. Training large, deep architectures on relatively low-complexity datasets frequently leads to wasted computation, unnecessary energy consumption, and excessive memory usage, which in turn makes deploying models on resource-constrained devices impractical. To address this problem, we introduce the concept of Optimally Deep Networks (ODNs), which balance model depth against task complexity. Specifically, we propose a NAS-like training strategy called progressive depth expansion, which begins by training neural networks at shallower depths and incrementally increases their depth as the earlier blocks converge, continuing this process until the target accuracy is reached. ODNs use only the optimal depth for the task at hand and remove redundant layers. This cuts future training and inference costs, lowers the model memory footprint, improves computational efficiency, and facilitates deployment on edge devices. Empirical results show that the optimal depths of ResNet-18 and ResNet-34 for MNIST and SVHN achieve up to 98.64% and 96.44% reduction in memory footprint, while maintaining competitive accuracies of 99.31% and 96.08%, respectively.
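The control flow of progressive depth expansion can be illustrated with a minimal sketch. This is not the authors' released implementation: the function name `progressive_depth_expansion`, the callback `train_and_eval`, and the use of per-block parameter counts as a proxy for memory footprint are all assumptions made for illustration, and the warm-up and final fine-tuning steps are left out.

```python
from typing import Callable, Sequence, Tuple


def progressive_depth_expansion(
    block_params: Sequence[int],
    train_and_eval: Callable[[int], float],
    target_accuracy: float,
) -> Tuple[int, float]:
    """Grow depth one block at a time until the target accuracy is met.

    block_params: hypothetical parameter count of each block, shallow to deep.
    train_and_eval: hypothetical callback that trains the first `depth` blocks
        and returns the validation accuracy measured at that depth.
    Returns (chosen_depth, fractional parameter reduction vs. full depth).
    """
    if not block_params:
        raise ValueError("block_params must not be empty")

    full_size = sum(block_params)
    depth = len(block_params)  # fall back to full depth if the target is never met

    for d in range(1, len(block_params) + 1):
        accuracy = train_and_eval(d)
        if accuracy >= target_accuracy:
            depth = d  # shallowest depth that reaches the target
            break

    kept = sum(block_params[:depth])
    return depth, 1.0 - kept / full_size


# Toy stand-in for training: accuracy looked up per depth (made-up numbers).
def toy_eval(depth: int) -> float:
    return {1: 0.90, 2: 0.97, 3: 0.99, 4: 0.992}[depth]


depth, reduction = progressive_depth_expansion([100, 200, 400, 800], toy_eval, 0.985)
print(depth, round(reduction, 4))
```

In this toy run the loop stops at the third block, so the deepest and largest block is never trained or deployed. In a real setting, `train_and_eval` would wrap the actual training of the partitioned network and its per-depth output.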
