Multi-Scale Dense Networks for Resource Efficient Image Classification
Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, Kilian Q. Weinberger
TL;DR
The paper addresses inference-time resource constraints with the Multi-Scale DenseNet (MSDNet), a CNN that combines dense inter-layer connections with a two-dimensional, multi-scale feature hierarchy and attaches multiple early-exit classifiers. Because coarse and fine features are maintained throughout the network and layers are densely connected, the exits share computation and interfere little with one another, which enables both anytime prediction and budgeted batch classification. On CIFAR-10/100 and ImageNet, MSDNet outperforms strong baselines across a range of computational budgets, often by large margins, and a DenseNet variant studied in the paper also shows efficiency gains. The result is a practical route to accurate, resource-aware image classification on diverse devices and in large-scale systems.
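The anytime-prediction setting can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `anytime_predict`, the per-exit costs, and the precomputed per-exit predictions are all hypothetical, and a real network would compute each exit lazily rather than ahead of time. The sketch only shows the control flow: evaluation proceeds exit by exit and returns the most recent prediction once the budget runs out.

```python
from typing import Optional, Sequence


def anytime_predict(
    exit_costs: Sequence[float],
    exit_predictions: Sequence[int],
    budget: float,
) -> Optional[int]:
    """Return the prediction of the deepest exit reachable within `budget`.

    exit_costs[k] is the (assumed) incremental cost of going from exit k-1
    to exit k; exit_predictions[k] is the class predicted at exit k.
    Returns None if even the first exit does not fit in the budget.
    """
    spent = 0.0
    latest = None
    for cost, prediction in zip(exit_costs, exit_predictions):
        if spent + cost > budget:
            break  # the next exit is unaffordable; keep the last prediction
        spent += cost
        latest = prediction
    return latest


# Hypothetical example: three exits with incremental costs 1, 2 and 3 units.
costs = [1.0, 2.0, 3.0]
predictions = [3, 5, 5]
result_mid_budget = anytime_predict(costs, predictions, budget=3.5)
result_no_budget = anytime_predict(costs, predictions, budget=0.5)
```

With a budget of 3.5 units the first two exits are affordable, so the second exit's prediction is returned; with 0.5 units no exit is reached.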
Abstract
In this paper we investigate image classification with computational resource limits at test time. Two such settings are: (1) anytime classification, where the network's prediction for a test example is progressively updated, facilitating the output of a prediction at any time; and (2) budgeted batch classification, where a fixed amount of computation is available to classify a set of examples and can be spent unevenly across "easier" and "harder" inputs. In contrast to most prior work, such as the popular Viola and Jones algorithm, our approach is based on convolutional neural networks. We train multiple classifiers with varying resource demands, which we adaptively apply at test time. To maximally re-use computation between the classifiers, we incorporate them as early exits into a single deep convolutional neural network and inter-connect them with dense connectivity. To facilitate high-quality classification early on, we use a two-dimensional multi-scale network architecture that maintains coarse- and fine-level features throughout the network. Experiments on three image-classification tasks demonstrate that our framework substantially improves on the existing state of the art in both settings.
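The idea of spending computation unevenly across "easier" and "harder" inputs can be sketched as follows. The abstract does not specify the exit rule, so this sketch assumes one for illustration: an example leaves at the first exit whose maximum softmax probability reaches a per-exit threshold, and the last exit always answers. The names `softmax`, `early_exit`, and the threshold values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit(
    per_exit_logits: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> Tuple[int, int]:
    """Return (predicted_class, exit_index) for a single example.

    Assumed rule: stop at the first exit whose top softmax probability
    is at least thresholds[k]; the final exit is used unconditionally.
    """
    if not per_exit_logits:
        raise ValueError("at least one exit is required")
    last = len(per_exit_logits) - 1
    for k, logits in enumerate(per_exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if k == last or confidence >= thresholds[k]:
            return probs.index(confidence), k
    raise AssertionError("unreachable: the last exit always returns")


# Hypothetical logits for two examples at two exits, three classes each.
thresholds = [0.9, 0.0]
easy_example = [[4.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
hard_example = [[0.2, 0.1, 0.0], [0.0, 3.0, 0.0]]

easy_result = early_exit(easy_example, thresholds)  # confident at exit 0
hard_result = early_exit(hard_example, thresholds)  # falls through to exit 1
```

Under this assumed rule, raising a threshold sends more examples to deeper, costlier exits, while lowering it saves computation, so the thresholds are the knob for meeting a fixed batch budget.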
