CNN Mixture-of-Depths
Rinor Cakaj, Jens Mehnert, Bin Yang
TL;DR
CNN Mixture-of-Depths (MoD) reduces the computational cost of CNNs by dynamically selecting the most informative channels within Conv-Blocks, while a fusion mechanism preserves a fixed tensor shape. This gives a static computation graph with dynamic resource allocation. The method combines a Channel Selector, reduced-channel Conv-Blocks, and a fusion step that maintains dimensionality, yielding substantial speedups with little to no loss in accuracy on ImageNet, Cityscapes, and Pascal VOC, with improvements also observable on CIFAR. Key contributions include demonstrating that a fixed-graph MoD approach can realize practical speedups without custom CUDA kernels or specialized losses, and showing that channel-wise selective processing yields both efficiency and regularization benefits. The results indicate strong potential for deploying efficient CNNs on resource-constrained devices while maintaining competitive performance in vision tasks. Future work focuses on kernel-level optimization of the fusion path and on finding the optimal channel count per block.
Abstract
We introduce Mixture-of-Depths (MoD) for Convolutional Neural Networks (CNNs), a novel approach that enhances the computational efficiency of CNNs by selectively processing channels based on their relevance to the current prediction. This method optimizes computational resources by dynamically selecting key channels in feature maps for focused processing within the convolutional blocks (Conv-Blocks), while skipping less relevant channels. Unlike conditional computation methods that require dynamic computation graphs, CNN MoD uses a static computation graph with fixed tensor sizes, which improves hardware efficiency. It speeds up the training and inference processes without the need for customized CUDA kernels, unique loss functions, or finetuning. CNN MoD either matches the performance of traditional CNNs with reduced inference times, GMACs, and parameters, or exceeds their performance while maintaining similar inference times, GMACs, and parameters. For example, on ImageNet, ResNet86-MoD exceeds the performance of the standard ResNet50 by 0.45% with a 6% speedup on CPU and 5% on GPU. Moreover, ResNet75-MoD achieves the same performance as ResNet50 with a 25% speedup on CPU and 15% on GPU.
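The select, process, and fuse pattern described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the selector scores each channel by its mean activation (a stand-in for a learned Channel Selector), `toy_block` is a hypothetical placeholder for a reduced-width Conv-Block, and fusion adds the processed channels onto the first k channels of the input. The point it shows is that k is fixed in advance, so the output always has the same shape as the input regardless of which channels are chosen.

```python
def channel_scores(x):
    """Score each channel by its mean activation.

    Assumption: a simple stand-in for a learned channel selector.
    x is a list of C channels, each an H x W list of lists.
    """
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring channels, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def toy_block(channels):
    """Hypothetical stand-in for a reduced-width Conv-Block: scale by 2, then ReLU."""
    return [[[max(0.0, 2.0 * v) for v in row] for row in ch] for ch in channels]


def mod_block(x, k):
    """Process only k selected channels, then fuse back into a tensor of the input shape."""
    idx = select_top_k(channel_scores(x), k)
    processed = toy_block([x[i] for i in idx])

    # Start from a copy of the input so skipped channels pass through unchanged.
    out = [[row[:] for row in ch] for ch in x]

    # Fusion (assumption): add the k processed channels onto the first k channels.
    for j, ch in enumerate(processed):
        for r, row in enumerate(ch):
            for c, v in enumerate(row):
                out[j][r][c] += v
    return out, idx


# Four channels of size 1 x 2; k is fixed, so shapes never depend on the data.
x = [
    [[1.0, 1.0]],
    [[5.0, 3.0]],
    [[0.0, 0.0]],
    [[2.0, 4.0]],
]
out, idx = mod_block(x, k=2)
print(idx)  # indices of the selected channels
print(out)  # same number of channels and spatial size as x
```

Because only k of the C channels pass through the block, the cost of the block scales with k rather than C, while the rest of the network sees an unchanged tensor shape.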
