DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim, Byeongho Heo, Dongyoon Han
TL;DR
DenseNets have been historically outpaced by residual and Transformer-based architectures due to design and training limitations. This work revisits DenseNets, arguing that concatenation shortcuts can yield higher representational rank, and offers a modernized RDNet design with wider, shallower blocks, improved feature mixers, larger intermediate dimensions, and patch-based stems. Through a comprehensive pilot study of thousands of random networks and extensive ImageNet-1K/downstream task evaluations, RDNet demonstrates competitive or superior speed-accuracy trade-offs compared to state-of-the-art models, with strong performance on ADE20K and COCO as well. The findings suggest concatenation-based DenseNet designs can complement ResNet- and ViT-based paradigms, offering practical advantages in memory efficiency and robustness across resolutions. Code and models are provided to foster further exploration of DenseNet-style architectures in modern vision tasks.
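The rank argument above can be illustrated with a toy example. This is not the paper's pilot study. The names are hypothetical: `x` stands in for a block's input features, and `fx` is an assumed feature mixer (a random linear map followed by ReLU). An additive shortcut keeps the width at `d`, so its feature matrix has rank at most `d`. A concatenation shortcut keeps `x` intact and appends the new features, so the rank can grow beyond `d`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 16                      # hypothetical: 256 token features of width 16

x = rng.standard_normal((n, d))     # stand-in for a block's input features
w = rng.standard_normal((d, d))
fx = np.maximum(x @ w, 0.0)         # assumed toy feature mixer: linear map + ReLU

additive = x + fx                          # ResNet-style shortcut: width stays d
concat = np.concatenate([x, fx], axis=1)   # DenseNet-style shortcut: width grows to 2d

rank_add = np.linalg.matrix_rank(additive)
rank_cat = np.linalg.matrix_rank(concat)
print(additive.shape, rank_add)
print(concat.shape, rank_cat)
```

The ReLU matters here. With a purely linear mixer, the appended columns `x @ w` would lie in the span of `x`, and concatenation would add no rank.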
Abstract
This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals their underrated effectiveness relative to the predominant ResNet-style architectures. We believe DenseNets' potential was overlooked because training methods that had not been revisited and traditional design elements did not fully reveal their capabilities. Our pilot study shows that dense connections through concatenation are strong, demonstrating that DenseNets can be revitalized to compete with modern architectures. We methodically refine suboptimal components (architectural adjustments, block redesign, and improved training recipes) towards widening DenseNets and boosting memory efficiency while keeping concatenation shortcuts. Our models, employing simple architectural elements, ultimately surpass Swin Transformer, ConvNeXt, and DeiT-III, key architectures in the residual learning lineage. Furthermore, our models exhibit near state-of-the-art performance on ImageNet-1K, competing with very recent models, and on the downstream tasks of ADE20K semantic segmentation and COCO object detection/instance segmentation. Finally, we provide empirical analyses that uncover the merits of concatenation over additive shortcuts, steering a renewed preference towards DenseNet-style designs. Our code is available at https://github.com/naver-ai/rdnet.
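The following is a minimal sketch of dense connectivity through concatenation shortcuts. It is not RDNet's actual block. Each layer reads the concatenation of the block input and all earlier layer outputs, then contributes `growth_rate` new channels. The helpers `dense_layer` and `dense_block` are hypothetical. A 1x1 projection with ReLU stands in for a real feature mixer. The layer counts and growth rates are illustrative and are not taken from the paper. The two calls show that fewer, wider layers reach the same output width as many narrow ones, which matches the "wider, shallower blocks" direction described in the TL;DR.

```python
import numpy as np

def dense_layer(features, growth_rate, rng):
    """Hypothetical toy layer: 1x1 projection + ReLU emitting `growth_rate` new channels."""
    c_in = features.shape[1]
    w = rng.standard_normal((c_in, growth_rate)) / np.sqrt(c_in)
    return np.maximum(features @ w, 0.0)

def dense_block(x, num_layers, growth_rate, seed=0):
    """Each layer sees the concatenation of the block input and all earlier outputs."""
    rng = np.random.default_rng(seed)
    features = x
    for _ in range(num_layers):
        new = dense_layer(features, growth_rate, rng)
        features = np.concatenate([features, new], axis=1)  # concatenation shortcut
    return features

x = np.random.default_rng(1).standard_normal((64, 64))  # 64 tokens, 64 channels
deep_narrow = dense_block(x, num_layers=12, growth_rate=32)    # 64 + 12 * 32
shallow_wide = dense_block(x, num_layers=3, growth_rate=128)   # 64 + 3 * 128
print(deep_narrow.shape, shallow_wide.shape)  # (64, 448) (64, 448)
```

Both blocks output 448 channels, and the first 64 channels of each are the unchanged input. Because every layer's output is kept, the width grows with depth. For that reason, DenseNet-style designs typically insert transition layers that compress channels between blocks.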
