Residual Feature-Reutilization Inception Network for Image Classification
Yuanpeng He, Wenjie Song, Lijian Li, Tianxiang Zhan, Wenpin Jiao
TL;DR
This paper addresses the need for efficient multi-scale feature extraction in image classification by introducing Residual Feature Reutilization Inception (ResFRI) and Split-ResFRI blocks. These blocks combine four parallel convolutional paths with information interaction passages and a residual connection to enlarge the receptive field while keeping parameter counts low; Split-ResFRI further reduces computation by partitioning the input channels into four groups with ratios $3/8$, $3/8$, $1/8$, and $1/8$. Across CIFAR-10, CIFAR-100, and Tiny Imagenet, the proposed methods achieve state-of-the-art or competitive accuracy for approximately equal model sizes and without extra data, often surpassing ResNet-101 and several inception-based baselines. The work offers a lightweight CNN backbone with strong multi-scale representation, suitable for deployment on resource-constrained devices and adaptable to tasks such as segmentation and object detection, with future directions focusing on more efficient fusion strategies and optimal channel splits.
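The channel partition described above can be illustrated with a minimal sketch. It only computes how many channels each of the four groups would receive under the $3/8$, $3/8$, $1/8$, $1/8$ ratios; it does not reproduce the Split-ResFRI block itself. The helper names `split_channel_counts` and `channel_slices` are hypothetical, and the requirement that every group size be a whole number is an assumption of this sketch rather than something stated in the paper.

```python
from fractions import Fraction

# Split ratios quoted in the summary above for Split-ResFRI.
SPLIT_RATIOS = (Fraction(3, 8), Fraction(3, 8), Fraction(1, 8), Fraction(1, 8))


def split_channel_counts(num_channels, ratios=SPLIT_RATIOS):
    """Return per-group channel counts for the given split ratios.

    Hypothetical helper: it requires the ratios to sum to 1 and every
    group size to be a whole number (an assumption of this sketch).
    """
    if sum(ratios) != 1:
        raise ValueError("split ratios must sum to 1")
    counts = []
    for ratio in ratios:
        size = ratio * num_channels  # exact rational arithmetic, no rounding
        if size.denominator != 1:
            raise ValueError(
                f"{num_channels} channels cannot be split evenly by {ratio}"
            )
        counts.append(int(size))
    return counts


def channel_slices(num_channels, ratios=SPLIT_RATIOS):
    """Return (start, stop) index pairs, one per group, covering all channels."""
    slices, start = [], 0
    for count in split_channel_counts(num_channels, ratios):
        slices.append((start, start + count))
        start += count
    return slices


if __name__ == "__main__":
    print(split_channel_counts(64))
    print(channel_slices(64))
```

For a 64-channel input, for example, the groups would hold 24, 24, 8, and 8 channels. In a deep learning framework the same index pairs could be used to slice the channel axis of a feature map before passing each group to its own path.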
Abstract
Capturing feature information effectively is of great importance in computer vision. With the development of convolutional neural networks (CNNs), concepts such as residual connections and multi-scale representation have driven continual performance gains across diverse deep learning vision tasks. In this paper, we propose a novel CNN architecture built from residual feature-reutilization inception (ResFRI) or split-residual feature-reutilization inception (Split-ResFRI) blocks. Each block is composed of four convolutional combinations of different structures connected by specially designed information interaction passages, which are used to extract multi-scale feature information and effectively increase the receptive field of the model. Moreover, building on this structure, Split-ResFRI can adjust the split ratio of the input information, thereby reducing the number of parameters while preserving model performance. Specifically, in experiments on popular vision datasets, namely CIFAR10 ($97.94$\%), CIFAR100 ($85.91$\%) and Tiny Imagenet ($70.54$\%), we obtain state-of-the-art results compared with other modern models of comparable size, without using additional data.
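To give a rough sense of why splitting the input channels can reduce the number of parameters, consider the following back-of-the-envelope calculation. It assumes, purely for illustration and not as a description of the actual Split-ResFRI paths, that each group of $r_i C$ channels is processed by its own $k \times k$ convolution that preserves its channel count, with biases ignored. The symbols $C$ (input channels), $k$ (kernel size), and $r_i$ (split ratios $3/8$, $3/8$, $1/8$, $1/8$) are introduced here for the example.

```latex
\begin{align*}
P_{\text{full}} &= k^2 C^2, \\
P_{\text{split}} &= \sum_{i=1}^{4} k^2 (r_i C)^2 = k^2 C^2 \sum_{i=1}^{4} r_i^2, \\
\frac{P_{\text{split}}}{P_{\text{full}}}
  &= \left(\tfrac{3}{8}\right)^2 + \left(\tfrac{3}{8}\right)^2
   + \left(\tfrac{1}{8}\right)^2 + \left(\tfrac{1}{8}\right)^2
   = \frac{20}{64} = \frac{5}{16}.
\end{align*}
```

Under these simplifying assumptions, the split layer would use roughly 31\% of the weights of a single full convolution. The real saving depends on the actual structure of each path, which this section does not specify.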
