nnMobileNet: Rethinking CNN for Retinopathy Research

Wenhui Zhu; Peijie Qiu; Xiwen Chen; Xin Li; Natasha Lepore; Oana M. Dumitrascu; Yalin Wang

TL;DR

The paper investigates whether CNNs can outperform vision transformers (ViTs) on retinal disease (RD) tasks by reengineering MobileNetV2 into nnMobileNet through targeted refinements. It combines an inverted linear residual bottleneck (ILRB), heavy data augmentation, spatial dropout, AdamP optimization, and ReLU6 activation to enhance RD feature localization, achieving superior performance on multiple RD benchmarks without external pretraining. Across diabetic retinopathy (DR) grading, multi-disease detection, and diabetic macular edema (DME) classification, nnMobileNet delivers state-of-the-art or competitive results with fewer parameters and faster inference than many ViT-based models. Grad-CAM visualizations show improved lesion localization, supporting the argument that properly tuned CNNs remain highly effective for RD. The study takes a balanced view of CNNs and ViTs: it advocates CNN-centered RD models, recognizes the strengths of ViTs, and proposes hybrid approaches and large-kernel convolutions as future work.
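To make the "fewer parameters" point concrete, the sketch below counts the weights in one inverted bottleneck block. It assumes the standard MobileNetV2 layout (1x1 expansion, 3x3 depthwise convolution, linear 1x1 projection); the summary above does not spell out the exact ILRB configuration, so the expansion factor of 6 and the channel width of 32 are illustrative assumptions, and the function names are hypothetical. Batch-norm parameters and biases are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution without bias."""
    return (c_in // groups) * c_out * k * k


def inverted_bottleneck_params(c_in, c_out, expand=6, k=3):
    """Weights in a MobileNetV2-style inverted bottleneck (assumed layout)."""
    hidden = c_in * expand                                     # expanded width
    expand_1x1 = conv_params(c_in, hidden, 1)                  # pointwise expansion
    depthwise = conv_params(hidden, hidden, k, groups=hidden)  # one k x k filter per channel
    project_1x1 = conv_params(hidden, c_out, 1)                # linear projection
    return expand_1x1 + depthwise + project_1x1


if __name__ == "__main__":
    block = inverted_bottleneck_params(32, 32)   # illustrative widths, not from the paper
    dense = conv_params(32 * 6, 32 * 6, 3)       # dense 3x3 conv at the expanded width
    print(f"inverted bottleneck: {block} weights")
    print(f"dense 3x3 conv at hidden width: {dense} weights")
```

The depthwise step is what keeps the block small: filtering at the expanded width costs one 3x3 kernel per channel rather than a full channel-to-channel kernel.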

Abstract

Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, that is, their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models in various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET

Paper Structure (16 sections, 6 figures, 5 tables)

This paper contains 16 sections, 6 figures, 5 tables.

Introduction
Related Works
Diabetic Retinopathy Assessment
Multi Retinopathy abnormal detection
Myopic maculopathy grading
Roadmap of a nnMobileNet
Channel Configuration of ILRB
Data Augmentation
Dropout
Optimizer
Activation Function
Experiments and Results
Datasets and Evaluation Metrics
Comparison to State-of-the-art Methods
Visual Interpretability
...and 1 more sections

Figures (6)

Figure 1: Model size vs. average performance (F1, accuracy, and AUC) on retinal multi-disease abnormality detection using the RFMiD dataset. Our method outperforms other CNN- and ViT-based methods in both performance and efficiency.
Figure 2: The roadmap of modifying a MobileNetV2 into the proposed no-new MobileNet (nnMobileNet) on the Messidor-2 dataset.
Figure 3: The detailed architecture of the no-new MobileNet (including the channel configuration) and the inverted linear residual bottleneck used in the no-new MobileNet.
Figure 4: Examples of data augmentation (Method III) and details of three sets of data augmentation we used.
Figure 5: Empirical studies on the Messidor-2 dataset, where subpanels (a), (b), (c), and (d) represent different experimental groups, each independent of the others. D and SD-[x] in subpanel (b) denote Dropout and SpatialDropout in position [x] as shown in Fig. 3(c), respectively.
...and 1 more figures
