Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation
Suraj Mishra, Danny Z. Chen
TL;DR
This paper tackles the challenge of refining dense predictions in medical image segmentation by introducing densely decoded networks (DDN) that employ crutch connections to fuse multi-scale encoder and decoder information, enhancing localization and fine-detail generation. It further proposes Adaptive Deep Supervision (ADS), which selects the auxiliary supervision layer by matching the dataset's average object size with the network's layer-wise effective receptive field (LERF), and injects a companion objective to encourage robust, input-dependent feature learning. The authors formalize the ICC and OCC mechanisms and derive an adaptive strategy that uses $Obj$ and $LERF$ to place auxiliary losses, yielding the training objective $Loss_{total}=Loss_{main}+Loss_{aux}+L_2$. Empirical results on four diverse datasets (ISIC melanoma, BraTS, ultrasound lymph nodes, and wing disc images) show that DDN alone improves segmentation accuracy, while DDN+ADS provides further gains and, in some cases, state-of-the-art performance with reduced parameter counts. The work highlights the practical impact of multi-scale feature fusion and dataset-driven supervision on medical image segmentation across modalities.
Abstract
Medical image segmentation using deep neural networks has been highly successful. However, the effectiveness of these networks is often limited by inadequate dense prediction and an inability to extract robust features. To achieve refined dense prediction, we propose densely decoded networks (DDN), which selectively introduce 'crutch' network connections. Such 'crutch' connections in each upsampling stage of the network decoder (1) enhance target localization by incorporating high-resolution features from the encoder, and (2) improve segmentation by facilitating multi-stage contextual information flow. Further, we present a training strategy based on adaptive deep supervision (ADS), which exploits and adapts to specific attributes of the input dataset for robust feature extraction. In particular, ADS strategically locates and deploys auxiliary supervision by matching the average input object size with the layer-wise effective receptive fields (LERF) of a network, resulting in a class of DDNs. Including such a 'companion objective' from a specific hidden layer helps the model pay close attention to distinct input-dependent features that the network might otherwise 'ignore' during training. Our new networks and training strategy are validated on 4 diverse datasets of different modalities, demonstrating their effectiveness.
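The layer-matching idea behind ADS can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the theoretical receptive field as a stand-in for the effective receptive field (which is generally smaller and is measured differently), and it takes the average object size as a single number in pixels. The `ConvLayer` class, the `ENCODER` layer list, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one conv or pooling layer."""
    name: str
    kernel: int
    stride: int = 1
    dilation: int = 1


def theoretical_receptive_fields(layers):
    """Return (name, receptive field in input pixels) after each layer.

    Assumption: the theoretical receptive field is used as a proxy for
    the layer-wise effective receptive field (LERF).
    """
    rf, jump = 1, 1  # jump = spacing of adjacent outputs, in input pixels
    fields = []
    for layer in layers:
        rf += (layer.kernel - 1) * layer.dilation * jump
        jump *= layer.stride
        fields.append((layer.name, rf))
    return fields


def select_aux_layer(layers, avg_object_size):
    """Pick the layer whose receptive field is closest to the object size."""
    fields = theoretical_receptive_fields(layers)
    return min(fields, key=lambda item: abs(item[1] - avg_object_size))


# Illustrative encoder: pairs of 3x3 convolutions separated by 2x2 pooling.
ENCODER = [
    ConvLayer("conv1_1", 3), ConvLayer("conv1_2", 3),
    ConvLayer("pool1", 2, stride=2),
    ConvLayer("conv2_1", 3), ConvLayer("conv2_2", 3),
    ConvLayer("pool2", 2, stride=2),
    ConvLayer("conv3_1", 3), ConvLayer("conv3_2", 3),
    ConvLayer("pool3", 2, stride=2),
    ConvLayer("conv4_1", 3), ConvLayer("conv4_2", 3),
]

# Assumed average object size of 30 pixels for the dataset.
print(select_aux_layer(ENCODER, avg_object_size=30))  # ('conv3_2', 32)
```

In this sketch, a dataset with smaller objects would attach the auxiliary loss to an earlier layer and one with larger objects to a deeper layer, which is the dataset-dependent behavior the abstract describes.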
