Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation

Suraj Mishra, Danny Z. Chen

TL;DR

This paper tackles the challenge of refining dense predictions in medical image segmentation by introducing densely decoded networks (DDN) that employ crutch connections to fuse multi-scale encoder and decoder information, enhancing localization and fine-detail generation. It further proposes Adaptive Deep Supervision (ADS), which selects the auxiliary supervision layer by matching the dataset's average object size ($Obj$) with the network's layer-wise effective receptive field ($LERF$), and injects a companion objective to encourage robust, input-dependent feature learning. The authors formalize the ICC and OCC mechanisms and derive an adaptive strategy that uses $Obj$ and $LERF$ to place auxiliary losses, yielding the training objective $Loss_{total}=Loss_{main}+Loss_{aux}+L_2$. Empirical results across four diverse datasets (ISIC melanoma, BraTS, ultrasound lymph nodes, and wing disc images) show that DDN alone improves segmentation accuracy, while DDN+ADS provides further gains and, in some cases, state-of-the-art performance with reduced parameter counts. The work highlights the practical impact of multi-scale feature fusion and dataset-driven supervision in improving medical image segmentation across modalities.

Abstract

Medical image segmentation using deep neural networks has been highly successful. However, the effectiveness of these networks is often limited by inadequate dense prediction and an inability to extract robust features. To achieve refined dense prediction, we propose densely decoded networks (DDN), by selectively introducing 'crutch' network connections. Such 'crutch' connections in each upsampling stage of the network decoder (1) enhance target localization by incorporating high-resolution features from the encoder, and (2) improve segmentation by facilitating multi-stage contextual information flow. Further, we present a training strategy based on adaptive deep supervision (ADS), which exploits and adapts specific attributes of the input dataset, for robust feature extraction. In particular, ADS strategically locates and deploys auxiliary supervision by matching the average input object size with the layer-wise effective receptive fields (LERF) of a network, resulting in a class of DDNs. Including such a 'companion objective' from a specific hidden layer helps the model pay close attention to some distinct input-dependent features, which the network might otherwise 'ignore' during training. Our new networks and training strategy are validated on 4 diverse datasets of different modalities, demonstrating their effectiveness.
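
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes the theoretical receptive field (computed from kernel sizes and strides) for the layer-wise effective receptive field, and it assumes a simple "closest size wins" rule. The `Layer` class, the `pick_aux_layer` helper, and the example encoder are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical description of one conv or pooling layer."""
    name: str
    kernel: int
    stride: int


def receptive_fields(layers):
    """Return (name, theoretical receptive field in pixels) for each layer.

    Assumption: the theoretical receptive field stands in for the
    effective receptive field (LERF) used in the paper.
    """
    rf, jump = 1, 1  # receptive field and cumulative stride so far
    fields = []
    for layer in layers:
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
        fields.append((layer.name, rf))
    return fields


def pick_aux_layer(layers, avg_object_size):
    """Pick the layer whose receptive field is closest to the object size.

    Assumption: 'matching' is taken to mean smallest absolute difference.
    """
    fields = receptive_fields(layers)
    return min(fields, key=lambda item: abs(item[1] - avg_object_size))


# Hypothetical encoder: 3x3 convolutions interleaved with 2x2 pooling.
encoder = [
    Layer("conv1", 3, 1),
    Layer("pool1", 2, 2),
    Layer("conv2", 3, 1),
    Layer("pool2", 2, 2),
    Layer("conv3", 3, 1),
    Layer("pool3", 2, 2),
    Layer("conv4", 3, 1),
]

if __name__ == "__main__":
    for size in (8, 17, 100):
        print(size, pick_aux_layer(encoder, size))
```

Under this rule, an average object size larger than every receptive field simply selects the deepest layer; how the paper handles that case may differ.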


Paper Structure

This paper contains 6 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Our proposed DDN architecture. ICC is shown as dashed colored arrows, while OCC is shown as thick colored arrows. ADS for (a) melanoma, (b) wing disc (red dotted box for the Obj $>$ LERF$_N$ cases), (c) lymph node, and (d) BraTS are shown as dotted colorless arrows. Each learnable layer (except 1x1 conv) is followed by batch-norm and ReLU activation. Upsampling layers in ICC (similar to those in OCC) are not shown here.
  • Figure 2: Our proposed model for ADS. Based on the (LERF, Obj) matching (i.e., inputs, shown in green), ADS suggests the location of Loss$_{aux}$, shown in red and blue boxes.
  • Figure 3: Example segmentation results for (row-1) lymph node, (row-2) wing disc, (row-3) melanoma, and (row-4) BraTS dataset. (a), (c), and (e) show example test images. (b), (d), and (f) are the ground truth (blue) and segmentation output (red) obtained using DDN + ADS. Magenta (red + blue) highlights true positive regions.
  • Figure 4: (a) A wing-disc test image. (b) Ground truth (GT). (c) GT in red, FCN (output) in green, and FCN+ADS in blue. (d) GT in red, DDN-ICC in green, and DDN-ICC+ADS in blue. (e) GT in red, DDN-ICC in green, and DDN+ADS in blue. (f) GT in red, DDN-OCC+ADS in green, and DDN+ADS in blue. (g) GT in red, DDN+ADS$^*$ (ADS from (a) in Fig. 1) in green, and DDN+ADS in blue ('-': without; '+': with).