Operating critical machine learning models in resource constrained regimes

Raghavendra Selvan, Julian Schön, Erik B Dam

TL;DR

This paper addresses the challenge of running deep learning models in resource-constrained clinical environments, where data, compute, and energy requirements are barriers to deployment. It evaluates a subset of resource-efficiency strategies on the RSNA Mammography and LIDC-IDRI datasets, across CNN and transformer architectures: automatic mixed precision (AMP), an 8-bit optimiser, and half-precision weights, along with gradient/activation quantisation. Key findings are that AMP can reduce memory use and training time without hurting performance for CNNs, and that the 8-bit optimiser often improves convergence and lowers resource use. Transformer models are more sensitive to low-precision settings. The best configurations include DenseNet with the 8-bit optimiser and Swin Transformer with the 8-bit optimiser plus half precision. The results suggest that resource-efficient techniques should be integrated into standard clinical-deployment pipelines to enable faster, greener, and more accessible medical imaging tools. Limitations remain, such as the lack of neural architecture search (NAS) exploration and of validation on real edge deployments.

Abstract

The accelerated development of machine learning methods, primarily deep learning, is behind the recent breakthroughs in medical image analysis and computer-aided intervention. The resource consumption of deep learning models, in terms of the amount of training data, compute, and energy costs, is known to be massive. These large resource costs can be barriers to deploying these models in clinics globally. To address this, there are cogent efforts within the machine learning community to introduce notions of resource efficiency, for instance, using quantisation to alleviate memory consumption. While most of these methods are shown to reduce resource utilisation, they could come at a cost in performance. In this work, we probe the trade-off between resource consumption and performance, specifically when dealing with models that are used in critical settings such as clinics.
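
To make the abstract's quantisation example concrete, the following is a minimal sketch of how storing values as 8-bit codes instead of 32-bit floats trades memory for a bounded rounding error. It assumes a generic min-max (affine) scheme and synthetic weights; the function names `quantise_uint8` and `dequantise` are hypothetical, and this is not the specific quantisation or 8-bit optimiser used in the paper.

```python
from array import array


def quantise_uint8(values):
    """Affine (min-max) quantisation of floats to 8-bit codes.

    Assumption: a generic scheme for illustration, not the paper's method.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    codes = array(
        "B",
        (min(255, max(0, round((v - lo) / scale))) for v in values),
    )
    return codes, scale, lo


def dequantise(codes, scale, lo):
    """Map 8-bit codes back to approximate float values."""
    return [c * scale + lo for c in codes]


# Synthetic "weights" in [-0.25, 0.25], stored as 32-bit floats.
weights = array("f", [0.5 * ((i * 37) % 101) / 100 - 0.25 for i in range(1000)])

codes, scale, lo = quantise_uint8(weights)
restored = dequantise(codes, scale, lo)

fp32_bytes = weights.itemsize * len(weights)
int8_bytes = codes.itemsize * len(codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 storage: {fp32_bytes} bytes, 8-bit storage: {int8_bytes} bytes")
print(f"largest reconstruction error: {max_error:.6f} (step size {scale:.6f})")
```

The reconstruction error is at most half a quantisation step, which is the kind of precision loss whose effect on task performance the paper sets out to measure.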

Paper Structure (6 sections, 5 figures, 2 tables)

This paper contains 6 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (A): Sample mammography images from the RSNA training set showing the diverse acquisition and anatomical variations. (B): Age distribution of the subjects in the dataset used. Maximum age of $55$ yr was used.
  • Figure 2: (A & B): Radar plots showing the mean metrics for performance ($P_T, E, \text{GPU}, T$) reported in Table \ref{tab:results} for the five settings, shown for DenseNet \cite{huang2017densely} and Vision Transformer \cite{dosovitskiyimage}. (C): Influence of different scales of downsampling on the test performance for three configurations of DenseNet models on the RSNA dataset. For each, the mean and the standard deviation of the three runs are shown as the curve and the shaded area, respectively.
  • Figure 3: The LIDC-IDRI dataset comprises 1018 thoracic CT images with lesions annotated by four radiologists \cite{armato2004lung}. Patches of $128 \times 128$ px are extracted from the 2D slices, yielding a total of 15096 patches \cite{lidc}. Each patch has annotations from four raters marking the tumour regions. These segmentation masks were converted into binary labels indicating the presence (if $\geq 2$ radiologists marked a tumour) or absence (if $<2$ radiologists marked a tumour) of tumours, resulting in a fairly balanced dataset. All image intensities are normalised to be in $[0, 1]$. Training, validation and test splits are made following a [0.6, 0.2, 0.2] ratio.
  • Figure 4: (A & B): Radar plots for multi-layer perceptron (MLP) and EfficientNet \cite{sze2017efficient} showing the mean metrics for performance ($P_T, E, \text{GPU}, T$) reported in Table \ref{tab:app_results} for the five settings.
  • Figure 5: Radar plots for DenseNet \cite{huang2017densely}, Swin Transformer \cite{liu2021swin} and Vision Transformer \cite{dosovitskiyimage} on the LIDC dataset, showing the mean metrics for performance ($P_T, E, \text{GPU}, T$) reported in Table \ref{tab:results} for the five settings.
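
The LIDC-IDRI preprocessing described in the Figure 3 caption (rater agreement to a binary label, intensity normalisation, and a 0.6/0.2/0.2 split) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names `majority_label`, `normalise`, and `split_indices` are hypothetical, per-patch min-max normalisation is assumed because the caption does not say how intensities are scaled, and the random shuffle with a fixed seed is likewise an assumption.

```python
import random


def majority_label(rater_masks, min_raters=2):
    """Return 1 if at least `min_raters` masks mark any tumour pixel, else 0."""
    positive_raters = sum(
        1 for mask in rater_masks if any(any(row) for row in mask)
    )
    return int(positive_raters >= min_raters)


def normalise(patch):
    """Rescale intensities to [0, 1] (assumed per-patch min-max scaling)."""
    flat = [p for row in patch for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # constant patches map to all zeros
    return [[(p - lo) / span for p in row] for row in patch]


def split_indices(n, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Toy 2x2 masks standing in for the four raters' 128x128 segmentations.
empty = [[0, 0], [0, 0]]
marked = [[0, 1], [0, 0]]

label_pos = majority_label([marked, marked, empty, empty])  # two raters agree
label_neg = majority_label([marked, empty, empty, empty])   # only one rater

scaled = normalise([[0, 5], [10, 5]])
train, val, test = split_indices(15096)  # patch count from the caption

print(label_pos, label_neg, scaled)
print(len(train), len(val), len(test))
```

Because the split sizes are truncated to integers, the test partition absorbs the remainder, so the three parts always cover every patch exactly once.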