Early Stopping Criteria for Training Generative Adversarial Networks in Biomedical Imaging
Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O'Reilly
TL;DR
GAN training in biomedical imaging is computationally expensive and prone to mode collapse, non-convergence, and instability. The authors propose a quantitative early stopping framework that combines generator/discriminator loss ranges with MS-SSIM and FID to monitor diversity and image quality, and validate it on DCGAN and MSG-GAN architectures. Their results show that loss alone is unreliable across settings, but a two-step criterion combining loss, MS-SSIM, and FID can shorten training while preserving high-quality, diverse synthetic images: DCGAN training time dropped from 19 hours to 10 hours (about 47%), and MSG-GAN from 181 hours to 126 hours (about 30%). The work highlights the potential for more accessible GAN-based biomedical image synthesis, while acknowledging dataset limitations and the need for broader validation on other imaging modalities.
Abstract
Generative Adversarial Networks (GANs) have complex architectures that are computationally expensive to train. Throughout training, a GAN's output is analyzed qualitatively based on the loss and on the diversity and quality of the synthetic images. Based on this qualitative analysis, training is manually halted once the desired synthetic images are generated. An early stopping criterion can reduce both the computational cost and the dependence on manual oversight, but its effectiveness is affected by training problems such as mode collapse, non-convergence, and instability. These problems are particularly prevalent in biomedical imagery, where they degrade the diversity and quality of synthetic images, and the high computational cost of training makes complex architectures increasingly inaccessible. This work proposes novel early stopping criteria that quantitatively detect training problems, halt training, and reduce the computational cost of synthesizing biomedical images. First, the range of generator and discriminator loss values is investigated to assess whether mode collapse, non-convergence, and instability occur sequentially, concurrently, or interchangeably throughout training. Second, these occurrences, in conjunction with the Multi-Scale Structural Similarity Index (MS-SSIM) and Fréchet Inception Distance (FID) scores of synthetic images, form the basis of the proposed early stopping criteria. This work helps identify training problems in GANs at low computational cost and reduces the training time needed to generate diverse, high-quality synthetic images.
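
The two-step idea described above (check the loss ranges first, then confirm with MS-SSIM and FID) can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the class name `EarlyStopMonitor`, the `patience` mechanism, and every threshold value are hypothetical placeholders, and the per-epoch MS-SSIM and FID scores are assumed to be computed elsewhere.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class EarlyStopMonitor:
    """Hypothetical two-step early stopping check for GAN training."""

    # All thresholds are illustrative assumptions, not values from the paper.
    g_loss_range: Tuple[float, float] = (0.5, 3.0)  # acceptable generator loss
    d_loss_range: Tuple[float, float] = (0.2, 1.5)  # acceptable discriminator loss
    max_ms_ssim: float = 0.6  # higher MS-SSIM between synthetic pairs = less diversity
    max_fid: float = 100.0    # higher FID = lower image quality
    patience: int = 3         # consecutive flagged epochs required before stopping
    _flagged: int = field(default=0, init=False)

    def loss_out_of_range(self, g_loss: float, d_loss: float) -> bool:
        """Step 1: flag epochs whose losses leave the acceptable ranges."""
        g_lo, g_hi = self.g_loss_range
        d_lo, d_hi = self.d_loss_range
        return not (g_lo <= g_loss <= g_hi) or not (d_lo <= d_loss <= d_hi)

    def metrics_degraded(self, ms_ssim: float, fid: float) -> bool:
        """Step 2: confirm the problem with diversity (MS-SSIM) and quality (FID)."""
        return ms_ssim > self.max_ms_ssim or fid > self.max_fid

    def update(self, g_loss: float, d_loss: float, ms_ssim: float, fid: float) -> bool:
        """Record one epoch; return True when training should halt."""
        if self.loss_out_of_range(g_loss, d_loss) and self.metrics_degraded(ms_ssim, fid):
            self._flagged += 1
        else:
            # Loss alone is treated as unreliable: reset unless both steps agree.
            self._flagged = 0
        return self._flagged >= self.patience


if __name__ == "__main__":
    monitor = EarlyStopMonitor(patience=2)
    # (g_loss, d_loss, ms_ssim, fid) per epoch; made-up numbers for illustration.
    epochs = [
        (1.0, 0.7, 0.30, 40.0),
        (9.0, 0.01, 0.30, 40.0),
        (9.0, 0.01, 0.90, 250.0),
        (9.0, 0.01, 0.90, 250.0),
    ]
    for epoch, values in enumerate(epochs, start=1):
        if monitor.update(*values):
            print(f"halt training at epoch {epoch}")
            break
```

Requiring both steps to agree before counting an epoch reflects the observation that loss values alone are unreliable; a real setup would need ranges and thresholds tuned per architecture and dataset.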
