Multi-Resolution Model Fusion for Accelerating the Convolutional Neural Network Training

Kewei Wang; Claire Songhyun Lee; Sunwoo Lee; Vishu Gupta; Jan Balewski; Alex Sim; Peter Nugent; Ankit Agrawal; Alok Choudhary; Kesheng Wu; Wei-keng Liao

TL;DR

The paper tackles the high cost of training large-scale CNNs on scientific data by introducing Multi-Resolution Model Fusion (MRMF), which pretrains models on reduced-resolution data and fuses them with higher-resolution counterparts across multiple stages before a final finetuning stage at the original resolution. By combining bottom/top layer fusion with concurrent training, MRMF accelerates end-to-end training while preserving accuracy: on CosmoFlow and Neuron Inverter it reduces training time by up to 47% and 44%, respectively, and it shows near-linear scaling on HPC systems. The approach extends prior MRT by enabling multi-stage fusion and partial parameter transfer between layers, supported by analyses of data-generation costs and batch-size effects. The results indicate that MRMF can significantly reduce training time with negligible preprocessing overhead, which makes it practical for large-scale scientific DL workloads and potentially extensible to other architectures such as Graph Neural Networks.
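As a rough illustration of partial parameter transfer between a lower-resolution and a higher-resolution model, the sketch below treats each model as a plain dictionary of named layers and copies a chosen subset of layers from one into the other. The layer names, the `fuse` and `shape` helpers, and the rule of skipping layers whose shapes differ are all assumptions made for illustration; the summary above does not specify how MRMF selects or combines bottom and top layers.

```python
from copy import deepcopy


def shape(w):
    """Return the nested-list shape of a weight, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    dims = []
    while isinstance(w, list):
        dims.append(len(w))
        if not w:
            break
        w = w[0]
    return tuple(dims)


def fuse(low_res_model, high_res_model, layers_to_transfer):
    """Copy selected layers from low_res_model into a copy of high_res_model.

    Hypothetical rule: a layer is transferred only if it exists in both
    models with the same shape; everything else keeps the high-res weights.
    """
    fused = deepcopy(high_res_model)
    transferred = []
    for name in layers_to_transfer:
        if name in low_res_model and name in fused:
            if shape(low_res_model[name]) == shape(fused[name]):
                fused[name] = deepcopy(low_res_model[name])
                transferred.append(name)
    return fused, transferred


# Toy weights: the "dense" layer differs in size between the two models,
# as a layer fed by a flattened feature map typically would.
low = {
    "conv1": [[1.0, 2.0], [3.0, 4.0]],
    "conv2": [[5.0, 6.0], [7.0, 8.0]],
    "dense": [0.1, 0.2],
}
high = {
    "conv1": [[0.0, 0.0], [0.0, 0.0]],
    "conv2": [[0.0, 0.0], [0.0, 0.0]],
    "dense": [0.0, 0.0, 0.0, 0.0],
}

fused, transferred = fuse(low, high, ["conv1", "dense"])
```

Here only `conv1` is transferred: `dense` is requested but skipped because its shape differs, and `conv2` is not requested, so both keep their higher-resolution weights.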

Abstract

Neural networks are rapidly gaining popularity in scientific research, but training the models is often very time-consuming. Efficient training methodologies that reduce the computational cost are crucial, particularly when the training data samples are large, high-dimensional arrays. To reduce the training cost, we propose a Multi-Resolution Model Fusion (MRMF) method that combines models trained on reduced-resolution data and then refines the fused model with data in the original resolution. We demonstrate that these reduced-resolution models and datasets can be generated quickly. More importantly, the proposed approach reduces the training time by speeding up model convergence in each fusion stage before switching to the final stage of finetuning with data in its original resolution. This strategy ensures that the final model retains high-resolution insights while benefiting from the computational efficiency of lower-resolution training. Our experimental results demonstrate that the multi-resolution model fusion method can significantly reduce end-to-end training time while maintaining the same model accuracy. Evaluated on two real-world scientific applications, CosmoFlow and Neuron Inverter, the proposed method reduces the training time by up to 47% and 44%, respectively, compared to training at the original resolution, while the model accuracy is not affected.
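To make the idea of generating reduced-resolution training data concrete, the following is a minimal sketch that shrinks a 3D sample by averaging non-overlapping blocks. Average pooling, the `downsample` helper, and the toy 4×4×4 sample are assumptions chosen for illustration; the abstract does not state which downsampling method the authors use.

```python
import numpy as np


def downsample(volume, factor):
    """Average-pool a 3D array over non-overlapping factor^3 blocks."""
    d, h, w = volume.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("each dimension must be divisible by factor")
    # Split every axis into (number of blocks, block size), then average
    # over the three block-size axes.
    blocks = volume.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor
    )
    return blocks.mean(axis=(1, 3, 5))


# Toy stand-in for one high-dimensional training sample.
sample = np.arange(64, dtype=np.float64).reshape(4, 4, 4)
low_res = downsample(sample, 2)

# Coarse-to-fine ordering: train on the reduced resolution first,
# then finish on the original resolution.
stages = [low_res, sample]
```

Each halving of the resolution shrinks a 3D sample by a factor of eight, which is where the lower per-sample cost of the early stages would come from under this assumption.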
