Multi-Resolution Model Fusion for Accelerating the Convolutional Neural Network Training

Kewei Wang, Claire Songhyun Lee, Sunwoo Lee, Vishu Gupta, Jan Balewski, Alex Sim, Peter Nugent, Ankit Agrawal, Alok Choudhary, Kesheng Wu, Wei-keng Liao

TL;DR

The paper tackles the high cost of training large-scale CNNs on scientific data by introducing Multi-Resolution Model Fusion (MRMF), which pretrains models on reduced-resolution data and fuses them with higher-resolution counterparts across multiple stages before a final finetuning stage at the original resolution. By combining the bottom layers of the lower-resolution model with the top layers of the higher-resolution model, and by training the two models concurrently, MRMF accelerates end-to-end training while preserving accuracy. On CosmoFlow and Neuron Inverter it improves training time by up to 47% and 44%, respectively, with near-linear scaling on HPC systems. The approach extends the prior MRT method by enabling multi-stage fusion and partial parameter transfer between layers, and is supported by analyses of data-generation costs and batch-size effects. The results indicate that MRMF can significantly reduce training time with negligible preprocessing overhead, which makes it practical for large-scale scientific DL workloads and suggests possible extensions to other architectures such as Graph Neural Networks.

Abstract

Neural networks are rapidly gaining popularity in scientific research, but training the models is often very time-consuming. Particularly when the training data samples are large high-dimensional arrays, efficient training methodologies that can reduce the computational costs are crucial. To reduce the training cost, we propose a Multi-Resolution Model Fusion (MRMF) method that combines models trained on reduced-resolution data and then refines the fused model with data in the original resolution. We demonstrate that these reduced-resolution models and datasets can be generated quickly. More importantly, the proposed approach reduces the training time by speeding up the model convergence in each fusion stage before switching to the final stage of finetuning with data in its original resolution. This strategy ensures the final model retains high-resolution insights while benefiting from the computational efficiency of lower-resolution training. Our experimental results demonstrate that the multi-resolution model fusion method can significantly reduce end-to-end training time while maintaining the same model accuracy. Evaluated using two real-world scientific applications, CosmoFlow and Neuron Inverter, the proposed method improves the training time by up to 47% and 44%, respectively, compared to training at the original resolution, while the model accuracy is not affected.
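
As a rough aid for reading the reported improvements, the percentages can be converted into speedup factors. This is a sketch under one assumption that the abstract does not state explicitly: that each percentage $r$ is the fractional reduction in end-to-end training time relative to the original-resolution baseline. The symbols $T_{\text{base}}$, $T_{\text{MRMF}}$, and $r$ are introduced here for illustration only.

$$
T_{\text{MRMF}} = (1 - r)\,T_{\text{base}}
\quad\Longrightarrow\quad
\frac{T_{\text{base}}}{T_{\text{MRMF}}} = \frac{1}{1 - r},
\qquad
\frac{1}{1 - 0.47} \approx 1.89,
\qquad
\frac{1}{1 - 0.44} \approx 1.79 .
$$

Under that reading, the best reported cases would correspond to roughly a 1.89x speedup for CosmoFlow and a 1.79x speedup for Neuron Inverter.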

Paper Structure

This paper contains 24 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Overview of Multi-resolution Model Fusion (MRMF) method. The model architecture from the Neuron Inverter benchmark is used as an example. The model architectures of the coarse model $\mathcal{M}_c$ and the dense model $\mathcal{M}_d$ are on the top left and top right sides, respectively. The fully connected layer marked in yellow has a different weight size between the two models. Before model fusion, the coarse model is trained with the lower-resolution data and the dense model is trained with the higher-resolution data. The bottom layer group (marked in orange) from the coarse model is fused with the top layer group (marked in red) from the dense model. After model fusion, the fused model $\mathcal{M}_f$ is finetuned with the higher-resolution data.
  • Figure 2: The pipeline for extending the pretraining phase into multiple stages. Each stage trains two models on data sets of two different resolutions, and the resolutions increase progressively from one stage to the next. The top layers of the higher-resolution model and the bottom layers of the lower-resolution model are then combined into the fused model for the next stage.
  • Figure 3: Training timing breakdown for CosmoFlow comparing the baseline (the original model), the MRT method, and the proposed MRMF method using one fusion and two fusions. The experiments are conducted on both 32 and 64 GPUs on Perlmutter.
  • Figure 4: Comparison of the training timing breakdown for Neuron Inverter between baseline, the MRT method, and the proposed MRMF method with one fusion and two fusions on Perlmutter. The experiments are conducted on both 16 and 32 GPUs.
  • Figure 5: Comparison of training time across different local batch sizes of low-resolution data in the pretraining phase on Neuron Inverter.
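
The fusion step described in the Figure 1 caption (bottom layer group from the coarse model, top layer group from the dense model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fuse_models` and `shape_of`, the layer names `conv1` and `fc`, the toy weights, and the choice of which layers count as "bottom" are all hypothetical. The sketch assumes that the bottom layers have identical shapes in both models, while a layer such as the fully connected one may differ in size and is therefore taken from the dense model.

```python
import copy


def shape_of(value):
    """Return the shape of a nested list of numbers as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def fuse_models(coarse_params, dense_params, bottom_layers):
    """Build fused parameters (hypothetical helper, not from the paper).

    Layers named in `bottom_layers` are copied from the coarse model;
    every other layer is copied from the dense model.
    """
    fused = {}
    for name, dense_value in dense_params.items():
        if name in bottom_layers:
            coarse_value = coarse_params[name]
            # Bottom layers can only be transferred if their shapes agree.
            if shape_of(coarse_value) != shape_of(dense_value):
                raise ValueError(f"layer {name!r} differs in shape between the models")
            fused[name] = copy.deepcopy(coarse_value)
        else:
            fused[name] = copy.deepcopy(dense_value)
    return fused


# Toy parameters: "conv1" has the same shape in both models, while "fc"
# is larger in the dense model because its input resolution is higher.
coarse = {"conv1": [[0.1, 0.2], [0.3, 0.4]], "fc": [[1.0, 1.0]]}
dense = {"conv1": [[0.5, 0.6], [0.7, 0.8]], "fc": [[2.0, 2.0, 2.0, 2.0]]}

fused = fuse_models(coarse, dense, bottom_layers={"conv1"})
print(fused)
```

In a multi-stage setup such as the one in Figure 2, the fused parameters from one stage would presumably serve as a starting point for the next stage; how the paper initializes each stage's pair of models is not specified in this summary, so the sketch stops at a single fusion.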