Partitioned Neural Network Training via Synthetic Intermediate Labels
Cevat Volkan Karadağ, Nezih Topaloğlu
TL;DR
This work addresses the resource demands of training large neural networks by introducing Partitioned Neural Network (PNN) training, which splits a model into subnetworks and trains them separately using Synthetic Intermediate Labels (SIL), reducing inter-partition communication and memory usage. SIL are randomly generated labels with a defined structure. In a two-way split, they let the left partition learn without access to the right partition, while the right partition is trained on the true labels; the two partitions are then combined. Empirical validation on a 6-layer fully connected network using EMNIST shows that PNN can achieve accuracies close to those of conventional training while significantly lowering memory and computation, with potential further improvement from a recovery-epoch phase. The approach offers a pathway to more efficient training of large models, and future work aims to extend the method to CNNs, RNNs, and transformer architectures.
Abstract
Training large neural network architectures, particularly deep learning models, is resource-intensive, and GPU memory constraints have become a notable bottleneck. Existing strategies, including data parallelism, model parallelism, pipeline parallelism, and fully sharded data parallelism, offer partial solutions. Model parallelism, in particular, distributes the entire model across multiple GPUs, yet the resulting data communication between partitions slows down training. Additionally, the substantial memory overhead required to store auxiliary parameters on each GPU compounds computational demands. Instead of using the entire model for training, this study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train the individual segments. These labels, produced through a random process, mitigate memory overhead and computational load. The result is a more efficient training process that minimizes data communication while maintaining model accuracy. To validate the method, a 6-layer fully connected neural network is partitioned into two parts and its performance is assessed on the extended MNIST (EMNIST) dataset. Experimental results indicate that the proposed approach achieves testing accuracies similar to those of conventional training, while significantly reducing memory and computational requirements. This work helps mitigate the resource-intensive nature of training large neural networks, paving the way for more efficient deep learning model development.
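The data flow behind synthetic intermediate labels can be illustrated with a minimal sketch. It rests on assumptions that the text above does not confirm: that one random vector is drawn per class, that the left partition regresses onto the vector of each sample's class, and that the right partition is fed noisy copies of those vectors paired with the true labels. The function names, the uniform sampling range, and the Gaussian noise are hypothetical choices made for illustration only, and no network is actually trained here.

```python
import random

def make_sil(num_classes, sil_dim, seed=0):
    """Draw one random vector per class (assumed SIL structure, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(sil_dim)]
            for _ in range(num_classes)]

def left_targets(labels, sil):
    """Targets for the left partition: the SIL vector of each sample's class."""
    return [sil[y] for y in labels]

def right_inputs(labels, sil, noise_std=0.1, seed=1):
    """Inputs for the right partition: SIL vectors plus Gaussian noise (assumption).

    The right partition would be trained to map these inputs to the true labels.
    """
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_std) for v in sil[y]] for y in labels]

# Toy example: four samples from three classes, 4-dimensional SIL.
labels = [0, 2, 1, 2]
sil = make_sil(num_classes=3, sil_dim=4)
t_left = left_targets(labels, sil)   # left partition never sees the right partition
x_right = right_inputs(labels, sil)  # right partition never sees the raw inputs
```

Under these assumptions, each partition has its own inputs and targets, so the two can be trained on separate devices without exchanging activations or gradients; only the per-class vectors need to be shared.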
