BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling
Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You
TL;DR
The paper tackles efficient training on variable-length sequences in distributed data-parallel (DDP) settings by introducing BLoad, a block-based padding scheme that packs shorter sequences into fixed-length blocks of size $T_{\max}$ and uses a start-index table to track per-sequence boundaries within the DDP workflow. This approach reduces padding by more than $100\times$ without deleting any frames, improving both training time and recall on the Action Genome dataset while mitigating the deadlock risk inherent in standard DDP with variable-length data. Experiments compare naive padding, sampling, and the proposed block padding, showing a substantial reduction in wasted computation and favorable performance when temporal structure is preserved. The method applies to multiple modalities (video, audio, text) and is publicly available on GitHub.
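To make the block-and-table idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the packing strategy (greedy first-fit over sequences sorted by decreasing length), the function names `pack_into_blocks` and `total_padding`, and the toy sequence lengths are all assumptions chosen for illustration. The summary above does not specify how BLoad selects which sequences share a block.

```python
from typing import Dict, List, Tuple

# A block is represented by its start-index table:
# a list of (sequence_id, start_offset) pairs.
Block = List[Tuple[str, int]]


def pack_into_blocks(lengths: Dict[str, int], t_max: int) -> List[Block]:
    """Pack sequences into blocks of at most t_max frames.

    Assumption: greedy first-fit-decreasing, used here only as a
    stand-in for whatever packing strategy BLoad actually applies.
    """
    blocks: List[Block] = []  # start-index table of each block
    used: List[int] = []      # frames already occupied in each block
    # Longest sequences first; sorted() is stable, so ties keep input order.
    for seq_id, n in sorted(lengths.items(), key=lambda kv: -kv[1]):
        if n > t_max:
            raise ValueError(f"sequence {seq_id!r} is longer than t_max")
        for b in range(len(blocks)):
            if used[b] + n <= t_max:
                blocks[b].append((seq_id, used[b]))  # record where it starts
                used[b] += n
                break
        else:
            # No existing block has room: open a new one.
            blocks.append([(seq_id, 0)])
            used.append(n)
    return blocks


def total_padding(blocks: List[Block], lengths: Dict[str, int], t_max: int) -> int:
    """Count padded frames needed to bring every block up to t_max."""
    frames = sum(lengths[seq_id] for block in blocks for seq_id, _ in block)
    return len(blocks) * t_max - frames


if __name__ == "__main__":
    toy_lengths = {"a": 3, "b": 5, "c": 2, "d": 8, "e": 4, "f": 2}
    toy_blocks = pack_into_blocks(toy_lengths, t_max=10)
    for table in toy_blocks:
        print(table)
    print("padded frames:", total_padding(toy_blocks, toy_lengths, 10))
```

Because every block has the same length, each DDP worker receives tensors of identical shape, and the start-index table lets the model reset any temporal state at each sequence boundary inside a block.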
Abstract
The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we address the challenge of efficiently training neural network models on sequences of varying sizes. We propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. With this scheme, we reduce the amount of padding by more than $100\times$ without deleting a single frame, which improves both training time and recall in our experiments.
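The padding reduction can be illustrated with a short calculation. The symbols below are assumptions introduced for this sketch and do not come from the paper: $N$ sequences with lengths $t_1, \dots, t_N$, a block length $T_{\max}$, and $B$ blocks after packing. The sketch also assumes that naive padding extends every sequence to $T_{\max}$.

$$P_{\text{naive}} = \sum_{i=1}^{N} \left(T_{\max} - t_i\right) = N\,T_{\max} - \sum_{i=1}^{N} t_i$$

$$P_{\text{block}} = B\,T_{\max} - \sum_{i=1}^{N} t_i$$

Since no frames are deleted, the term $\sum_i t_i$ is the same in both expressions, so the saving comes entirely from $B < N$. As a toy example, take lengths $8, 5, 4, 3, 2, 2$ with $T_{\max} = 10$: naive padding adds $6 \cdot 10 - 24 = 36$ frames, whereas packing into $B = 3$ blocks ($8+2$, $5+4$, $3+2$) adds $3 \cdot 10 - 24 = 6$ frames. The reported $100\times$ figure depends on the actual length distribution of the dataset, which this toy example does not model.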
