Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects
Yann Bourdin, Pierrick Legrand, Fanny Roche
TL;DR
This paper empirically analyzes how the hyperparameters of truncated backpropagation through time (TBPTT), namely the number of sub-sequences N, the batch size B, and the sub-sequence length L, affect training dynamics and performance in a convolutional-recurrent neural model of dynamic range compression (DRC). Using the SPTMod architecture with a State Prediction Network (SPN) and FiLM/TFiLM conditioning, the authors show that careful TBPTT tuning improves accuracy and training stability and can reduce computational cost, as validated by objective losses and a perceptual listening test. Across snapshot and full-modeling datasets, larger Lc and B generally improve accuracy and reduce variance, although results depend on data diversity and model configuration; TBPTT also enables larger batch sizes and faster convergence. Subjective evaluations confirm that perceptual quality remains high under the optimized TBPTT settings, suggesting practical benefits for real-time neural audio effects and motivating future multi-objective hyperparameter optimization and implementation in performant languages such as C++.
Abstract
This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters (number of sequences, batch size, and sequence length) and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments on datasets with and without conditioning on user controls. Results demonstrate that carefully tuning these parameters enhances model accuracy and training stability while also reducing computational demands. Objective evaluations confirm improved performance with optimized settings, and subjective listening tests indicate that the revised TBPTT configuration maintains high perceptual quality.
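To make the roles of the three TBPTT hyperparameters concrete, here is a minimal pure-Python sketch. It is not the authors' training code and does not use SPTMod. It assumes a hypothetical toy "effect" (a first-order linear recurrence with parameters `a` and `b`) and hypothetical helper names `generate_data` and `tbptt_train`. Each training sequence is split into `num_subseq` sub-sequences of `subseq_len` samples, and `batch_size` sequences are processed together. The recurrent state is carried from one sub-sequence to the next, but gradients are computed only within the current sub-sequence, with one parameter update per sub-sequence.

```python
import random


def generate_data(num_seqs, total_len, a_true=0.5, b_true=1.0, seed=0):
    """Hypothetical toy 'effect': y_t = a_true * y_{t-1} + b_true * x_t."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for _ in range(num_seqs):
        x = [rng.uniform(-1.0, 1.0) for _ in range(total_len)]
        y, h = [], 0.0
        for xt in x:
            h = a_true * h + b_true * xt
            y.append(h)
        inputs.append(x)
        targets.append(y)
    return inputs, targets


def tbptt_train(inputs, targets, num_subseq, batch_size, subseq_len,
                lr=0.1, epochs=100):
    """Fit h_t = a*h_{t-1} + b*x_t with truncated BPTT (one update per sub-sequence)."""
    # Assumption: every sequence is exactly num_subseq * subseq_len samples long.
    assert all(len(x) == num_subseq * subseq_len for x in inputs)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for start in range(0, len(inputs), batch_size):
            xb = inputs[start:start + batch_size]
            yb = targets[start:start + batch_size]
            states = [0.0] * len(xb)  # every sequence starts from a zero state
            scale = 2.0 / (len(xb) * subseq_len)  # d(MSE)/dh factor for one sub-sequence
            for n in range(num_subseq):
                lo = n * subseq_len
                grad_a = grad_b = 0.0
                for i, (x, y) in enumerate(zip(xb, yb)):
                    # Forward pass over the sub-sequence, starting from the carried state.
                    hs = [states[i]]
                    for t in range(lo, lo + subseq_len):
                        hs.append(a * hs[-1] + b * x[t])
                    # Backward pass restricted to this sub-sequence (the truncation):
                    # the carried state hs[0] is treated as a constant.
                    g_next = 0.0
                    for k in range(subseq_len - 1, -1, -1):
                        t = lo + k
                        g = scale * (hs[k + 1] - y[t]) + a * g_next  # dLoss/dh_t
                        grad_a += g * hs[k]  # hs[k] is h_{t-1}
                        grad_b += g * x[t]
                        g_next = g
                    states[i] = hs[-1]  # carry the state forward without its gradient
                # One gradient step per sub-sequence.
                a -= lr * grad_a
                b -= lr * grad_b
    return a, b


if __name__ == "__main__":
    xs, ys = generate_data(num_seqs=8, total_len=4 * 32)
    a, b = tbptt_train(xs, ys, num_subseq=4, batch_size=4, subseq_len=32)
    print(f"a={a:.4f}, b={b:.4f}")
```

With these illustrative settings, each epoch performs 2 batches × 4 sub-sequences = 8 parameter updates, which shows how TBPTT yields several updates per batch instead of one. Because the toy model can represent its target exactly, truncating the gradient does not move the optimum, and the parameters should approach a = 0.5 and b = 1.0.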
