Blind estimation of audio effects using an auto-encoder approach and differentiable digital signal processing
Côme Peladeau, Geoffroy Peeters
TL;DR
The paper addresses blind estimation of audio mastering effects (BE-AFX) with an auto-encoder. An analysis network predicts effect parameters from the processed (wet) signal. Differentiable or neural proxies of an equalizer, a compressor, and a clipper then apply those parameters to the original dry signal so that the output reproduces the wet signal. Because training minimizes an audio-domain loss rather than a parameter loss, the exact implementations of the effects are not required. The authors show that optimizing the audio reconstruction replicates the output of the mastering chain better than traditional training on parameter distance, even when the resulting parameter estimates are less accurate. Since it needs only dry/wet pairs, the approach can learn from real data whose effect implementations are unknown, and it paves the way for subjective listening evaluations and broader mastering scenarios. It could benefit automated mastering tools and music production workflows in which the internals of the AFXs are unknown or variable.
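The core idea of training through an audio-domain loss can be illustrated with a deliberately tiny example. This is a minimal sketch under several assumptions, not the authors' implementation: the analysis network is replaced by a single free parameter, the effect proxy is a hypothetical soft clipper `y = tanh(drive * x)`, mean squared error stands in for the audio quality metric, and the gradient is derived by hand instead of with an autodiff framework. Gradient descent recovers the unknown `drive` by comparing audio only, never the true parameter.

```python
import math

def soft_clip(x, drive):
    """Hypothetical differentiable clipper proxy: y = tanh(drive * x)."""
    return [math.tanh(drive * s) for s in x]

def audio_loss_and_grad(dry, wet, drive):
    """MSE between the proxy output and the wet target, plus d(loss)/d(drive)."""
    n = len(dry)
    loss, grad = 0.0, 0.0
    for s, target in zip(dry, wet):
        y = math.tanh(drive * s)
        residual = y - target
        loss += residual ** 2
        # d tanh(drive * s) / d drive = (1 - tanh^2(drive * s)) * s
        grad += 2.0 * residual * (1.0 - y * y) * s
    return loss / n, grad / n

def estimate_drive(dry, wet, init=1.0, lr=5.0, steps=2000):
    """Plain gradient descent on the audio-domain loss (no parameter loss)."""
    drive = init
    for _ in range(steps):
        _, grad = audio_loss_and_grad(dry, wet, drive)
        drive -= lr * grad
    return drive

# Toy dry signal: a 0.8-amplitude sine, 5 cycles over 200 samples.
dry = [0.8 * math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
# "Unknown" processing: the optimizer only ever sees its output.
wet = soft_clip(dry, 3.0)
drive_hat = estimate_drive(dry, wet)
print(f"estimated drive: {drive_hat:.4f}")
```

In the full method, the same gradient would flow further back into the weights of the analysis network, so that it learns to map a wet signal to parameters whose rendered audio matches that signal.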
Abstract
Blind Estimation of Audio Effects (BE-AFX) aims at estimating the Audio Effects (AFXs) applied to an original, unprocessed audio sample solely based on the processed audio sample. To train such a system, traditional approaches optimize a loss between ground-truth and estimated AFX parameters. This requires knowing the exact implementation of the AFXs used for the processing. In this work, we propose an alternative solution that eliminates this requirement. Instead, we introduce an auto-encoder approach that optimizes an audio quality metric. We explore, suggest, and compare various implementations of commonly used mastering AFXs, using differentiable signal processing or neural approximations. Our findings demonstrate that our auto-encoder approach yields better estimates of the audio produced by a chain of AFXs, as measured by audio quality, than the traditional parameter-based approach, even when the latter provides more accurate parameter estimates.
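Why a more accurate parameter estimate need not give better audio can be seen with a toy hard clipper. This sketch is only an illustration of the general mismatch between parameter distance and audio distance, not a reproduction of the paper's experiments; the clipper, the sine test signal, and the thresholds are hypothetical choices. Because every threshold above the signal peak leaves the audio unchanged, an estimate that is far off in parameter space can still be perfect in the audio domain, while a closer estimate that clips the signal is audibly wrong.

```python
import math

def hard_clip(x, threshold):
    """Hypothetical hard clipper: limit each sample to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in x]

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Toy dry signal with a peak of 0.5; the true threshold 0.9 never engages.
dry = [0.5 * math.sin(2 * math.pi * 3 * n / 100) for n in range(100)]
true_t = 0.9
wet = hard_clip(dry, true_t)

def compare(est_t):
    """Return (parameter-domain error, audio-domain error) for an estimate."""
    return abs(est_t - true_t), mse(hard_clip(dry, est_t), wet)

for est_t in (2.0, 0.4):
    p_err, a_err = compare(est_t)
    print(f"threshold={est_t}: parameter error={p_err:.2f}, audio MSE={a_err:.5f}")
```

A loss on parameters would prefer the 0.4 estimate, while a loss on audio prefers 2.0, which is the behavior an audio-domain objective is meant to reward when the goal is to reproduce the processed sound.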
