Distortion Recovery: A Two-Stage Method for Guitar Effect Removal
Ying-Shuo Lee, Yueh-Po Peng, Jui-Te Wu, Ming Cheng, Li Su, Yi-Hsuan Yang
TL;DR
The paper addresses distortion recovery for electric guitar, that is, removing real-world effect processing from recordings. It introduces a two-stage framework: a Mel-spectrogram denoiser that produces a dry Mel representation, followed by a neural vocoder (HiFi-GAN) that reconstructs the dry waveform. Evaluations on VST-rendered data, including comparisons with baselines built on synthetic distortions, show better objective metrics (e.g., lower FAD and higher SI-SDR) and strong subjective quality (MOS around 4), especially when the model is trained on realistic VST data. The approach shows improved fidelity and practical potential for downstream tasks such as transcription and mixing, and it underscores the importance of realistic training data for distortion removal.
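SI-SDR, one of the objective metrics cited above, can be made concrete with a short sketch. It assumes the common zero-mean definition; the paper's own evaluation code is not shown here. The estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual, in dB. The function name `si_sdr` and the `eps` stabilizer are illustrative choices, not taken from the paper.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove DC offsets, as in the usual SI-SDR definition.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal gain that projects the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # part of the estimate explained by the reference
    residual = estimate - target        # everything else counts as distortion
    return float(10.0 * np.log10((np.dot(target, target) + eps)
                                 / (np.dot(residual, residual) + eps)))

# Usage on toy signals (1 s of a 440 Hz tone at 16 kHz, not real guitar audio).
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noisy = reference + 0.1 * rng.standard_normal(sr)
print(si_sdr(noisy, reference))        # roughly 17 dB for this noise level
print(si_sdr(3.0 * noisy, reference))  # same value: the gain is absorbed by alpha
```

Because the projection absorbs any overall gain, a recovered dry signal is not penalized for being louder or quieter than its reference; only the shape of the waveform matters.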
Abstract
Removing audio effects from electric guitar recordings facilitates post-production and sound editing. An audio distortion recovery model not only improves the clarity of the guitar sound but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in building such models, previous efforts have largely focused on synthetic distortions that may be too simplistic to accurately capture the complexities of real-world recordings. In this paper, we tackle the task using a dataset of guitar recordings rendered with commercial-grade audio effect VST plugins. Moreover, we introduce a novel two-stage methodology for audio distortion recovery: the first stage processes the audio signal in the Mel-spectrogram domain, and the second stage uses a neural vocoder to generate the pristine original guitar sound from the processed Mel-spectrogram. We report a set of experiments demonstrating the effectiveness of our approach over existing methods through both subjective and objective evaluation metrics.
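The data flow of the two-stage method can be sketched as: wet waveform, then wet log-Mel spectrogram, then (stage 1) dry log-Mel spectrogram, then (stage 2) dry waveform. The NumPy sketch below rests on several assumptions. The STFT/Mel settings (22.05 kHz, `n_fft=1024`, hop 256, 80 bands) are typical HiFi-GAN-style values, not confirmed as the paper's. The HTK-style Mel formula is one of several conventions. `denoiser` and `vocoder` are hypothetical callables standing in for the trained networks, and the placeholders used here perform no actual recovery.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (assumed; other libraries use the Slaney variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 linear bins to n_mels Mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Log-magnitude Mel spectrogram with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[k * hop : k * hop + n_fft] * window for k in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)
    return np.log(np.maximum(mel_filterbank(sr, n_fft, n_mels) @ mag, 1e-5))

def recover(wet_wav, denoiser, vocoder):
    """Stage 1: wet Mel -> dry Mel (denoiser). Stage 2: dry Mel -> waveform (vocoder)."""
    wet_mel = log_mel_spectrogram(wet_wav)
    dry_mel = denoiser(wet_mel)
    return vocoder(dry_mel)

# Hypothetical placeholders: a real system would use a trained Mel denoiser and HiFi-GAN.
HOP = 256
identity_denoiser = lambda mel: mel                        # returns its input unchanged
silent_vocoder = lambda mel: np.zeros(mel.shape[1] * HOP)  # one hop of samples per frame

# Toy "wet" input: a tanh-clipped sine, a simple memoryless distortion (not a VST effect).
sr = 22050
t = np.arange(sr) / sr
wet = np.tanh(5.0 * np.sin(2 * np.pi * 110 * t))
print(log_mel_spectrogram(wet).shape)                         # (80, 83)
print(recover(wet, identity_denoiser, silent_vocoder).shape)  # (21248,)
```

Keeping the two stages as separate callables mirrors the split described above: the denoiser only ever sees Mel frames, so the vocoder can in principle be trained or swapped independently of the effect-removal stage.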
