A Diffusion-Based Generative Equalizer for Music Restoration

Eloi Moliner; Maija Turunen; Filip Elvander; Vesa Välimäki

TL;DR

The paper addresses the restoration of severely degraded historical music recordings by reframing equalization as a generative restoration task, using diffusion posterior sampling with a learnable, zero-phase degradation model $H_\phi$. The core contributions are a 5-stage, piecewise-linear frequency-response equalizer with breakpoint regularization, noise regularization, LTAS-based initialization, and an improved inference algorithm that uses Adam optimization and a 2nd-order sampler. Results on historical piano and singing recordings show that BABE-2, especially when combined with LTAS-based objectives or initialization, achieves better objective metrics (e.g., Fréchet Audio Distance) and more balanced spectral fidelity than the original BABE and the baselines. The results also reveal qualitative nuances in voice restoration and show that the choice of reference voice matters. The work advances the practical restoration of historical music, enabling clearer playback and more authentic vocal timbres, and identifies the handling of nonlinear degradations and further perceptual validation as future directions.
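To make the idea of a zero-phase, piecewise-linear equalizer concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the choice of a log-frequency axis with gains in dB, and the breakpoint values are all assumptions made for illustration. The filter is zero-phase because the spectrum is multiplied by a real, non-negative gain only.

```python
import numpy as np


def piecewise_linear_gain_db(freqs_hz, breakpoints_hz, gains_db):
    """Interpolate gains (in dB) linearly between breakpoints.

    Assumption: interpolation is done on a log2-frequency axis; the gain is
    held constant below the first and above the last breakpoint.
    """
    log_f = np.log2(np.maximum(freqs_hz, 1e-3))  # avoid log2(0) at DC
    log_bp = np.log2(np.asarray(breakpoints_hz, dtype=float))
    return np.interp(log_f, log_bp, gains_db)


def apply_zero_phase_eq(x, sample_rate, breakpoints_hz, gains_db):
    """Apply the magnitude-only (zero-phase) response in the FFT domain."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    gain_db = piecewise_linear_gain_db(freqs, breakpoints_hz, gains_db)
    gain = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude
    return np.fft.irfft(spectrum * gain, n=len(x))


# Hypothetical example: flat up to 500 Hz, then falling to -40 dB at 2 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 1000.0 * t)
y = apply_zero_phase_eq(x, sample_rate, [100.0, 500.0, 2000.0], [0.0, 0.0, -40.0])
```

In this example, 1 kHz lies halfway between 500 Hz and 2 kHz on the log-frequency axis, so the tone is attenuated by 20 dB without any phase shift. In a blind setting such as the one described above, the breakpoints and gains would be the parameters being estimated rather than fixed by hand.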

Abstract

This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to \emph{generative equalization}, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.

Paper Structure (25 sections, 20 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Background
Diffusion Models
Diffusion Posterior Sampling
Application to Blind Inverse Problems
BABE-2: Unique Contributions
Filter Parameterization
Breakpoint-Collapse Regularization
Noise Regularization
Long-term Average Spectrum-based Initialization
Improved Inference Algorithm
Experiments and Evaluation
Piano Recordings Evaluation
Evaluation of Singing Voice Recordings
Discussion: Restoring historical voices
...and 10 more sections

Figures (4)

Figure 1: Proposed frequency-response equalizer model consists of breakpoints creating a piecewise linear magnitude response.
Figure 2: Comparative LTAS analysis of original and restored piano recordings using different methods.
Figure 3: Singing voice restoration pipeline.
Figure 4: Spectrogram representations of two vocal restoration examples. The colored boxes highlight key points discussed in Sec. \ref{sec:qualitativeanalysis}.
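
The long-term average spectrum (LTAS) compared in Figure 2 can be approximated by averaging short-time power spectra over a whole recording. The sketch below is a generic illustration, not the paper's code: the function name, frame length, hop size, and Hann window are assumptions.

```python
import numpy as np


def ltas_db(x, frame_len=2048, hop=1024):
    """Average the power spectra of overlapping Hann-windowed frames (in dB)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    # Small constant avoids log10(0) in silent bins.
    return 10.0 * np.log10(power.mean(axis=0) + 1e-12)


# Hypothetical test signal: one second of a 1024 Hz tone at 16384 Hz.
sample_rate = 16384
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1024.0 * t)
spectrum_db = ltas_db(tone)
peak_hz = np.argmax(spectrum_db) * sample_rate / 2048
```

Comparing such curves for a degraded recording and a set of clean reference recordings gives a rough picture of the overall spectral imbalance that an equalizer would need to correct.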