Apollo: Band-sequence Modeling for High-Quality Audio Restoration

Kai Li, Yi Luo

TL;DR

Audio restoration aims to recover undistorted sound from compressed or otherwise degraded input; the hard part is reconstructing mid- and high-frequency content while leaving low-frequency content intact. Apollo addresses this with a band-aware generator that splits the spectrum into sub-bands, applies Roformer-based band-sequence modeling, and performs time-domain reconstruction, trained within a GAN framework using multi-resolution STFT losses and feature matching. The generator consists of three modules (band-split, band-sequence modeling, and band-reconstruction) and improves restoration quality over SR-GAN across diverse music genres and codecs, while maintaining streaming-friendly, causal processing and a compact model size. The results suggest that Apollo enables high-fidelity, real-time audio restoration for music, codec, and streaming scenarios, with practical impact on playback quality and communication systems.
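
To make the idea of a multi-resolution STFT loss concrete, here is a minimal sketch in plain Python. It is not Apollo's training loss: the function names, the three small `(n_fft, hop)` pairs, and the use of a plain L1 distance on magnitudes are all assumptions chosen for illustration, and the naive DFT is only practical for short signals.

```python
import cmath
import math


def stft_magnitudes(signal, n_fft, hop):
    """Magnitudes of a Hann-windowed STFT, computed with a naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        mags = []
        for k in range(n_fft // 2 + 1):  # keep non-negative frequencies only
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def multi_resolution_stft_loss(pred, target, resolutions=((16, 4), (32, 8), (64, 16))):
    """Mean L1 distance between STFT magnitudes, averaged over resolutions.

    The (n_fft, hop) pairs are illustrative defaults, not values from the paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = 0.0
    for n_fft, hop in resolutions:
        if len(pred) < n_fft:
            raise ValueError("signal is shorter than the largest analysis window")
        p = stft_magnitudes(pred, n_fft, hop)
        t = stft_magnitudes(target, n_fft, hop)
        diffs = [abs(a - b) for pf, tf in zip(p, t) for a, b in zip(pf, tf)]
        total += sum(diffs) / len(diffs)
    return total / len(resolutions)


# Usage: compare a short sine wave against silence.
sine = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
silence = [0.0] * 256
loss_value = multi_resolution_stft_loss(sine, silence)
```

Comparing spectra at several window sizes trades time resolution against frequency resolution, so an error that is hard to see at one resolution can still be penalized at another.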

Abstract

Audio restoration has become increasingly significant in modern society, not only due to the demand for high-quality auditory experiences enabled by advanced playback devices, but also because the growing capabilities of generative audio models necessitate high-fidelity audio. Typically, audio restoration is defined as a task of predicting undistorted audio from damaged input, often trained using a GAN framework to balance perception and distortion. Since audio degradation is primarily concentrated in mid- and high-frequency ranges, especially due to codecs, a key challenge lies in designing a generator capable of preserving low-frequency information while accurately reconstructing high-quality mid- and high-frequency content. Inspired by recent advancements in high-sample-rate music separation, speech enhancement, and audio codec models, we propose Apollo, a generative model designed for high-sample-rate audio restoration. Apollo employs an explicit frequency band split module to model the relationships between different frequency bands, allowing for more coherent and higher-quality restored audio. Evaluated on the MUSDB18-HQ and MoisesDB datasets, Apollo consistently outperforms existing SR-GAN models across various bit rates and music genres, particularly excelling in complex scenarios involving mixtures of multiple instruments and vocals. Apollo significantly improves music restoration quality while maintaining computational efficiency. The source code for Apollo is publicly available at https://github.com/JusperLee/Apollo.
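
The explicit band split described above can be sketched as follows. This is a hedged illustration only: the helper names `band_edges` and `apply_band_split`, the uniform 1000 Hz bandwidth, and the 2048-point FFT at 44.1 kHz are assumptions for the example, not the configuration used in Apollo.

```python
def band_edges(n_bins, sample_rate, band_hz):
    """Return contiguous (start, stop) bin ranges covering [0, n_bins).

    n_bins is assumed to be n_fft // 2 + 1, so bin n_bins - 1 sits at Nyquist.
    Every band spans roughly band_hz; the last band takes the remaining bins.
    """
    if n_bins < 2 or band_hz <= 0:
        raise ValueError("need at least two bins and a positive bandwidth")
    hz_per_bin = (sample_rate / 2) / (n_bins - 1)
    bins_per_band = max(1, round(band_hz / hz_per_bin))
    edges = []
    start = 0
    while start < n_bins:
        stop = min(start + bins_per_band, n_bins)
        edges.append((start, stop))
        start = stop
    return edges


def apply_band_split(frame, edges):
    """Slice one spectrogram frame (a sequence of bins) into sub-band chunks."""
    return [frame[start:stop] for start, stop in edges]


# Usage: one frame of a 2048-point STFT at 44.1 kHz, split into ~1 kHz bands.
edges = band_edges(n_bins=1025, sample_rate=44100, band_hz=1000)
frame = list(range(1025))  # stand-in for the bins of a single frame
bands = apply_band_split(frame, edges)
```

Each chunk in `bands` would then be projected to a fixed-size feature so that the sequence of bands can be modeled jointly, which is the role the abstract assigns to the band split module.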

Paper Structure

This paper contains 13 sections, 6 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overall pipeline of the model architecture of Apollo and its modules.
  • Figure 2: Comparison of SDR, SI-SNR, and ViSQOL results for Apollo and SR-GAN at different bitrates.