Baseline Systems For The 2025 Low-Resource Audio Codec Challenge

Yusuf Ziya Isik; Rafał Łaganowski

TL;DR

This work establishes baseline neural codecs for the 2025 LRAC Challenge under stringent low-resource conditions: a 24 kHz sampling rate, ultra-low to low bitrates (1–6 kbps), and tight latency and compute budgets. It presents two rapid-to-reproduce baselines, one for Track 1 (transparency) and one for Track 2 (enhancement), both built on convolutional encoder–decoder architectures with Residual Vector Quantization (RVQ) and GAN-based training. The paper details data curation, augmentation strategies, and the full training regimen, including end-to-end optimization, exponential-moving-average (EMA) codebook updates, and multi-scale discriminators. The code repositories and trained weights are released as open source. Objective metrics are reported for clean, noisy, and reverberant conditions, establishing a reproducible benchmark for low-resource speech coding in noisy environments and making it quick to compare new neural audio codecs against it. The public data pipelines, architecture choices, and evaluation framework are of practical value to researchers and practitioners working on real-time, bandwidth-constrained speech coding.
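
The 1–6 kbps range follows from how an RVQ codec transmits its latent. Assuming fixed-length (not entropy-coded) indices, each latent frame sends one codeword index per codebook, so the bitrate is frame rate × number of codebooks × log2(codebook size). The sketch below works through that arithmetic. The settings it uses (a 480-sample hop at 24 kHz, i.e. 50 frames/s, and 1024-entry codebooks) are hypothetical illustrations, not the baselines' actual configuration, and the function names are likewise hypothetical.

```python
import math


def rvq_bitrate_bps(sample_rate_hz: int, hop_samples: int,
                    num_codebooks: int, codebook_size: int) -> float:
    """Bitrate of an RVQ codec with fixed-length indices (no entropy coding)."""
    frame_rate = sample_rate_hz / hop_samples    # latent frames per second
    bits_per_index = math.log2(codebook_size)    # bits needed to send one codeword index
    return frame_rate * num_codebooks * bits_per_index


def codebooks_for_target(target_bps: float, sample_rate_hz: int,
                         hop_samples: int, codebook_size: int) -> int:
    """Largest number of RVQ stages whose bitrate does not exceed target_bps."""
    per_codebook = rvq_bitrate_bps(sample_rate_hz, hop_samples, 1, codebook_size)
    return math.floor(target_bps / per_codebook)


# Hypothetical settings: 24 kHz audio, 480-sample hop (50 frames/s), 1024-entry codebooks.
for target in (1000, 6000):
    n = codebooks_for_target(target, 24_000, 480, 1024)
    print(target, n, rvq_bitrate_bps(24_000, 480, n, 1024))
# prints:
# 1000 2 1000.0
# 6000 12 6000.0
```

Under these assumed settings each codebook costs 500 bps, so the 1–6 kbps range corresponds to 2 to 12 RVQ stages. Bitrate-scalable RVQ codecs commonly exploit this by transmitting only the first few stages at lower rates.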

Abstract

The Low-Resource Audio Codec (LRAC) Challenge aims to advance neural audio coding for deployment in resource-constrained environments. The first edition focuses on low-resource neural speech codecs that must operate reliably under everyday noise and reverberation, while satisfying strict constraints on computational complexity, latency, and bitrate. Track 1 targets transparency codecs, which aim to preserve the perceptual transparency of input speech under mild noise and reverberation. Track 2 addresses enhancement codecs, which combine coding and compression with denoising and dereverberation. This paper presents the official baseline systems for both tracks in the 2025 LRAC Challenge. The baselines are convolutional neural codec models with Residual Vector Quantization, trained end-to-end using a combination of adversarial and reconstruction objectives. We detail the data filtering and augmentation strategies, model architectures, optimization procedures, and checkpoint selection criteria.
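
The abstract names Residual Vector Quantization without spelling it out, and the TL;DR mentions EMA updates for the codebooks. The pure-Python block below is a minimal sketch of these generic techniques, not the baselines' released code. RVQ encoding quantizes the residual left by the previous stages at each stage, and decoding sums the selected codewords. The EMA update, in the style of VQ-VAE, sets each codeword to an exponentially weighted mean of the vectors assigned to it. All function names, the state initialization, and the `decay` and `eps` defaults are illustrative assumptions. In a real codec these operations run on encoder latent frames inside the training loop.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Codebook, v: Vector) -> int:
    # Index of the codeword closest to v.
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def rvq_encode(codebooks: List[Codebook], x: Vector) -> List[int]:
    # Each stage quantizes the residual left over by the previous stages.
    residual = list(x)
    indices = []
    for cb in codebooks:
        k = nearest(cb, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return indices


def rvq_decode(codebooks: List[Codebook], indices: List[int]) -> Vector:
    # Reconstruction = sum of the selected codewords; fewer indices give a coarser result.
    out = [0.0] * len(codebooks[0][0])
    for cb, k in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, cb[k])]
    return out


def init_ema_state(codebook: Codebook) -> Tuple[List[float], Codebook]:
    # Assumed initialization: count of 1 per codeword and running sum equal to the
    # codeword, so codewords that receive no vectors stay (approximately) in place.
    return [1.0] * len(codebook), [list(c) for c in codebook]


def ema_update(codebook: Codebook, cluster_size: List[float], embed_sum: Codebook,
               batch: List[Vector], decay: float = 0.99, eps: float = 1e-5) -> None:
    # One EMA step (in place): update running assignment counts and vector sums,
    # then set each codeword to their ratio (an exponentially weighted cluster mean).
    dim = len(codebook[0])
    counts = [0.0] * len(codebook)
    sums = [[0.0] * dim for _ in codebook]
    for v in batch:
        k = nearest(codebook, v)
        counts[k] += 1.0
        sums[k] = [s + x for s, x in zip(sums[k], v)]
    for k in range(len(codebook)):
        cluster_size[k] = decay * cluster_size[k] + (1 - decay) * counts[k]
        embed_sum[k] = [decay * e + (1 - decay) * s for e, s in zip(embed_sum[k], sums[k])]
        codebook[k] = [e / (cluster_size[k] + eps) for e in embed_sum[k]]


def rvq_ema_step(codebooks: List[Codebook], states: List[Tuple[List[float], Codebook]],
                 batch: List[Vector], decay: float = 0.99) -> None:
    # Stage s is updated with the residuals that reach stage s.
    residuals = [list(v) for v in batch]
    for cb, (cluster_size, embed_sum) in zip(codebooks, states):
        ema_update(cb, cluster_size, embed_sum, residuals, decay)
        residuals = [[r - c for r, c in zip(v, cb[nearest(cb, v)])] for v in residuals]


# Toy two-stage example with hand-picked 2-D codebooks.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],  # coarse stage
    [[0.0, 0.0], [0.1, 0.0], [0.0, -0.1]],  # fine stage
]
idx = rvq_encode(codebooks, [1.1, 1.0])
print(idx, rvq_decode(codebooks, idx))  # prints [1, 1] [1.1, 1.0]
```

Because the decoder sums one codeword per stage, passing only the first few indices to `rvq_decode` still yields a valid but coarser reconstruction. This is how the number of transmitted stages trades bitrate against quality.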
