Tweaking autoregressive methods for inpainting of gaps in audio signals

Ondřej Mokrý, Pavel Rajmic

TL;DR

The paper analyzes autoregressive approaches for inpainting audio gaps up to 80 ms, comparing extrapolation-based methods, frame-wise Janssen, and a novel gap-wise Janssen variant. It systematically investigates AR estimators (LPC vs Burg) and model order, showing Burg generally yields better quality, and demonstrates that the gap-wise Janssen method achieves superior objective (PEMO-Q) and subjective performance, often surpassing sparsity-based baselines. Across solo-instrument and mid-scale datasets, results emphasize the importance of estimator choice and context coupling, with longer window lengths improving performance for windowed approaches. The work provides practical guidance and publicly available MATLAB implementations for AR-based inpainting, highlighting tradeoffs between reconstruction quality and computational load.

Abstract

A novel variant of the Janssen method for audio inpainting is presented and compared to other popular audio inpainting methods based on autoregressive (AR) modeling. Both conceptual differences and practical implications are discussed. The experiments demonstrate the importance of the choice of the AR model estimator, window/context length, and model order. The results show the superiority of the proposed gap-wise Janssen approach using objective metrics, which is confirmed by a listening test.
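To make the estimator and model-order discussion concrete, the following is a minimal sketch of Burg's method for estimating AR coefficients, followed by a one-sided (forward-only) extrapolation into a gap. It is not the authors' MATLAB implementation. The function names `burg_ar` and `extrapolate`, the sign convention $x[n] + \sum_i a_i x[n-i] = e[n]$, and the toy sinusoid are assumptions made for illustration only.

```python
import numpy as np


def burg_ar(x, order):
    """Estimate AR coefficients a = [1, a_1, ..., a_p] with Burg's method.

    Assumed convention: x[n] + sum_i a_i * x[n-i] = e[n].
    Assumes order >= 1 and a signal that is not perfectly predictable
    at a lower order (otherwise the denominator below can vanish).
    """
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()   # forward prediction errors
    b = x[:-1].copy()  # backward prediction errors, delayed by one sample
    a = np.array([1.0])
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error energy.
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-type update of the prediction polynomial.
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + k * a_ext[::-1]
        # Update both error sequences, then realign them by one sample.
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return a


def extrapolate(context, a, n_samples):
    """Continue `context` by n_samples via x[n] = -sum_i a_i * x[n-i]."""
    p = len(a) - 1
    buf = list(np.asarray(context, dtype=float)[-p:])
    out = []
    for _ in range(n_samples):
        # Reverse the buffer so that element 0 is x[n-1] and element p-1 is x[n-p].
        nxt = -np.dot(a[1:], buf[::-1][:p])
        out.append(nxt)
        buf.append(nxt)
    return np.array(out)


# Toy usage: fit an order-2 model on the context before a gap
# and predict 48 samples into the gap.
omega = 0.3
x = np.cos(omega * np.arange(2048))
a = burg_ar(x[:2000], order=2)
gap_fill = extrapolate(x[:2000], a, 48)
```

For a noise-free sinusoid, an order-2 model with coefficients close to $[1, -2\cos\omega, 1]$ is expected. Real audio would call for a much higher order $p$, which is one of the parameters the paper studies.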

Paper Structure (9 sections, 2 equations, 5 figures)

Figures (5)

  • Figure 1: Comparison of the estimators in terms of PEMO-Q ODG for a window/context length of 4096 samples. For each inpainting method, the scatter plot shows the individual results obtained with LPC vs. the Burg algorithm for estimating the AR coefficients. The effect of the model order $p$ is analyzed separately in Fig. 2.
  • Figure 2: Comparison of the model order choices in terms of PEMO-Q ODG for a window/context length of 4096 samples. For each inpainting method, the plot shows averaged results obtained with LPC (darker shade, dashed line) vs. the Burg algorithm (lighter shade, solid line) for estimating the AR coefficients.
  • Figure 3: Comparison of the AR-based methods with SPAIN in terms of SDR (top) and ODG (bottom), averaged over all signals. In this experiment, all AR-based methods used the Burg algorithm to estimate the coefficients, using the best performing order $p$ according to the results reported in Fig. 2.
  • Figure 4: A boxplot showing the distribution of scores in the listening test. The proposed gap-wise Janssen method performs best, which is also confirmed statistically, since non-overlapping notches (filled areas) imply a difference in medians at the 5% significance level.
  • Figure 5: Comparison of the AR-based methods with SPAIN in terms of SDR, using an increased window/context length of $8192$ samples on the solo-instrument dataset. All AR-based methods used the Burg algorithm to estimate the coefficients, with the order chosen as the best performing one for this window/context length. A-SPAIN-MOD is omitted for computational reasons (it takes around 3.5 hours per signal).
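
The captions above report SDR without defining it in this summary. A commonly used definition is $\mathrm{SDR}(x, \hat{x}) = 10 \log_{10} \left( \|x\|^2 / \|x - \hat{x}\|^2 \right)$ in dB, where $x$ is the reference and $\hat{x}$ the reconstruction. The sketch below assumes that definition. The function name `sdr_db` and the optional `gap_mask` argument (restricting evaluation to the missing samples) are hypothetical and may differ from the paper's exact evaluation protocol.

```python
import numpy as np


def sdr_db(reference, estimate, gap_mask=None):
    """Signal-to-distortion ratio in dB (assumed definition, see lead-in).

    gap_mask: optional boolean array; if given, only samples where the
    mask is True (the inpainted gap) enter the computation.
    """
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    if gap_mask is not None:
        mask = np.asarray(gap_mask, dtype=bool)
        ref, est = ref[mask], est[mask]
    noise_energy = np.sum((ref - est) ** 2)
    if noise_energy == 0.0:
        return np.inf  # perfect reconstruction
    return 10.0 * np.log10(np.sum(ref ** 2) / noise_energy)
```

Higher values indicate a reconstruction closer to the reference. Unlike PEMO-Q ODG, this measure carries no perceptual model, which may be why the figures report both.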