Janssen 2.0: Audio Inpainting in the Time-frequency Domain

Ondřej Mokrý; Peter Balušík; Pavel Rajmic

TL;DR

The paper tackles audio inpainting in the time-frequency (TF) domain by adapting the autoregressive (AR) Janssen method to spectrograms (Janssen-TF) and benchmarking it against a deep-prior TF inpainting approach (DPAI). Janssen-TF casts inpainting as an AR-constrained TF problem solved via ADMM under STFT constraints. In experiments on a small and a larger dataset, Janssen-TF generally outperforms DPAI across gap lengths in both objective metrics and a subjective listening test, with context-aware variants providing additional gains for shorter gaps. The work highlights the value of TF-domain autoregressive priors for audio inpainting and provides open-source MATLAB implementations.

Abstract

The paper focuses on inpainting missing parts of an audio signal spectrogram, i.e., estimating the missing time-frequency coefficients. The autoregression-based Janssen algorithm, a state-of-the-art method for time-domain audio inpainting, is adapted to the time-frequency setting. This novel method, termed Janssen-TF, is compared with the deep-prior neural network approach using both objective metrics and a subjective listening test, and Janssen-TF proves superior in all the considered measures.
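
For orientation, the autoregressive prior behind the time-domain Janssen algorithm can be sketched as follows. This is the generic textbook AR formulation, not the paper's TF-domain problem, which is not reproduced on this page; the symbols $x_n$ (signal samples), $a_i$ (AR coefficients), $p$ (model order), $e_n$ (prediction error) and $M$ (index set of missing samples) are notation assumed here for illustration.

```latex
% AR model of order p: each sample is predicted from its p predecessors
x_n + \sum_{i=1}^{p} a_i \, x_{n-i} = e_n

% Janssen-style estimation: minimize the prediction error energy jointly
% over the AR coefficients and the missing samples, typically by alternating
% between the two groups of unknowns
\min_{\{a_i\}_{i=1}^{p},\; \{x_n\}_{n \in M}} \; \sum_{n} |e_n|^2
```

With the samples fixed, the objective is quadratic in the coefficients, and with the coefficients fixed it is quadratic in the missing samples, so each alternating step reduces to a linear least-squares problem.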

Paper Structure (11 sections, 5 equations, 3 figures, 1 algorithm)

Introduction
Gaps and the Short-time Fourier Transform
Deep Prior Audio Inpainting (DPAI)
Application of Deep Prior to Audio Inpainting
Architecture
Variants
Janssen spectrogram inpainting (Janssen-TF)
Experiments
Data, setup and metrics
Results
Conclusion and Outlook

Figures (3)

Figure 1: Comparison of the inpainting methods using objective metrics: SNR (top) and ODG (bottom). The plots show results averaged over the 8 test signals together with the bootstrap interval estimates of the mean values at the $\alpha=5\%$ significance level (Efron and Tibshirani). If the intervals do not overlap, it may be concluded that the difference of the means is statistically significant.
Figure 2: A boxplot showing the distribution of scores in the listening test. The individual boxes span from the 25th to the 75th percentile of the recorded scores. The notches (filled areas) around the medians (orange lines) are constructed such that boxes whose notches do not overlap have different medians at the 5% statistical significance level.
Figure 3: SNR and ODG results on the IRMAS dataset.
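
The bootstrap interval estimates mentioned in the caption of Figure 1 can be illustrated with a minimal sketch. It assumes a plain percentile bootstrap; the paper may use a different bootstrap variant. The function names, the number of resamples, and the SNR values below are all hypothetical and are not taken from the paper.

```python
import random
import statistics


def bootstrap_mean_interval(values, alpha=0.05, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean (illustrative assumption)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the means.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical per-signal SNR values in dB for two methods (8 signals each).
snr_method_a = [12.1, 9.8, 11.4, 10.7, 13.0, 9.5, 11.9, 10.2]
snr_method_b = [7.3, 6.1, 8.0, 6.6, 7.7, 5.9, 7.1, 6.4]

interval_a = bootstrap_mean_interval(snr_method_a)
interval_b = bootstrap_mean_interval(snr_method_b)
print("method A:", interval_a)
print("method B:", interval_b)
print("intervals overlap:", intervals_overlap(interval_a, interval_b))
```

Reading non-overlapping intervals as a significant difference of the means is the rule stated in the caption. It is a conservative check, since intervals that overlap slightly can still correspond to a significant difference.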
