Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control

Yu-Hua Chen, Yen-Tung Yeh, Yuan-Chiao Cheng, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

TL;DR

This work addresses zero-shot amplifier modeling by introducing a one-to-many neural framework that conditions a single generator on a tone embedding derived from a reference wet signal. A SimCLR-style tone embedding encoder captures tone-related features, enabling conditioning without re-training and supporting zero-shot tone transfer. Experiments show that FiLM-conditioned GCNs with ToneEmb outperform LUT-based conditioning and demonstrate partial generalization to unseen amps, including a self-recorded case study. The results suggest a promising path toward universal amplifier modeling with flexible tone interpolation and extrapolation in audio effects.

Abstract

Replicating analog device circuits through neural audio effect modeling has garnered increasing interest in recent years. Existing work has predominantly focused on a one-to-one emulation strategy, modeling specific devices individually. In this paper, we tackle the less-explored scenario of one-to-many emulation, utilizing conditioning mechanisms to emulate multiple guitar amplifiers through a single neural model. For condition representation, we use contrastive learning to build a tone embedding encoder that extracts style-related features of various amplifiers, leveraging a dataset of comprehensive amplifier settings. Targeting zero-shot application scenarios, we also examine various strategies for tone embedding representation, evaluating referenced tone embedding against two retrieval-based embedding methods for amplifiers unseen at training time. Our findings showcase the efficacy and potential of the proposed methods in achieving versatile one-to-many amplifier modeling, contributing a foundational step towards zero-shot audio modeling applications.
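
The contrastive objective behind the tone embedding encoder can be illustrated with a minimal sketch of an NT-Xent loss, the loss used by SimCLR. This is not the authors' implementation: the function names, the temperature value, and the pairing rule (two clips rendered with the same amp tone form a positive pair) are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def nt_xent_loss(emb_a, emb_b, temperature=0.1):
    """SimCLR-style NT-Xent loss over a batch of paired embeddings.

    Assumption: emb_a[i] and emb_b[i] come from two clips that share a tone
    (a positive pair); every other embedding in the batch is a negative.
    """
    z = [l2_normalize(v) for v in emb_a + emb_b]
    n = len(emb_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of i
        # Temperature-scaled cosine similarity to every embedding in the batch.
        sims = [sum(p * q for p, q in zip(z[i], z[j])) / temperature
                for j in range(2 * n)]
        others = [s for j, s in enumerate(sims) if j != i]
        # Numerically stable log-sum-exp over all candidates except i itself.
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - sims[pos]
    return total / (2 * n)


# Toy batch: two tones, two clips each (hypothetical 2-D embeddings).
clips_a = [[1.0, 0.0], [0.0, 1.0]]
clips_b = [[1.0, 0.1], [0.1, 1.0]]
print(nt_xent_loss(clips_a, clips_b))
```

Minimizing this quantity pulls embeddings of same-tone clips together and pushes different tones apart, which is the property a tone embedding needs in order to serve as a conditioning signal.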

Paper Structure

This paper contains 15 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: A one-to-one approach cannot emulate an unseen audio effect. In contrast, the proposed one-to-many approach can achieve zero-shot modeling by using a tone embedding encoder that turns a reference audio example of that effect into a conditioning input at inference time.
  • Figure 2: Diagram of the audio processing workflow. A clean signal $\mathbf{x}$ is input into the generator $\mathcal{G}$, which uses the tone embedding from the tone embedding encoder to produce the wet signal $\mathbf{y}$. The encoder $\mathcal{E}$ generates the tone embedding $\phi$ by analyzing a reference wet signal $\mathbf{z}$.
  • Figure 3: Diagram of a layer of the generator $\mathcal{G}$, which uses a gated convolutional neural network (GCN) (Comunità et al., 2023) as the backbone and FiLM (Perez et al., 2018) for conditioning; $\mathbf{h}_{l}$ denotes the output of the previous layer and $\mathbf{h}_{l+1}$ that of the current layer.
  • Figure 4: A t-SNE visualization of the tone embeddings from the wet signals of the $N=9$ amps. Each point represents a tone embedding extracted from a wet signal, with color and shape indicating the category of the amp tone. We see two large cross-amp clusters and nine smaller clusters, one per amp, suggesting that the encoder $\mathcal{E}$ can distinguish between different tones based on their embeddings.
  • Figure 5: Spectrograms of the input clean signal, target wet signal, and the generated result of the proposed one-to-many FiLM-GCN model, in the zero-shot case study. The orange squares show that our model still struggles to model high-frequency components.
  • ...and 1 more figure
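
The conditioning path described in the captions of Figures 2 and 3 (a tone embedding $\phi$ modulating the features of a gated convolutional layer through FiLM) can be sketched as follows. This is a minimal illustration, not the paper's architecture: the convolution itself is omitted, and the placement of FiLM before the gated activation, the linear projections, and all function and parameter names are assumptions.

```python
import math


def linear(weights, bias, vec):
    """Affine map: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


def gated_activation(features):
    """Split channels in half and combine them as tanh(a) * sigmoid(b)."""
    half = len(features) // 2
    return [[math.tanh(a) / (1.0 + math.exp(-b)) for a, b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(features[:half], features[half:])]


def film_gated_layer(conv_out, phi, w_gamma, b_gamma, w_beta, b_beta):
    """Condition one layer's convolution output on the tone embedding phi.

    conv_out: list of channels, each a list of time steps (2 * C channels).
    phi: tone embedding taken from a reference wet signal.
    The projections from phi to (gamma, beta) are hypothetical parameters.
    """
    gamma = linear(w_gamma, b_gamma, phi)  # per-channel scale
    beta = linear(w_beta, b_beta, phi)     # per-channel shift
    return gated_activation(film(conv_out, gamma, beta))


# Toy example: 2 channels (one output channel after gating), 3 time steps.
conv_out = [[0.0, 1.0, -1.0], [0.0, 0.0, 2.0]]
phi = [0.5, -0.5]
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]
w_beta, b_beta = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
print(film_gated_layer(conv_out, phi, w_gamma, b_gamma, w_beta, b_beta))
```

Because the tone enters the layer only through the per-channel scale and shift, swapping the reference signal changes the output without touching the generator's weights, which is what makes conditioning on an unseen amp possible at inference time.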