Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

Robin Doerfler, Lonce Wyse

TL;DR

The Pulse-Train-Resonator (PTR) model is a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators that simulate exhaust acoustics.

Abstract

Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the underlying pulse shapes and temporal structure. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators simulating exhaust acoustics. The architecture integrates physics-informed inductive biases including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust system resonances and derived engine operating modes such as throttle operation and deceleration fuel cutoff (DCFO). Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline model, while providing interpretable parameters corresponding to physical phenomena. Complete code, model weights, and audio examples are openly available.
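The core idea of the abstract, a pulse train timed to engine firings and passed through a recursive Karplus-Strong resonator, can be illustrated with a minimal, non-differentiable sketch. It is not the authors' implementation. The function names, the evenly spaced four-stroke firing assumption, and the fixed delay and feedback values are all illustrative assumptions. The actual PTR model predicts its parameters with a neural network and uses shaped pulses rather than unit impulses.

```python
def firing_pulse_train(rpm, n_cylinders, sample_rate, duration_s):
    """Unit impulses at evenly spaced firing instants (assumed four-stroke engine)."""
    # Assumption: each cylinder fires once every two crankshaft revolutions.
    firing_hz = rpm / 60.0 * n_cylinders / 2.0
    n_samples = int(sample_rate * duration_s)
    period = sample_rate / firing_hz  # samples between consecutive firings
    out = [0.0] * n_samples
    k = 0
    while True:
        idx = int(round(k * period))
        if idx >= n_samples:
            break
        out[idx] = 1.0
        k += 1
    return out


def karplus_strong_resonator(x, delay, feedback=0.9):
    """Recursive comb filter with a two-point averaging lowpass in the loop:
    y[n] = x[n] + feedback * 0.5 * (y[n - delay] + y[n - delay - 1])
    """
    y = [0.0] * len(x)
    for n in range(len(x)):
        a = y[n - delay] if n >= delay else 0.0
        b = y[n - delay - 1] if n >= delay + 1 else 0.0
        y[n] = x[n] + feedback * 0.5 * (a + b)
    return y


# Hypothetical settings: 4 cylinders at 3000 RPM, 16 kHz audio, 0.1 s.
pulses = firing_pulse_train(rpm=3000, n_cylinders=4, sample_rate=16000, duration_s=0.1)
audio = karplus_strong_resonator(pulses, delay=40, feedback=0.9)
```

With these settings the firing rate is 100 Hz, so the excitation contains one impulse every 160 samples. The delay length sets the resonance pitch of the simulated exhaust path, and the feedback gain (kept below 1 for stability) sets how long each pulse rings.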

Paper Structure (26 sections, 12 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Control features (RPM, torque) and their deltas are temporally embedded via MLP blocks and GRU. Outputs are decoded into synth parameters through MLP blocks, converted to parameter ranges by specialized heads (Pulse, Noise), upsampled to audio rate and scaled by conditioning signals. Audio is synthesized via differentiable modules with parameter updates calculated from multi-resolution spectral and harmonic losses.
  • Figure 2: Pulse shapes across varying parameter settings. Subplots show: (top-left) base pulses with varying decay factors $\lambda$; (top-right) pulses shaped by exponential envelopes $E(t)$ with varying $\alpha, \beta$; (bottom-left) pulses with thermodynamic phase modulation $\phi_{\text{mod}}$; and (bottom-right) final pulse shapes $P(t)$ combining all elements. Indices 1--3 denote corresponding parameter sets with respective curves distinguished by color gradients.
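To make the composition described in the Figure 2 caption concrete, the sketch below combines a harmonic base pulse, an exponential envelope, and a phase-modulation term into one pulse value. The paper's actual definitions of $\lambda$, $E(t)$, and $\phi_{\text{mod}}$ are not reproduced in this excerpt, so every functional form and default value here is an assumption chosen only to show how such terms could be combined.

```python
import math


def pulse_shape(t, f0, n_harmonics=8, decay=0.6, alpha=400.0, beta=60.0,
                mod_depth=0.5, mod_rate=30.0):
    """Illustrative pulse P(t); all functional forms below are assumptions.

    decay      -> stands in for the harmonic decay factor (lambda)
    alpha/beta -> attack and release rates of the envelope E(t)
    mod_depth  -> size of the phase-modulation term (phi_mod)
    """
    # Assumed envelope: fast exponential attack, slower exponential release.
    envelope = (1.0 - math.exp(-alpha * t)) * math.exp(-beta * t)
    # Assumed phase modulation: an offset that relaxes toward zero over the pulse.
    phi_mod = mod_depth * math.exp(-mod_rate * t)
    # Assumed base pulse: harmonics whose amplitudes fall off as decay**(k - 1).
    base = sum(
        decay ** (k - 1) * math.sin(k * (2.0 * math.pi * f0 * t + phi_mod))
        for k in range(1, n_harmonics + 1)
    )
    return envelope * base


# Sample one hypothetical pulse over 50 ms at 16 kHz.
samples = [pulse_shape(n / 16000.0, f0=100.0) for n in range(800)]
```

Lower values of `decay` concentrate energy in the fundamental, while `alpha` and `beta` control how sharply the pulse rises and fades, which mirrors the qualitative roles the caption assigns to its parameters.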