
Analysis-Driven Procedural Generation of an Engine Sound Dataset with Embedded Control Annotations

Robin Doerfler, Lonce Wyse

TL;DR

This work presents an analysis-driven framework for generating engine audio with sample-accurate control annotations and uses it to produce the Procedural Engine Sounds Dataset, a set of engine audio signals with sample-accurate RPM and torque annotations spanning a wide range of operating conditions, signal complexities, and harmonic profiles.

Abstract

Computational engine sound modeling is central to the automotive audio industry, particularly for active sound design, virtual prototyping, and emerging data-driven engine sound synthesis methods. These applications require large volumes of standardized, clean audio recordings with precisely time-aligned operating-state annotations: data that is difficult to obtain because of high costs, the need for specialized measurement equipment, and inevitable noise contamination. We present an analysis-driven framework for generating engine audio with sample-accurate control annotations. The method extracts harmonic structures from real recordings through pitch-adaptive spectral analysis; these structures then drive an extended parametric harmonic-plus-noise synthesizer. With this framework, we generate the Procedural Engine Sounds Dataset (19 hours, 5,935 files), a set of engine audio signals with sample-accurate RPM and torque annotations spanning a wide range of operating conditions, signal complexities, and harmonic profiles. Comparison against real recordings shows that the synthesized data preserves characteristic harmonic structures, and baseline experiments confirm its suitability for learning-based parameter estimation and synthesis tasks. The dataset is publicly released to support research on engine timbre analysis, control parameter estimation, acoustic modeling, and neural generative networks.
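To illustrate why procedural synthesis yields sample-accurate annotations, here is a minimal sketch of a generic harmonic-plus-noise model driven by a per-sample RPM curve. It assumes the common convention that engine order k sounds at k · RPM / 60 Hz. The function name `synth_engine`, the order set, the amplitudes, and the white-noise component are hypothetical illustrations, not the paper's extended synthesizer. Torque is not modeled here.

```python
import numpy as np

def synth_engine(rpm, order_amps, sr=16000, noise_gain=0.05, seed=0):
    """Hypothetical harmonic-plus-noise sketch (not the paper's synthesizer).

    rpm: per-sample RPM curve (1-D array); it doubles as the sample-accurate label.
    order_amps: dict mapping engine order k (may be fractional, e.g. 0.5) to amplitude.
    """
    rpm = np.asarray(rpm, dtype=float)
    f_rot = rpm / 60.0                              # crankshaft rotation frequency in Hz
    phase_rot = 2 * np.pi * np.cumsum(f_rot) / sr   # integrated rotation phase in radians
    harmonic = np.zeros_like(rpm)
    for k, a in order_amps.items():
        harmonic += a * np.sin(k * phase_rot)       # order k runs at k * RPM / 60 Hz
    rng = np.random.default_rng(seed)
    noise = noise_gain * rng.standard_normal(rpm.shape)  # stand-in for the noise component
    return harmonic + noise

sr = 16000
rpm = np.linspace(1000, 4000, 2 * sr)               # 2 s linear run-up
# Order 2 is emphasized as an example (the firing order of a four-stroke four-cylinder engine).
audio = synth_engine(rpm, {0.5: 0.2, 1.0: 0.5, 2.0: 1.0, 4.0: 0.3}, sr=sr)
# audio[i] and rpm[i] describe the same instant, so the RPM label is sample-accurate.
```

The sketch integrates the instantaneous frequency into phase instead of evaluating sin(2π f t) with a time-varying f. This keeps RPM sweeps free of phase discontinuities and makes each order's frequency exactly k · RPM / 60 at every sample.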

Paper Structure

This paper contains 18 sections, 14 equations, and 2 figures.

Figures (2)

  • Figure 1: Order magnitude distributions comparing source recordings (left, 90 min) to framework-generated synthetic signals (right, 150 min from 5 min extraction material). Preserved structural features demonstrate acoustic authenticity under 30× augmentation. Higher-order variations reflect parametric modifications for timbral diversity.
  • Figure 2: Training and validation loss for neural synthesis model on datasets A, B, and C. Stable convergence demonstrates suitability for data-driven methods. Varying early-stopping points (Esc) reflect parametric modification capabilities.
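Order magnitudes like those in Figure 1 can be estimated with pitch-adaptive analysis. Here is a minimal sketch, assuming the RPM curve is known and roughly constant within each frame. Each window spans a fixed number of crankshaft rotations at the local RPM, so every order that is a multiple of 1/`periods` completes a whole number of cycles and is measured without leakage. The function `order_magnitudes`, its parameters, and the 20 ms default hop are hypothetical choices, not the paper's extraction settings.

```python
import numpy as np

def order_magnitudes(x, rpm, orders, sr, periods=4, hop=None):
    """Hypothetical pitch-adaptive order analysis.

    Returns an array of shape (n_frames, len(orders)) with per-frame amplitude estimates.
    The window length follows the local RPM (taken at the frame start) so that it always
    covers `periods` crankshaft rotations.
    """
    x = np.asarray(x, dtype=float)
    rpm = np.asarray(rpm, dtype=float)
    orders = np.asarray(orders, dtype=float)
    hop = hop or sr // 50                       # default 20 ms hop (assumption)
    frames = []
    start = 0
    while start < len(x):
        f_rot = rpm[start] / 60.0               # local rotation frequency in Hz
        n = int(round(periods * sr / f_rot))    # window length = `periods` rotations
        if start + n > len(x):
            break
        seg = x[start:start + n]
        t = np.arange(n) / sr
        # Single-bin DFT at each order frequency k * f_rot; the 2/n factor yields sine amplitude.
        basis = np.exp(-2j * np.pi * np.outer(orders * f_rot, t))
        frames.append(2.0 / n * np.abs(basis @ seg))
        start += hop
    return np.array(frames)

sr = 16000
t = np.arange(sr) / sr
# Test tone: orders 0.5, 1 and 2 at 3000 RPM (50 Hz rotation) with known amplitudes.
x = 0.2 * np.sin(2 * np.pi * 25 * t) + 1.0 * np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 100 * t)
mags = order_magnitudes(x, np.full(sr, 3000.0), [0.5, 1.0, 2.0], sr)
print(mags.mean(axis=0))  # approximately [0.2, 1.0, 0.3]
```

Averaging such per-frame order amplitudes over a recording gives one plausible form of an order magnitude distribution. Tying the window length to the rotation period rather than fixing it in seconds keeps each order aligned to an exact DFT bin as RPM changes.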