On the Parameter Estimation of Sinusoidal Models for Speech and Audio Signals

George P. Kafentzis

TL;DR

eaQHM outperforms EDSM for medium-to-large analysis windows, whereas EDSM yields higher reconstruction accuracy for smaller analysis windows. This suggests a future research direction: merging the adaptivity of eaQHM with the parameter estimation robustness of EDSM in a new paradigm for high-quality analysis and resynthesis of general audio signals.

Abstract

In this paper, we examine the parameter estimation performance of three well-known sinusoidal models for speech and audio. The first is the standard Sinusoidal Model (SM), which is based on the Fast Fourier Transform (FFT). The second is the Exponentially Damped Sinusoidal Model (EDSM), which was proposed in the last decade and uses a subspace method for parameter estimation. The third is the extended adaptive Quasi-Harmonic Model (eaQHM), which was recently proposed for AM-FM decomposition and estimates the signal parameters using Least Squares on a set of basis functions that are adaptive to the local characteristics of the signal. The parameter estimation of each model is briefly described, and its performance is compared to the others in terms of signal reconstruction accuracy versus window size on a variety of synthetic signals, and versus the number of sinusoids on real signals. The latter include highly non-stationary signals, such as singing voices and guitar solos. The advantages and disadvantages of each model are presented via synthetic signals, and then the application to real signals is discussed. In conclusion, eaQHM outperforms EDSM for medium-to-large analysis windows, whereas EDSM yields higher reconstruction accuracy for smaller analysis windows. Thus, a future research direction appears to be merging the adaptivity of eaQHM with the parameter estimation robustness of EDSM in a new paradigm for high-quality analysis and resynthesis of general audio signals.
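
For orientation, the three models named in the abstract can be sketched in their commonly cited forms. The notation below is an assumption drawn from the general literature on these models, not taken from this page: $K$ is the number of components, $w(t)$ is the analysis window, and $\hat{A}_k(t)$, $\hat{\phi}_k(t)$ denote amplitude and phase trajectories estimated in a previous adaptation step. The paper's own equations may differ in detail.

$$
\begin{aligned}
\text{SM:}\quad & x(t) = \sum_{k=1}^{K} a_k \cos\!\left(2\pi f_k t + \phi_k\right) \\
\text{EDSM:}\quad & x(t) = \sum_{k=1}^{K} a_k\, e^{d_k t} \cos\!\left(2\pi f_k t + \phi_k\right) \\
\text{eaQHM:}\quad & x(t) = \left( \sum_{k=-K}^{K} \left(a_k + t\, b_k\right) \hat{A}_k(t)\, e^{j \hat{\phi}_k(t)} \right) w(t)
\end{aligned}
$$

In this reading, SM holds amplitude and frequency constant inside the window, EDSM adds a per-component damping factor $d_k$, and eaQHM projects the signal onto basis functions that already follow the estimated amplitude and phase trajectories.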

Paper Structure

This paper contains 13 sections, 26 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Inside the analysis window, the frequency (a) and amplitude (b) trajectories of a partial (solid grey line) are assumed to be constant by a stationary sinusoidal model (dotted line), while eaQHM (dashed line) iteratively adapts to the shape of the instantaneous component.
  • Figure 2: Stationary sinusoid plus exponentially amplitude-modulated chirp signal. Upper panel: time domain. Lower panel: frequency domain.
  • Figure 3: Signal-to-Reconstruction-Error Ratio versus analysis window size for a single-component signal. The window size is a multiple of the minimum period of the signal, $T_{min} = 10$ ms.
  • Figure 4: Multicomponent AM-FM signal with sinusoidal frequency modulation. Upper panel: time domain. Lower panel: instantaneous frequency.
  • Figure 5: Signal-to-Reconstruction-Error Ratio versus analysis window size for a multi-component signal. The window size is a multiple of the minimum period of the signal, $T_{min} = 6.68$ ms.
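
The evaluations in Figures 3 and 5 plot Signal-to-Reconstruction-Error Ratio against window size. Below is a minimal sketch of such a sweep, assuming the common definition of the ratio as $20 \log_{10}(\sigma_x / \sigma_{x - \hat{x}})$ in dB. The test signal (a damped 100 Hz sinusoid, so that the period is 10 ms), the damping value, the sampling rate, and the helper names `srer_db` and `ls_sinusoid_fit` are all hypothetical choices for illustration. The fit is a plain stationary least-squares sinusoid, not the SM, EDSM, or eaQHM estimators from the paper.

```python
import numpy as np

def srer_db(x, x_hat):
    """Signal-to-Reconstruction-Error Ratio in dB (assumed definition)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 20.0 * np.log10(np.std(x) / np.std(x - x_hat))

def ls_sinusoid_fit(x, t, freqs_hz):
    """Least-squares fit of constant-amplitude sinusoids at known frequencies."""
    cols = []
    for f in freqs_hz:
        cols.append(np.cos(2.0 * np.pi * f * t))
        cols.append(np.sin(2.0 * np.pi * f * t))
    basis = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coef

fs = 16000.0      # sampling rate in Hz (illustrative assumption)
f0 = 100.0        # component frequency, so the period is 10 ms
damping = 20.0    # amplitude decay rate in 1/s (illustrative assumption)
period = 1.0 / f0

results = {}
for n_periods in (2, 4, 8):
    # Analysis window spanning an integer multiple of the period.
    t = np.arange(int(round(n_periods * period * fs))) / fs
    x = np.exp(-damping * t) * np.cos(2.0 * np.pi * f0 * t)
    x_hat = ls_sinusoid_fit(x, t, [f0])
    results[n_periods] = srer_db(x, x_hat)
    print(f"window = {n_periods} periods, SRER = {results[n_periods]:.1f} dB")
```

Because the fitted model holds amplitude constant while the signal decays, the mismatch grows with window length in this toy setup, which is the kind of stationarity limitation that Figure 1 illustrates.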