Significance of Chirp MFCC as a Feature in Speech and Audio Applications

S. Johanan Joysingh; P. Vijayalakshmi; T. Nagarajan

TL;DR

This work introduces chirp MFCC, a spectral feature obtained by computing MFCCs from the chirp magnitude spectrum instead of the traditional Fourier magnitude spectrum. Grounded in Z-transform theory, it shows that evaluating the spectrum on a circle of radius $r$ close to the radii of the dominant poles improves phase and magnitude accuracy, especially for decaying components. Using analytical results on single- and multi-pole models and an analysis of real speech, the authors identify an optimal radius $r_{c}$ near $a_{max}$, the largest pole radius. They then report gains over vanilla MFCC on speech-music classification, speaker identification, and speech command recognition, using both GMM and DNN pipelines. The approach pairs theoretical analysis with empirical validation and suggests that chirp MFCC can serve as a refined spectral representation in MFCC-based speech and audio systems.

Abstract

A novel feature, based on the chirp z-transform, that offers an improved representation of the underlying true spectrum is proposed. This feature, the chirp MFCC, is derived by computing the Mel frequency cepstral coefficients from the chirp magnitude spectrum instead of the Fourier transform magnitude spectrum. The theoretical foundations of the proposal are discussed, together with an experimental validation using the product of likelihood Gaussians, which shows that the proposed chirp MFCC offers better class separation than vanilla MFCC. Further, real-world evaluation of the feature is performed on three diverse tasks, namely speech-music classification, speaker identification, and speech commands recognition. In all three tasks, the proposed chirp MFCC is shown to offer considerable improvements.
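The pipeline described in the abstract can be illustrated with a minimal sketch. It relies on the fact that evaluating the z-transform on a circle of radius $r$ is the same as taking the DFT of the frame weighted by $r^{-n}$. The function names, the Hamming window, the FFT size, and the filterbank and cepstral dimensions are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def chirp_magnitude_spectrum(frame, r, n_fft=512):
    """|X(z)| at z = r * exp(j*2*pi*k/n_fft) for k = 0 .. n_fft/2."""
    n = np.arange(len(frame))
    # X(r e^{jw}) = sum_n x[n] r^{-n} e^{-jwn}, i.e. an FFT of x[n] * r^{-n}
    weighted = frame * np.power(float(r), -n)
    return np.abs(np.fft.rfft(weighted, n_fft))

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (a common textbook form)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def chirp_mfcc(frame, r, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
    """MFCC computed from the chirp magnitude spectrum; r = 1 gives vanilla MFCC."""
    windowed = frame * np.hamming(len(frame))  # window choice is an assumption
    spectrum = chirp_magnitude_spectrum(windowed, r, n_fft)
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ (spectrum ** 2)
    log_energies = np.log(energies + 1e-10)
    # Unnormalised DCT-II of the log mel energies
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), m + 0.5) / n_filters)
    return basis @ log_energies
```

With `r = 1.0` the analysis circle is the unit circle and the sketch reduces to an ordinary MFCC. For `r < 1` the weight $r^{-n}$ grows with $n$, so short frames keep the computation numerically well behaved.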

Paper Structure (32 sections, 11 equations, 4 figures, 4 tables)

Introduction
Analysis
Analysis of Single-Pole System
Analysis of Multi-Pole Systems
Experimental Setup
Observations
Conclusions
Analysis of Real Speech
Experimental Setup
Observations
Conclusions
Experimental Comparison of MFCC and Chirp MFCC using POG
Chirp MFCC Feature
Experimental Setup
Observations
...and 17 more sections

Figures (4)

Figure 1: The six cases considered for empirical analysis of the error in phase estimation in multi-pole systems. As the radii of the poles are varied for each scenario in a particular case, they move along the dotted line.
Figure 2: Eight complex conjugate poles of a synthesized signal. The solid line marks the unit circle, while the dotted line marks the analysis circle at radius $r_{c}=a_{max}$.
Figure 3: Histogram of the radius of the pole with the maximum radius, computed across 400 (1s long) utterances of the Google speech commands dataset.
Figure 4: Product of Gaussians showing the difference in percentage overlap offered by MFCC and chirp MFCC. The Gaussians correspond to the likelihoods of phone model M1 (/aa/) tested with examples of the same phone P1 (/aa/), and M1 tested with examples of a different phone P2 (/ih/).
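
The captions of Figures 2 and 3 refer to the radius of the pole with the maximum radius, $a_{max}$. As a rough sketch of how such a radius could be estimated from a frame, the code below fits an all-pole model with autocorrelation-method linear prediction and takes the largest root magnitude. Using LPC for this, the model order, and the function names are assumptions; this listing does not state how the authors obtain the poles.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a_1, ..., a_p] of A(z)."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]   # lags 0 .. order
    # Normal equations: sum_j a_j r[|i - j|] = -r[i], i = 1 .. order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def max_pole_radius(frame, order=8):
    """Largest pole magnitude of the fitted all-pole model 1 / A(z)."""
    poles = np.roots(lpc_coefficients(frame, order))
    return float(np.max(np.abs(poles)))
```

The returned value could then serve as a candidate analysis radius, for example as the `r` passed to a chirp spectrum routine.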