Significance of Chirp MFCC as a Feature in Speech and Audio Applications
S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan
TL;DR
This work introduces chirp MFCC, a spectral feature obtained by computing MFCCs from the chirp magnitude spectrum instead of the conventional Fourier magnitude spectrum. Grounded in z-transform theory, it shows that estimating the spectrum at a radius r close to the dominant pole radii improves phase and magnitude accuracy, especially for decaying components. Using analytical results on single- and multi-pole models together with extensive analysis of real speech, the authors identify an optimal radius rc near a_max and report practical gains on speech-music classification, speaker identification, and speech command recognition with both GMM and DNN pipelines. The findings indicate that chirp MFCC improves consistently over vanilla MFCC across these tasks, which suggests it can serve as a refined spectral representation in MFCC-based speech and audio systems.
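The effect of the radius can be illustrated with a small numerical sketch. It relies only on the standard identity that the z-transform evaluated on a circle of radius r equals the DFT of the sequence x[n] r^(-n). The function names (`spectrum_on_circle`, `half_peak_width`), the single damped cosine used as a stand-in for a one-pole-pair model, and all parameter values are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

def spectrum_on_circle(x, r, n_fft):
    """Magnitude of the z-transform of x on the circle |z| = r.

    X(r e^{jw}) = sum_n x[n] r^{-n} e^{-jwn}, i.e. the DFT of x[n] * r^{-n}.
    With r = 1.0 this is the ordinary Fourier magnitude spectrum.
    """
    n = np.arange(len(x), dtype=float)
    return np.abs(np.fft.rfft(x * float(r) ** (-n), n_fft))

def half_peak_width(mag):
    """Number of bins whose magnitude is at least half of the peak value."""
    return int(np.sum(mag >= 0.5 * mag.max()))

# Illustrative decaying component: pole radius a, normalized frequency f0.
a, f0, n_samples, n_fft = 0.98, 0.1, 256, 1024   # assumed values
n = np.arange(n_samples)
x = a ** n * np.cos(2 * np.pi * f0 * n)

fourier_mag = spectrum_on_circle(x, 1.0, n_fft)  # unit circle
chirp_mag = spectrum_on_circle(x, a, n_fft)      # circle at the pole radius

fourier_width = half_peak_width(fourier_mag)
chirp_width = half_peak_width(chirp_mag)
print("half-peak width on unit circle:", fourier_width)
print("half-peak width at r = a:      ", chirp_width)
```

In this toy case, choosing r equal to the pole radius cancels the decay, so the weighted sequence is an undamped cosine and its spectral peak is narrower than on the unit circle. How the paper selects the radius for real speech is not reproduced here.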
Abstract
A novel feature based on the chirp z-transform, which offers an improved representation of the underlying true spectrum, is proposed. This feature, the chirp MFCC, is derived by computing the Mel frequency cepstral coefficients from the chirp magnitude spectrum instead of the Fourier transform magnitude spectrum. The theoretical foundations of the proposal are discussed, along with an experimental validation using product of likelihood Gaussians, which shows that the proposed chirp MFCC offers better class separation than vanilla MFCC. Further, a real-world evaluation of the feature is performed on three diverse tasks, namely speech-music classification, speaker identification, and speech command recognition. In all three tasks, the proposed chirp MFCC is shown to offer considerable improvements.
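The derivation described in the abstract, MFCCs computed from a chirp magnitude spectrum, can be sketched roughly as follows. This is a minimal NumPy illustration and not the authors' implementation: the Hamming window, the triangular Mel filterbank, the log of filterbank magnitudes, the unnormalized DCT-II, the function names, and every default value (sampling rate, FFT size, filter count, number of coefficients, radius) are assumptions made for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters, equally spaced on the Mel scale, over rfft bins."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(v, n_out):
    """First n_out coefficients of an unnormalized DCT-II of the vector v."""
    n = v.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ v

def chirp_mfcc(frame, r=1.0, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    """MFCC-style coefficients from the spectrum on the circle |z| = r.

    Assumes len(frame) <= n_fft. With r = 1.0 the weighting is a no-op and
    the result is a plain (vanilla) MFCC under the same assumed settings.
    """
    n = np.arange(len(frame), dtype=float)
    windowed = frame * np.hamming(len(frame))      # assumed window choice
    weighted = windowed * float(r) ** (-n)         # moves evaluation to |z| = r
    mag = np.abs(np.fft.rfft(weighted, n_fft))     # chirp magnitude spectrum
    energies = mel_filterbank(n_filters, n_fft, sr) @ mag
    return dct_ii(np.log(energies + 1e-10), n_ceps)

# Usage on a synthetic 25 ms frame at 16 kHz (random data, illustration only).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
vanilla = chirp_mfcc(frame, r=1.0)
chirp = chirp_mfcc(frame, r=0.99)   # 0.99 is an arbitrary example radius
```

Keeping the radius as a single parameter makes it easy to compare vanilla and chirp features in the same pipeline, since only the exponential weighting before the FFT changes.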
