MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network

Karthik Sivarama Krishnan, Koushik Sivarama Krishnan

TL;DR

Audio deepfakes threaten information integrity, and MFAAN addresses this by integrating three feature representations (MFCC, LFCC, Chroma-STFT) through parallel CNN paths and a fusion-based decision module. This multi-path, multi-feature approach leverages complementary cues from timbre, linear spectral details, and harmonic content to robustly distinguish real and manipulated audio. On real-world benchmarks, MFAAN achieves high accuracies (e.g., 99.21% on In-the-Wild and 94.47% on FoR) with low EERs, outperforming a baseline CNN and competing with or surpassing prior methods. The work demonstrates the value of comprehensive feature fusion for audio forensics and points to scalable enhancements like additional feature paths and attention mechanisms for future progress.

Abstract

In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.
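
Results for MFAAN are reported as accuracy and equal error rate (EER). As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed from a detector's per-clip scores. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, the convention that label 1 means "fake", and the toy scores are all assumptions made for illustration.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of clips classified correctly.

    Assumed convention: a score >= threshold means the clip is predicted fake,
    and label 1 = fake, label 0 = real.
    """
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping every observed score as a threshold.

    At each threshold we compute:
      fpr = real clips flagged as fake / number of real clips
      fnr = fake clips missed          / number of fake clips
    and return the mean of the two at the threshold where they are closest.
    """
    n_fake = sum(labels)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("both real and fake clips are required")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_real
        fnr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_fake
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Hypothetical detector outputs (not taken from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(accuracy(toy_scores, toy_labels), equal_error_rate(toy_scores, toy_labels))
```

Accuracy depends on the chosen threshold, whereas the EER summarizes the trade-off between the two error types at the operating point where they balance, which is why the two numbers are usually reported together.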

Paper Structure (35 sections, 1 figure, 1 table)